Carnegie Mellon University

Wendy Yang thesis defense details

June 17, 2025

Thesis Defense: Wendy Yang | June 26, 2025 | 1pm

CBD and CPCB are proud to announce the following thesis defense:

Title: Harnessing Machine Learning to Decode Human Genome Sequence-Structure-Function Relationships
Wendy Yang

GHC 7501
1:00 PM EDT

Committee:

Jian Ma (Chair, Carnegie Mellon University) 
Russell Schwartz (Carnegie Mellon University) 
Harinder Singh (University of Pittsburgh) 
Christina Leslie (Memorial Sloan Kettering Cancer Center)

Abstract:
The human genome is intricately organized within the nucleus, where the spatial positioning and hierarchical structure of chromatin play essential roles in regulating genome function. While it is known that genome structure and function are regulated by DNA sequence and epigenomic features, the principles underlying the sequence–structure–function relationships remain poorly understood. In this dissertation, I develop machine learning frameworks to decode how DNA sequence and 3D nuclear organization shape genome function across diverse cell types and species. First, I introduce UNADON, a transformer-based deep learning model that predicts chromatin spatial positioning relative to nuclear bodies using DNA sequence and epigenomic data. UNADON achieves high accuracy in both within-cell-type and cross-cell-type predictions, and identifies key sequence and epigenomic determinants of chromatin positioning. Second, I present TEMPURA, a graph neural network model that integrates 1D DNA sequence features and 3D chromatin interactions to predict structural and functional genomic signals across human cell types and non-human primates. TEMPURA achieves robust and generalizable cross-cell-type and cross-species prediction performance, highlighting its utility for inferring genome organization in unseen cell types and species. Next, leveraging newly assembled telomere-to-telomere (T2T) genomes in primates, I conduct a cross-species analysis of replication timing to identify conserved and lineage-specific patterns of genome organization, including those within previously inaccessible genomic regions. Finally, I develop CHANGE-net, a convolutional neural network model for predicting CRISPR-Cas9 off-target activity. CHANGE-net accurately predicts the experimentally validated effects of human genetic variation on off-target activity, and generalizes well across gRNAs and technologies.
Together, these contributions advance our understanding of how genome sequence encodes higher-order structure and function, and establish robust computational frameworks for predictive modeling and interpretation in computational genomics.