DipGNNome: Diploid de Novo Genome Assembly with Geometric Deep Learning and Beam-Search
Abstract
De novo genome assembly remains a central challenge in computational biology, particularly for diploid genomes, where maternal and paternal haplotypes must be accurately resolved. Existing assemblers achieve impressive results through carefully designed heuristics, yet modern deep learning methods remain largely unexplored in the diploid setting. We present DipGNNome, the first deep learning-based framework for diploid de novo genome assembly. Our approach formulates genome assembly as an edge classification and graph traversal problem on haplotype-aware assembly graphs. We train a graph neural network (GNN) to guide contig construction, which corresponds to the layout phase of an Overlap-Layout-Consensus (OLC) genome assembly pipeline. To enable this, we establish a novel pipeline for generating diploid graphs with ground-truth edge labels, providing the first systematic way to produce training data for machine learning models in this domain. This framework creates a foundation for applying and extending graph-based deep learning to diploid assembly. DipGNNome produces assemblies comparable to those of state-of-the-art assemblers, demonstrates the feasibility of deep learning for diploid assembly, and introduces a paradigm that bridges algorithmic genomics with graph representation learning.
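To make the "edge classification and graph traversal" formulation concrete, the following is a minimal illustrative sketch of beam search over a directed graph whose edges carry scores. It is not the DipGNNome implementation: the function name `beam_search`, the toy graph, and the hand-written edge probabilities (standing in for GNN outputs) are all assumptions made for illustration, as is the rule of returning the longest finished path with ties broken by score.

```python
import math

def beam_search(graph, edge_score, start, beam_width=3, max_steps=100):
    """Return a high-scoring simple path from `start`.

    graph:      dict mapping node -> list of successor nodes
    edge_score: dict mapping (u, v) -> probability in (0, 1]
                (hypothetical stand-in for per-edge model outputs)
    """
    beams = [(0.0, [start])]   # (cumulative log-probability, path)
    finished = []              # paths that cannot be extended further
    for _ in range(max_steps):
        candidates = []
        for logp, path in beams:
            extended = False
            for nxt in graph.get(path[-1], []):
                if nxt in path:            # keep paths simple (no revisits)
                    continue
                p = edge_score.get((path[-1], nxt), 0.0)
                if p <= 0.0:               # skip edges with no support
                    continue
                candidates.append((logp + math.log(p), path + [nxt]))
                extended = True
            if not extended:
                finished.append((logp, path))
        if not candidates:
            break
        # Keep only the `beam_width` best partial paths.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    else:
        # Step budget exhausted: treat the surviving beams as finished.
        finished.extend(beams)
    # Assumption of this sketch: prefer longer paths, then higher score.
    return max(finished, key=lambda c: (len(c[1]), c[0]))[1]

# Toy graph with two alternative branches (a->b->d and a->c->d).
toy_graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"]}
toy_scores = {
    ("a", "b"): 0.9, ("a", "c"): 0.2,
    ("b", "d"): 0.8, ("c", "d"): 0.9,
    ("d", "e"): 0.95,
}
best_path = beam_search(toy_graph, toy_scores, "a", beam_width=2)
print(best_path)
```

In this toy setting, the two branches loosely mimic a choice between alternative routes through a graph; a real assembly graph would additionally need to handle read orientation, overlap lengths, and haplotype information, none of which is modeled here.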
Type
Publication
In RECOMB International Workshop on Comparative Genomics