Matrix Seminar: Yun William Yu on Augmenting k-mer sketching for (meta)genomic sequence comparisons
November 28 @ 1:30 pm - 3:00 pm
Free
Title: Augmenting k-mer sketching for (meta)genomic sequence comparisons
Abstract: Over the last decade, k-mer sketching (e.g. minimizers or MinHash) to create succinct summaries of long sequences has proven effective at improving the speed of sequence comparisons. However, rigorously characterizing the accuracy of these techniques has been more difficult. In this talk, I’ll touch on three results that showcase some of the modern theoretical developments and practical applications of theory to building faster sequence comparison tools for metagenomics.
We begin by rigorously providing average-case guarantees for the popular seed-chain-extend heuristic for pairwise sequence alignment under a random substitution model, showing that it is accurate and runs in close to O(n log n) time for similar sequences. Then, we will turn our focus to metagenomics: our new tool skani computes average nucleotide identity (ANI) using sparse approximate alignments, and is both more accurate and over 20 times faster than the current state-of-the-art FastANI for comparing incomplete, fragmented MAGs (metagenome-assembled genomes). This was enabled by Belbasi et al.’s work showing that minimizers are biased Jaccard estimators, whereas other k-mer sketching methods do not have that drawback. Finally, we will introduce sylph, which enables fast and accurate database search to find nearest neighbor genomes (in ANI space) of low-coverage sequenced samples by using a combination of k-mer sketching with a zero-inflated Poisson correction (45x faster than MetaPhlAn for screening databases).
All of the work in this talk is joint with my brilliant former student Jim Shaw.
=================
Bio: William is a computational biologist and applied mathematician interested in compression, genomics, privacy, and sketching. He is currently an assistant professor in the Ray and Stephanie Lane Computational Biology Department at Carnegie Mellon University. Prior to moving to CMU, he was an assistant professor in the math department at the University of Toronto. He was a graduate student with Bonnie Berger in Mathematics at MIT, and a postdoctoral fellow with Griffin Weber in Biomedical Informatics at Harvard Medical School. William is originally from Huntingburg, Indiana, USA.
=================