Statistics Seminar

Date/Time: Monday, 13 Oct 2014, 4:10 pm to 5:00 pm
Location: Snedecor 3105
Cost: Free
URL: www.stat.iastate.edu
Contact: Jeanette La Grange
Phone: 515-294-3440
Channel: College of Liberal Arts and Sciences
Categories: Lectures
"Probabilistic Error-correction using Markov Inference in Error Reads", Karin Dorman, Department of Statistics, Iowa State University, Ames

Next-generation sequencing (NGS) is revolutionizing genetics and biology. Compared with the older Sanger sequencing method, its throughput is astounding and has fostered a slew of innovative sequencing applications. Unfortunately, its error rates are also higher, which complicates many downstream analyses. For example, de novo genome assembly is less accurate and slower when reads contain many errors. We develop a probabilistic model for NGS reads that detects and corrects errors without a reference genome, while flexibly modeling and estimating the error properties of the sequencing machine.

The model uses a penalized likelihood to enforce our prior belief that the k-mer spectrum (the collection of length-k strings observed in the reads) generated from a genome is sparse when k is sufficiently large. The model formalizes core ideas used in many ad hoc algorithmic approaches to error correction. We show that our method detects and removes more errors from sequencing reads than existing methods. Although our method carries a higher computational burden than the best algorithmic approaches, the probabilistic approach is extensible, flexible, and well positioned to support downstream statistical analysis of the growing volume of sequence data.
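To make the sparsity intuition concrete, here is a minimal Python sketch. It computes a k-mer spectrum and shows that a single substitution error in the interior of a read adds up to k k-mers that never occur in the genome. The toy genome, the reads, and the helper names `kmer_spectrum` and `substitute` are hypothetical and chosen only for illustration. This is not the penalized-likelihood model described in the talk.

```python
from collections import Counter


def kmer_spectrum(reads, k):
    """Count every length-k substring observed across the reads.

    The keys of the returned Counter are the distinct k-mers (the spectrum);
    the values are how many times each k-mer was observed.
    """
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def substitute(read, pos, base):
    """Return a copy of `read` with a single-base substitution at `pos`."""
    return read[:pos] + base + read[pos + 1:]


# Hypothetical toy genome and three overlapping error-free reads tiling it.
genome = "ACGTTGCATGCAGTCA"
k = 5
clean_reads = [genome[0:10], genome[3:13], genome[6:16]]

# Same reads, except the middle one carries one substitution error (T -> A).
noisy_reads = [clean_reads[0], substitute(clean_reads[1], 5, "A"), clean_reads[2]]

clean = kmer_spectrum(clean_reads, k)
noisy = kmer_spectrum(noisy_reads, k)

print(len(clean))                    # distinct k-mers without errors
print(len(noisy))                    # distinct k-mers with one error
print(sorted(set(noisy) - set(clean)))  # k-mers created only by the error
```

In this toy case the error-free reads yield 12 distinct 5-mers. The single error adds 5 novel ones (TGCAA, GCAAG, CAAGC, AAGCA, AGCAG), one for each window that overlaps the erroneous base. Errors therefore inflate the spectrum with k-mers that are absent from the genome. A sparsity-favoring penalty pushes against exactly this kind of inflation.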