Nicholas Lytal Abstracts

Nicholas Lytal

Ph.D. Candidate

Statistics GIDP

 

2018 Joint Statistical Meetings

Vancouver, BC, Canada

July 28-August 2, 2018

 

Through gene sequencing experiments, researchers can analyze the genetic content of tumors or developing embryos and better understand the importance of particular genes during stages of development. Single-cell RNA-sequencing (scRNA-seq) provides a means to assess transcriptomic variation among individual cells, rather than averaged over the sample as a whole, giving it an advantage over bulk sequencing methods, which fail to detect subgroups and rare cell types.

However, limitations such as amplification bias, technical noise, and dropout events often reduce the power of scRNA-seq results. To address these issues, various normalization methods have been developed that correct observed gene counts for existing noise so that they more accurately represent the true biological signal of interest. Eliminating technical noise and amplification error often involves a set of exogenous transcripts added to each cell’s sample in known quantities, referred to as “spike-in genes”. By statistically modeling the difference between the observed and known counts of these spike-ins, the resulting model can then be applied to all other genes present in the cell, adjusting their observed counts accordingly. We propose a novel scRNA-seq normalization method that normalizes both within and between a data set’s groups while also using dropout imputation to adjust for missing values. We compare this method with existing spike-in approaches, using real data sets to support our results.
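As a rough illustration of the general spike-in idea described above (not the method proposed in this talk), the sketch below fits a straight line on the log scale between known and observed spike-in counts for one cell, then inverts that line to adjust the counts of the cell's other genes. The function names, the log-linear model, and the pseudocount are all assumptions made for the example.

```python
import numpy as np

def fit_spike_in_model(known, observed, pseudocount=1.0):
    """Fit log(observed + c) = intercept + slope * log(known + c).

    `known` and `observed` are spike-in quantities for a single cell.
    The log-linear form is an illustrative assumption, not a claim
    about any particular published normalization method.
    """
    x = np.log(np.asarray(known, dtype=float) + pseudocount)
    y = np.log(np.asarray(observed, dtype=float) + pseudocount)
    # np.polyfit returns coefficients from highest degree to lowest.
    slope, intercept = np.polyfit(x, y, deg=1)
    return intercept, slope

def normalize_counts(observed, intercept, slope, pseudocount=1.0):
    """Invert the fitted line to estimate pre-noise counts for other genes."""
    y = np.log(np.asarray(observed, dtype=float) + pseudocount)
    estimate = np.exp((y - intercept) / slope) - pseudocount
    # Estimated counts cannot be negative.
    return np.clip(estimate, 0.0, None)

if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the call pattern.
    spike_known = [10, 100, 1000, 10000]
    spike_observed = [4, 55, 480, 5200]
    a, b = fit_spike_in_model(spike_known, spike_observed)
    print(normalize_counts([0, 12, 300], a, b))
```

A per-cell fit like this ignores dropout entirely; observed zeros are only shifted by the fitted line, which is one reason imputation is treated as a separate step.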

 

Abstract for Lay Audience

Bulk RNA-sequencing processes many cells at one time to determine which genes are present, and in what amounts, in a group of similar cells. This information has applications in biology and medicine and can be used to identify the function and type of cells, including new or rare ones that could advance knowledge of human health or disease.

Single-cell RNA-sequencing is relatively new and instead analyzes one cell at a time to obtain similar information. The difference is that it is more effective than bulk sequencing at finding rare cell types that would be harder to distinguish when sequencing many cells at one time. It has the potential to find subtle differences between seemingly identical cells, distinguish embryonic cells in different stages of development, and even identify cells in cancer patients that pose a risk of forming tumors.

However, working with such a small amount of RNA makes it more difficult to determine gene counts, because inaccuracies in the technology used to sort out the types and amounts of genes become more pronounced. The process of “normalization” adjusts for these inaccuracies, allowing for more accurate measurements and more solid conclusions about the nature of these cells. Several different approaches to this process exist or are in development. In my oral presentation, I discuss existing methods for adjusting for these inaccuracies and present my own alternative, which I have been developing.