Digital Biology: "Deep learning for single-cell sequencing: a microscope to see the diversity of cells"
From The Gradient, January 24:
The history of each living being is written in its genome, which is
stored as DNA and present in nearly every cell of the body. No two cells
are the same, even if they share the same DNA and cell type, as they
still differ in the regulators that control how DNA is expressed by the
cell. The human genome consists of 3 billion base pairs spread over 23
chromosomes. Within this vast genetic code, there are approximately
20,000 to 25,000 genes, constituting the protein-coding DNA and
accounting for about 1% of the total genome [1]. To explore the
functioning of complex systems in our bodies, especially this small
coding portion of DNA, a precise sequencing method is necessary, and
single-cell sequencing (sc-seq) technology fits this purpose.
In 2013, Nature Methods selected single-cell sequencing as its Method of
the Year [2] (Figure 3), highlighting the importance of this method for
exploring cellular heterogeneity through the sequencing of DNA and RNA
at the individual cell level. Since then, numerous tools have emerged
for the analysis of single-cell RNA sequencing data. For example, the
scRNA-tools database has been cataloging software for the analysis of
single-cell RNA data since 2016, and by 2021 it included over
1000 tools [3]. Many of these tools rely on Deep Learning techniques,
which are the focus of this article: we will explore the role that Deep
Learning has played as a key enabler in advancing single-cell
sequencing technologies.
Background
Flow of genetic information from DNA to protein in cells
Let’s first go over what exactly cells and sequences are. The
cell is the fundamental unit of our bodies and the key to understanding
how our bodies function in good health and how molecular dysfunction
leads to disease. Our bodies are made of trillions of cells, and nearly
every cell contains three genetic information layers: DNA, RNA, and
protein. DNA is a long molecule containing the genetic code that makes
each person unique. Like source code, it contains the instructions
for making each protein in our bodies. These proteins are the
workhorses of the cell that carry out nearly every task necessary for
cellular life. For example, the enzymes that catalyze chemical reactions
within the cell, including the DNA polymerases that contribute to DNA
replication during cell division, are all proteins. The cell synthesizes
proteins in two steps, transcription and translation (Figure 1), which
together are known as gene expression. DNA is first transcribed into RNA,
then RNA is
translated into protein. We can consider RNA as a messenger between DNA
and protein.
Figure 1. The central dogma of biology
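To make the two steps concrete, here is a minimal Python sketch of transcription and translation on a made-up DNA sequence. It is a toy illustration, not how any sequencing tool works: the function names `transcribe` and `translate` are our own, the input is assumed to be the coding strand (so transcription reduces to replacing T with U), and the codon table contains only the few entries needed for this example out of the 64 in the standard genetic code.

```python
# Toy illustration of the central dogma: DNA -> RNA -> protein.
# Only a handful of codons are listed; the standard genetic code has 64.
CODON_TABLE = {
    "AUG": "M",  # methionine, also the start codon
    "GCU": "A",  # alanine
    "GCC": "A",  # alanine
    "UGG": "W",  # tryptophan
    "AAA": "K",  # lysine
    "UAA": None,  # stop codon
    "UAG": None,  # stop codon
    "UGA": None,  # stop codon
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read the mRNA three bases at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        amino_acid = CODON_TABLE[codon]  # KeyError for codons outside the toy table
        if amino_acid is None:  # stop codon: translation ends here
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTGGAAATAA")  # hypothetical example sequence
protein = translate(mrna)
print(mrna)     # AUGGCCUGGAAAUAA
print(protein)  # MAWK
```

Real gene expression involves far more machinery (template strands, splicing, regulation), but the sketch captures the direction of information flow that the rest of the article builds on.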
While the cells of our body share the same DNA, they vary in their biological
activity. For instance, the distinctions between immune cells and heart
cells are determined by the genes that are either activated or
deactivated in these cells. Generally, when a gene is activated, it
leads to the creation of more RNA copies, resulting in increased protein
production. Because cell types differ in the quantity and type of RNA
and protein molecules they synthesize, measuring the abundance of these
molecules at the single-cell level is informative. It lets us
investigate how DNA is expressed within each cell and obtain a
high-resolution view of the various parts of our bodies.
In general, all single-cell sequencing technologies can be divided into three main steps:
1. Isolation of single cells from the tissue of interest and extraction of genetic material from each isolated cell
2. Amplification of genetic material from each isolated cell and library preparation
3. Sequencing of the library using a next-generation sequencer and data analysis
Having covered the basics of cellular biology and single-cell
sequencing technologies, we arrive at a pivotal question: how is
single-cell sequencing data represented numerically?