Sunday, February 18, 2024

Digital Biology: "Deep learning for single-cell sequencing: a microscope to see the diversity of cells"

From The Gradient, January 24:

The history of each living being is written in its genome, which is stored as DNA and present in nearly every cell of the body. No two cells are the same, even if they share the same DNA and cell type, as they still differ in the regulators that control how DNA is expressed by the cell. The human genome consists of 3 billion base pairs spread over 23 chromosomes. Within this vast genetic code, there are approximately 20,000 to 25,000 genes, constituting the protein-coding DNA and accounting for about 1% of the total genome [1]. To explore the functioning of complex systems in our bodies, especially this small coding portion of DNA, a precise sequencing method is necessary, and single-cell sequencing (sc-seq) technology fits this purpose.

In 2013, Nature selected single-cell RNA sequencing as the Method of the Year [2] (Figure 3), highlighting the importance of this method for exploring cellular heterogeneity through the sequencing of DNA and RNA at the individual cell level. Subsequently, numerous tools have emerged for the analysis of single-cell RNA sequencing data. For example, the scRNA-tools database has been compiling software for the analysis of single-cell RNA data since 2016, and by 2021, the database includes over 1000 tools [3]. Among these tools, many involve methods that leverage Deep Learning techniques, which will be the focus of this article – we will explore the pivotal role that Deep Learning, in particular, has played as a key enabler for advancing single-cell sequencing technologies.

Background
Flow of genetic information from DNA to protein in cells

Let’s first go over what exactly cells and sequences are. The cell is the fundamental unit of our bodies and the key to understanding how our bodies function in good health and how molecular dysfunction leads to disease. Our bodies are made of trillions of cells, and nearly every cell contains three genetic information layers: DNA, RNA, and protein. DNA is a long molecule containing the genetic code that makes each person unique. Like a source code, it includes several instructions showing how to make each protein in our bodies. These proteins are the workhorses of the cell that carry out nearly every task necessary for cellular life. For example, the enzymes that catalyze chemical reactions within the cell and DNA polymerases that contribute to DNA replication during cell division, are all proteins. The cell synthesizes proteins in two steps: Transcription and Translation (Figure 1), which are known as gene expression. DNA is first transcribed into RNA, then RNA is translated into protein. We can consider RNA as a messenger between DNA and protein.

Figure 1. The central dogma of biology

While the cells of our body share the same DNA, they vary in their biological activity. For instance, the distinctions between immune cells and heart cells are determined by the genes that are either activated or deactivated in these cells. Generally, when a gene is activated, it leads to the creation of more RNA copies, resulting in increased protein production. Therefore, as cell types differ based on the quantity and type of RNA/protein molecules synthesized, it becomes intriguing to assess the abundance of these molecules at the single-cell level. This will enable us to investigate the behavior of our DNA  within each cell and attain a high-resolution perspective of the various parts of our bodies.

In general, all single-cell sequencing technologies can be divided into three main steps:

  1. Isolation of single cells from the tissue of interest and extraction of genetic material from each isolated cell
  2. Amplification of genetic material from each isolated cell and library preparation
  3. Sequencing of the library using a next-generation sequencer and data analysis

Navigating through the intricate steps of cellular biology and single-cell sequencing technologies, a pivotal question emerges: How is single-cell sequencing data represented numerically?

Structure of single-cell sequencing data....

....MUCH MORE

Her school runs the fastest supercomputer in Africa:

Morocco's UM6P university launches Africa’s most powerful supercomputer

At this point I will insert the old traders' dictum: "Pay attention or pay the offer!"

If interested see also: