$ cat ~/about.md
About
I'm a Computational Scientist at Altos Labs, where I architect fully automated multi-omics platforms and develop causal-inference frameworks for aging research. My work spans single-cell genomics, spatial transcriptomics, and large-scale data integration — building the computational tools that turn raw sequencing data into biological insight.
I created the alevin-fry ecosystem for single-cell RNA-seq preprocessing (published in Nature Methods) and am a core contributor to the nf-core community, leading development of pipelines for spatial transcriptomics, proteomics, and single-cell analysis used by researchers worldwide.
$ git log --oneline --graph --all
msproteomics — nf-core proteomics pipeline
Creator of the official nf-core end-to-end pipeline for quantitative mass spectrometry-based proteomics. Automated protein identification, quantification, differential abundance analysis, and pathway enrichment.
spatialxe — nf-core spatial transcriptomics pipeline
Lead developer of the official nf-core pipeline for spatial transcriptomics analysis, supporting multiple technologies and data formats. Standardized workflows for cell segmentation, transcript assignment, and spatial feature extraction.
Forseti: A mechanistic and predictive model of the splicing status of scRNA-seq reads
Dongze He*, Yuan Gao, Spencer Skylar Chan, Natalia Quintana-Parrilla, and Rob Patro*
ISMB 2024
Designed the first statistical model to predict the splicing status of scRNA-seq reads, achieving an AUROC of 0.9.
Joined Altos Labs as Computational Scientist
Architecting fully automated multi-omics platforms (Flyte + Nextflow + AWS) and deploying ReAct-based AI dashboards to accelerate scientific hypothesis generation.
Ph.D. in Biological Sciences — University of Maryland
Methods for efficient processing and comprehensive analysis of single-cell sequencing data. Advisor: Dr. Rob Patro. Concentration: Computational Biology and Bioinformatics.
DifferentialRegulation: a Bayesian hierarchical approach to identify differentially regulated genes
Simone Tiberi, Joël Meili, Peiying Cai, Charlotte Soneson, Dongze He, Hirak Sarkar, Alejandra Avalos-Pacheco, Rob Patro, and Mark D Robinson*
Biostatistics 25(4), 1079-1093
A novel statistical framework for identifying genes with differential regulation across conditions in single-cell RNA-seq data.
scrnaseq — nf-core single-cell RNA-seq pipeline
Core contributor to the official nf-core pipeline for single-cell RNA-seq preprocessing. Integrated the alevin-fry/simpleaf workflow as a primary analysis path alongside STARsolo, Kallisto/BUStools, and Cell Ranger.
Best practices for single-cell analysis across modalities
L Heumos et al. (44 authors including Dongze He)
Nature Reviews Genetics 24, 550–572
A comprehensive review establishing community standards for single-cell data analysis across RNA, ATAC, protein, and spatial modalities.
simpleaf: A simple, flexible, and scalable framework for single-cell transcriptomics data processing
Dongze He* and Rob Patro*
Bioinformatics 39
A simplified interface to the alevin-fry ecosystem that processes complex single-cell data types, including CITE-seq and 10x feature barcoding, with a single command.
Deep Learning Motif Discovery for scATAC-seq — Genentech
Improved a deep-learning-based Motif Discovery Analysis framework for scATAC-seq, enabling the first cell-type-specific MDA at Genentech. Discovered regulatory targets through integrated MDA and gene-regulatory-network analysis.
Alevin-fry unlocks rapid, accurate and memory-frugal quantification of single-cell RNA-seq data
Dongze He, Mohsen Zakeri, Hirak Sarkar, Charlotte Soneson, Avi Srivastava, and Rob Patro*
Nature Methods 19, 316–322
A suite of tools achieving 2x faster performance and dramatically lower memory usage for single-cell RNA-seq preprocessing while maintaining state-of-the-art accuracy.
M.S. in Systems Biology and Bioinformatics — Case Western Reserve University
Discovery of Causal Regulatory Network of System Level Measurements by Integrative Network Analysis.
B.S. in Biotechnology — Huaqiao University
Foundation in molecular biology, genetics, and biochemistry. Studied in Xiamen, China.
$ ls ~/blog/
Blog Posts
Jan 24, 2023
Generating a scRNA-seq count matrix with simpleaf
A step-by-step tutorial on generating gene count matrices from raw FASTQ files using simpleaf and the alevin-fry pipeline, covering index building, quantification, and loading results in R and Python.
Coming soon
Building an end-to-end nf-core pipeline for proteomics
Design decisions and lessons learned from creating msproteomics, an nf-core pipeline for quantitative mass spectrometry-based proteomics data analysis.
Coming soon
Multi-omics integration with graph convolutional networks
Exploring causal-inference frameworks that leverage graph neural networks and linear programming to integrate multi-omics modalities for biomarker discovery.