$ cat ~/about.md

About

I'm a Computational Scientist at Altos Labs, where I architect fully automated multi-omics platforms and develop causal-inference frameworks for aging research. My work spans single-cell genomics, spatial transcriptomics, and large-scale data integration — building the computational tools that turn raw sequencing data into biological insight.

I created the alevin-fry ecosystem for single-cell RNA-seq preprocessing (published in Nature Methods) and am a core contributor to the nf-core community, leading development of pipelines for spatial transcriptomics, proteomics, and single-cell analysis used by researchers worldwide.

Single-cell Genomics · Spatial Transcriptomics · Multi-omics Integration · Deep Learning · Bioinformatics Pipelines

$ git log --oneline --graph --all

f1e4b7d · Mar 2026 · research

msproteomics — nf-core proteomics pipeline

Creator of the official nf-core end-to-end pipeline for quantitative mass spectrometry-based proteomics. Automated protein identification, quantification, differential abundance analysis, and pathway enrichment.

Nextflow · R · nf-core
c2d5a9e · Mar 2026 · research

spatialxe — nf-core spatial transcriptomics pipeline

Lead developer of the official nf-core pipeline for spatial transcriptomics analysis, supporting multiple technologies and data formats. Standardized workflows for cell segmentation, transcript assignment, and spatial feature extraction.

Nextflow · Python · nf-core
a3f7c2d · Jul 2024 · publication

Forseti: A mechanistic and predictive model of the splicing status of scRNA-seq reads

Dongze He*, Yuan Gao, Spencer Skylar Chan, Natalia Quintana-Parrilla, and Rob Patro*

ISMB 2024

Designed the first statistical model to predict the splicing status of scRNA-seq reads, achieving an AUROC of 0.9.

Rust · Python · Statistical Modeling
e7a1b3c · Jun 2024 · milestone

Joined Altos Labs as Computational Scientist

Architecting fully automated multi-omics platforms (Flyte + Nextflow + AWS) and deploying ReAct-based AI dashboards to accelerate scientific hypothesis generation.

Nextflow · Flyte · AWS · RShiny · LangChain
d4f2e8a · May 2024 · education

Ph.D. in Biological Sciences — University of Maryland

Methods for efficient processing and comprehensive analysis of single-cell sequencing data. Advisor: Dr. Rob Patro. Concentration: Computational Biology and Bioinformatics.

Rust · Python · R · C/C++
b8c3d1f · Jan 2024 · publication

DifferentialRegulation: a Bayesian hierarchical approach to identify differentially regulated genes

Simone Tiberi, Joël Meili, Peiying Cai, Charlotte Soneson, Dongze He, Hirak Sarkar, Alejandra Avalos-Pacheco, Rob Patro, and Mark D Robinson*

Biostatistics 25(4), 1079-1093

A novel statistical framework for identifying genes with differential regulation across conditions in single-cell RNA-seq data.

R · Bayesian Statistics
g3a8f1b · Mar 2025 · research

scrnaseq — nf-core single-cell RNA-seq pipeline

Core contributor to the official nf-core pipeline for single-cell RNA-seq preprocessing. Integrated the alevin-fry/simpleaf workflow as a primary analysis path alongside STARSolo, Kallisto/BUStools, and Cell Ranger.

Nextflow · Rust · nf-core
e9a2c4b · Jun 2023 · publication

Best practices for single-cell analysis across modalities

L Heumos et al. (44 authors including Dongze He)

Nature Reviews Genetics 24, 550–572

A comprehensive review establishing community standards for single-cell data analysis across RNA, ATAC, protein, and spatial modalities.

Python · R · Jupyter
d7b1f3e · Jan 2023 · publication

simpleaf: A simple, flexible, and scalable framework for single-cell transcriptomics data processing

Dongze He* and Rob Patro*

Bioinformatics 39

A simplified interface for the alevin-fry ecosystem that can process complex single-cell data types with one command, including CITE-seq and 10X feature barcoding.

Rust · CLI
a1c8e2f · Summer 2023 · research

Deep Learning Motif Discovery for scATAC-seq — Genentech

Improved a deep-learning-based Motif Discovery Analysis framework for scATAC-seq, enabling the first cell-type-specific MDA at Genentech. Discovered regulatory targets through integrated MDA and gene-regulatory-network analysis.

PyTorch · Nextflow · Python
f4d6a2c · Mar 2022 · publication

Alevin-fry unlocks rapid, accurate and memory-frugal quantification of single-cell RNA-seq data

Dongze He, Mohsen Zakeri, Hirak Sarkar, Charlotte Soneson, Avi Srivastava, and Rob Patro*

Nature Methods 19, 316–322

A suite of tools achieving 2x faster performance and dramatically lower memory usage for single-cell RNA-seq preprocessing while maintaining state-of-the-art accuracy.

Rust · Python
b3e7d1a · May 2019 · education

M.S. in Systems Biology and Bioinformatics — Case Western Reserve

Discovery of Causal Regulatory Network of System Level Measurements by Integrative Network Analysis.

R · Python · Network Analysis
c9f2a4d · May 2017 · education

B.S. in Biotechnology — Huaqiao University

Foundation in molecular biology, genetics, and biochemistry in Xiamen, China.