Huber Group

Quantitative Biology and Statistics

The Huber group develops statistical methods for modern biotechnologies, applies them to biological discovery, and translates them into reusable tools.

Previous and current research

The Huber group studies biological systems by developing statistical and bioinformatic methods for the analysis of new data types and large systematic datasets: single-cell profiling, multi-omics, high-throughput drug- or CRISPR-based perturbation assays, and quantitative imaging. Our projects range from applied data analysis for biological discovery to theoretical method development. Our biological systems of interest range from fundamental models of tissue biology to blood cancers. We maintain an extensive network of collaborations. These include the Molecular Medicine Partnership Unit (MMPU) ‘Systems Medicine of Cancer Drugs’, the ERC Synergy project DECODE, the ELLIS unit Heidelberg, and our contributions to the Bioconductor project.

We develop the computational methods needed to master big datasets, and we address scientific questions in fundamental biology and precision medicine. We employ statistics and machine learning to discover patterns in data, understand mechanisms, and build and investigate models. The interdisciplinary team comprises researchers from quantitative disciplines – mathematics, statistics, physics and computer science – and from different fields of biology and medicine. Our work pursues three principal aims:

  1. To develop and improve new data-generating technologies in biological research by powering them with the best statistical methods. This includes inference – reasoning with uncertainty, making optimal decisions based on incomplete, noisy or overwhelming data – as well as data exploration, visualization and discovery – helping scientists examine large, complex datasets that they could not grasp otherwise.
  2. To make biological discoveries on drug–gene–environment interaction networks and their dynamic, context-dependent phenotypic outcomes, in basic biological model systems as well as in precision oncology applications. This includes the use of high-throughput perturbation data, single-cell omics, multimodal omics and imaging.
  3. To make statistical methods more widely usable, not only by experts but by natural scientists across the full range of disciplines. This aim is embodied by our engagement in open source, open science and the Bioconductor project.

Functional precision medicine

Genomics and other molecular profiling technologies have produced increasingly detailed biology-based understanding of human health and disease. The next challenge is using this knowledge to engineer treatments and cures. To this end, we integrate observational data, such as from large-scale sequencing and molecular profiling, with interventional data, such as from systematic genetic or chemical screens, to reconstruct a fuller picture of the underlying causal relationships and actionable intervention points. A fascinating example is our collaboration on molecular mechanisms of individual sensitivity and resistance of tumours to treatments in our precision oncology project together with Thorsten Zenz at University Hospital Zurich and Sascha Dietrich at University Hospital Heidelberg.

Open science

As we engage with new data types, our aim is to develop high-quality computational methods of wide applicability. We consider the release and maintenance of scientific software an integral part of doing science, and we contribute to the Bioconductor Project, an open source software collaboration to provide tools for the analysis and understanding of genome-scale data. An example is our DESeq2 package for analysing count data from high-throughput sequencing.
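For readers curious about what count-data methods of this kind involve, here is a minimal Python sketch of one ingredient: median-of-ratios size-factor normalization as described for DESeq2. Each sample's size factor is the median, over genes, of the ratio of its count to that gene's geometric mean across samples. This is an illustrative re-implementation with NumPy, not code from DESeq2 (an R/Bioconductor package), and the function name `size_factors` is hypothetical.

```python
import numpy as np

def size_factors(counts):
    """Median-of-ratios size factors for a genes x samples count matrix (illustrative sketch)."""
    counts = np.asarray(counts, dtype=float)
    # Only genes with nonzero counts in every sample have a finite log geometric mean.
    expressed = np.all(counts > 0, axis=1)
    if not expressed.any():
        raise ValueError("no gene has nonzero counts in all samples")
    log_counts = np.log(counts[expressed])
    # Log geometric mean of each gene across samples acts as a pseudo-reference sample.
    log_geo_means = log_counts.mean(axis=1)
    # Each sample's size factor is the median ratio of its counts to the pseudo-reference.
    return np.exp(np.median(log_counts - log_geo_means[:, None], axis=0))
```

For example, `size_factors([[10, 20], [100, 200], [5, 10]])` gives approximately `[0.707, 1.414]`, reflecting that the second sample was sequenced twice as deeply. Using the median makes the estimate robust to a minority of genes that are truly differentially expressed.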

Two-dimensional t-SNE representation of single-cell protein expression profiles (33 CyTOF markers) of 64,000 cells computationally pooled from 16 tumours, each under four different ex-vivo treatment conditions. In Panel a, the colouring indicates inferred cell type. In Panel b, the same layout is used to show the data from the CpG ODN treatment condition, coloured by intensity of the proliferation marker Ki-67, and separately for the tumours from two newly discovered tumour subtypes, CLL-PD low and high. Figure from Lu, Cannizzaro et al., Multi-omics reveals clinically relevant proliferative drive associated with mTOR-MYC-OXPHOS activity in chronic lymphocytic leukaemia. Nature Cancer (2021).

Future projects and goals

We aim to enable exploitation of new data types and new types of experiments and studies by developing the computational techniques needed to turn raw data into biology.

  • Single-cell ‘omics in space and time: finding low-dimensional explanations (factors, gradients, clusters, trees and networks) of high-dimensional data, using combinations of supervised and unsupervised learning.
  • Quantitative proteomics in cancer research.
  • Converting images of cells and tissues into quantitative data and models.
  • Multidimensional phenotyping of genetic and drug-based perturbation assays to map context-dependent gene-gene and gene-drug interaction networks.
  • Many powerful mathematical and computational ideas exist but are difficult to access. We aim to translate them into practical methods and software that make a real difference to biomedical researchers. We sometimes term this approach ‘Translational Statistics’.



Comparison of the effects of different data transformations on single-cell RNA-Seq count data.
Figure from Ahlmann-Eltze and Huber, Transformation and Preprocessing of Single-Cell RNA-Seq Data. bioRxiv.
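As a minimal sketch of two kinds of transformation of the sort compared in that figure, the Python code below implements a shifted logarithm, log(y/s + y0), and Pearson residuals under a gamma-Poisson model with variance μ + αμ². The size-factor definition, the default pseudo-count y0 = 1, the overdispersion α = 0.05 and the rank-one mean estimate are illustrative assumptions, not the settings used for the figure, and the function names are hypothetical.

```python
import numpy as np

def shifted_log(counts, pseudo_count=1.0):
    """Shifted-log transform log(y / s + y0) of a genes x cells count matrix."""
    counts = np.asarray(counts, dtype=float)
    # Assumed size factor: each cell's total count relative to the mean total.
    # (Cells with zero total count are assumed to have been filtered out.)
    totals = counts.sum(axis=0)
    s = totals / totals.mean()
    # Dividing by s broadcasts across columns, i.e. one factor per cell.
    return np.log(counts / s + pseudo_count)

def pearson_residuals(counts, alpha=0.05):
    """Pearson residuals assuming a gamma-Poisson model with variance mu + alpha * mu**2."""
    counts = np.asarray(counts, dtype=float)
    # Assumed rank-one mean estimate: gene total x cell total / grand total.
    # (All-zero genes and cells are assumed to have been filtered out.)
    mu = np.outer(counts.sum(axis=1), counts.sum(axis=0)) / counts.sum()
    return (counts - mu) / np.sqrt(mu + alpha * mu**2)
```

When all cells are identical, for example `[[2, 2], [5, 5]]`, the size factors are all one, so the shifted logarithm reduces to `log(y + 1)`, and the Pearson residuals are all zero because each count equals its estimated mean.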
Ternary plots of relative sensitivities to targeted kinase inhibitors for a cohort of primary tumour samples of chronic lymphocytic leukaemia (CLL).
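A ternary plot places each sample inside an equilateral triangle according to three non-negative quantities rescaled to sum to one. The Python sketch below shows that mapping for three per-sample sensitivity scores, one per inhibitor. The caption does not say how sensitivities were quantified, so the input scores, the triangle orientation and the function name `ternary_xy` are hypothetical.

```python
import math

# Hypothetical layout: corners of a unit equilateral triangle, one per drug.
CORNERS = ((0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2))

def ternary_xy(a, b, c):
    """Map three non-negative sensitivity scores to 2D ternary-plot coordinates."""
    total = a + b + c
    if min(a, b, c) < 0 or total <= 0:
        raise ValueError("scores must be non-negative and not all zero")
    # Relative sensitivities: the three shares sum to one.
    weights = (a / total, b / total, c / total)
    # The point is the weighted average of the corners (barycentric coordinates).
    x = sum(w * cx for w, (cx, _) in zip(weights, CORNERS))
    y = sum(w * cy for w, (_, cy) in zip(weights, CORNERS))
    return x, y
```

`ternary_xy(1, 1, 1)` lands at the triangle's centroid, (0.5, about 0.289), while a sample sensitive only to the first drug sits at that drug's corner. Because only the shares matter, scaling all three scores by the same factor leaves the point unchanged.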