Types of modules



Lipidomics Modules

MS_based_lipidomics-Peak_area_analysis

Lipidomics Bulk Statistical analysis

This module shows how to use lipidr, an R/Bioconductor package designed for the analysis of quantitative MS-based lipidomics datasets. First, the sample information and the lipid quantification matrix are processed to build a LipidomicsExperiment object that integrates lipid annotations. After quality control checks, the data are normalized using probabilistic quotient normalization (PQN) to balance peak areas across samples. Sample correlations and PCA are used to explore relationships between samples. Finally, OPLS-DA and a limma-based differential analysis are performed to identify differentially expressed lipids between conditions, which are represented in heatmaps and volcano plots.
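
To make the normalization step concrete, here is a minimal Python sketch of the general idea behind probabilistic quotient normalization. It is not the lipidr implementation: the function name `pqn_normalize` and the toy peak areas are hypothetical, and the reference profile is assumed to be the per-lipid median across samples.

```python
from statistics import median


def pqn_normalize(matrix):
    """Simplified PQN on a samples x features matrix of positive peak areas.

    Hypothetical helper for illustration only; real implementations may
    add steps such as a prior total-area normalization.
    """
    n_features = len(matrix[0])
    # Reference profile: per-feature median across all samples.
    reference = [median(row[j] for row in matrix) for j in range(n_features)]
    normalized = []
    for row in matrix:
        # Quotient of each feature against the reference (zero references skipped).
        quotients = [x / r for x, r in zip(row, reference) if r > 0]
        # The median quotient estimates the sample-wide dilution factor.
        factor = median(quotients)
        normalized.append([x / factor for x in row])
    return normalized


# Toy data: three samples that differ only by a global dilution factor.
peak_areas = [
    [100.0, 200.0, 300.0],
    [200.0, 400.0, 600.0],
    [50.0, 100.0, 150.0],
]
normalized = pqn_normalize(peak_areas)
```

In this toy case the three samples become identical after normalization, which is the behaviour expected when samples differ only by overall dilution.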

Proteomics Modules

MS_based_proteomics-LFQ_data_analysis

Proteomics Bulk Statistical analysis

Differential analysis of MS data to identify biomarkers or to understand biology is a cornerstone of proteomics. DEqMS is a robust method for the analysis of both labelled and label-free MS data. The method takes into account the inherent dependence of protein variance on the number of PSMs or peptides used for quantification, thereby providing more accurate variance estimation and better overall statistical power for quantitative proteomics analysis. This module shows how to perform DEqMS on LFQ intensities extracted from MaxQuant outputs, and how to run a quick Gene Set Enrichment Analysis to add some biological context to the list of differentially expressed proteins.

Transcriptomics Modules

Bulk_RNAseq-nfcore_pipeline

Transcriptomics Bulk Primary analysis

Primary analysis of RNA-seq data requires significant computing capacity due to the volume and complexity of the data. The process begins with pre-processing, where reads are quality-checked, trimmed to remove adapter sequences, and shortened at error-prone ends to facilitate alignment. Alignment involves mapping each read to the reference genome, which is computationally intensive due to the need to account for potential misreads, polymorphisms, and repeated regions. Different tools have been developed to handle these challenges, such as STAR for full alignment, and Kallisto or Salmon for faster but less precise pseudo-alignment. After alignment, transcripts are quantified by counting reads overlapping exonic regions, with results presented as count matrices. The nf-core/rnaseq Nextflow pipeline integrates these tools into a reproducible workflow, enhancing standardization and version control. Nextflow facilitates portability through containerization systems like Docker and Singularity, ensuring that each tool runs in a specific version independently of the operating system. This pipeline is part of the nf-core project, which provides state-of-the-art, community-reviewed workflows that can be easily customized and run with a single command.

Ex_situ_ST-space_ranger

Transcriptomics Spatial Primary analysis

Space Ranger is the 10x Genomics pipeline designed to analyze spatial gene expression data generated by Visium systems. It begins with processing microscope images of tissue sections to ensure proper alignment and stitching. Next, it demultiplexes and aligns raw sequencing data to a reference genome, generating feature-barcode matrices. The software then counts the unique molecular identifiers (UMIs) associated with each gene in each spatial spot. Following this, the software maps the sequencing data to spatial coordinates on the tissue image. It outputs a directory that notably contains the gene-per-spot count matrix, the spatial positions, a web summary of QC metrics, a preliminary default statistical analysis, and a .cloupe file for visualization in Loupe Browser. This module is usually performed by the platform that operates the 10x Genomics Visium system, such as UCAGenomiX.

In_situ_ST-output

Transcriptomics Spatial Primary analysis

Spatial in-situ imaging-based systems provide an output directory for each processed slide. The default system parameters should provide data of sufficient quality, but it is possible to improve the quality of this primary analysis before running the statistical analysis modules. The available options depend mainly on the in-situ imaging-based system used.

scRNAseq-cell_ranger

Transcriptomics Single-cell Primary analysis

The primary analysis of raw sequencing files can be performed with several pipelines, notably the Cell Ranger pipeline developed by 10x Genomics. Cell Ranger uses predefined positions for adapter sequences, barcodes, and cDNA sequences, and employs the STAR alignment software to align reads to the reference genome. Reads that are not aligned or are of poor quality are eliminated, and annotations are used to define exonic, intronic, and intergenic reads. Per-cell transcript quantification is based on counting Unique Molecular Identifiers (UMIs) for reads confidently aligned to the transcriptome and associated with a single cell barcode. This module is usually performed by the platform that operates the 10x Genomics Chromium system, such as UCAGenomiX.
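
The UMI-based quantification described above can be illustrated with a short Python sketch. It only shows the counting principle, assuming reads have already been filtered, aligned, and assigned to a gene. The function name `count_umis` and the toy reads are hypothetical, and steps a production pipeline may perform, such as barcode or UMI error correction, are deliberately left out.

```python
from collections import defaultdict


def count_umis(reads):
    """Count distinct UMIs per (cell barcode, gene).

    reads: iterable of (cell_barcode, umi, gene) tuples for reads that are
    assumed to be confidently mapped. Hypothetical helper for illustration.
    """
    umis = defaultdict(set)
    for barcode, umi, gene in reads:
        # Reads sharing barcode, gene, and UMI are PCR duplicates of one molecule.
        umis[(barcode, gene)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}


# Toy reads: the first two are duplicates of the same molecule.
reads = [
    ("AAAC", "TTGA", "GeneA"),
    ("AAAC", "TTGA", "GeneA"),
    ("AAAC", "CCGT", "GeneA"),
    ("AAAC", "GGAT", "GeneB"),
    ("TTTG", "TTGA", "GeneA"),
]
counts = count_umis(reads)
```

Counting distinct UMIs rather than reads is what keeps amplification duplicates from inflating the expression estimate.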

In_situ_ST-read_and_integrate

Transcriptomics Spatial Secondary Analysis

Based on the output directories created by in-situ imaging-based spatial transcriptomics systems, this module reads the data and integrates them using the Harmony implementation provided by rapids_singlecell, which requires access to a GPU.

scRNAseq-integration_of_multiple_samples

Transcriptomics Single-cell Secondary Analysis

Sequencing multiple samples is essential for understanding tissue organization, experimental variations, and dynamic processes like treatment responses. Technical variability, known as "batch effects", can introduce unwanted biases in the data, potentially skewing biological conclusions. Successful data integration corrects these batch effects while preserving biological variability. This vignette demonstrates the use of the Harmony R package to integrate 18 Chromium samples.

Bulk_RNAseq-Differential_analysis

Transcriptomics Bulk Statistical analysis

Differential expression analysis involves measuring the difference in mean gene abundance between two groups and evaluating its statistical significance. In R, this is accomplished using established workflows like DESeq2 or edgeR, which model counts with a negative binomial distribution parameterized by normalization coefficients and corrected dispersion. These models account for experimental conditions and sample-specific covariates. Based on the modeled counts, a statistical test determines differential gene expression, with p-values adjusted for multiple comparisons and expression changes quantified by log2 fold change.
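
Two of the quantities mentioned above, the log2 fold change and p-values adjusted for multiple comparisons, can be sketched in plain Python. This is only an illustration under stated assumptions: the fold change is computed from raw group means with a pseudocount rather than from a fitted negative binomial model, and the Benjamini-Hochberg procedure is used as one common adjustment method. The function names and input values are hypothetical.

```python
import math


def log2_fold_change(mean_a, mean_b, pseudocount=1.0):
    """Log2 ratio of group B over group A, with a pseudocount to avoid log(0)."""
    return math.log2((mean_b + pseudocount) / (mean_a + pseudocount))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that adjusted
    # values stay monotone with respect to the raw p-value ranking.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


lfc = log2_fold_change(mean_a=3.0, mean_b=7.0)
padj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Dedicated packages estimate fold changes and dispersions jointly across genes, so their values will generally differ from this naive ratio of means.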

Ex_situ_ST-reference_free_celltype_deconvolution

Transcriptomics Spatial Statistical analysis

The contribution of cellular populations to each spot can be estimated using deconvolution methods, which generally rely on a reference of scRNA-seq gene expression profiles. However, such a reference may not exist due to budgetary, technical, or biological limitations. Moreover, both single-cell and spatial gene expression have their own biases, which limit the extrapolation of dissociated-cell data to spatial data. For these reasons, it may be useful to use a reference-free deconvolution tool, which infers modules of covarying genes and their contribution to each spot.

In_situ_ST-cell_annotation

Transcriptomics Spatial Statistical analysis

Cell labeling is a crucial step of the statistical analysis: it associates each cell with a cell type. It can be done manually, using Leiden or Louvain clustering and marker gene definition, following standard single-cell labeling procedures such as those described in the Seurat or Scanpy statistical workflows. Labeling can also be performed by cell scoring, using the highly specific marker genes selected in the gene panel of the spatial experiment together with standard scoring functions based on gene lists. Here we describe an automatic procedure that embeds the spatial cells and the cells of an in-house or external single-cell reference into a shared latent space, based on the transcriptome of their common genes.
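
The scoring-based labeling mentioned above can be sketched as follows. This is a deliberately simplified illustration, not the scoring function of any specific package: each cell is scored by the mean expression of a marker list minus the mean expression over the whole panel, and receives the label with the highest score. The function names, marker lists, and expression values are hypothetical.

```python
def score_cell(expression, markers):
    """Mean marker expression minus mean expression over all panel genes.

    expression: dict mapping gene name to a normalized value for one cell.
    """
    background = sum(expression.values()) / len(expression)
    present = [gene for gene in markers if gene in expression]
    if not present:
        # No marker from this list is in the panel: the type cannot be scored.
        return float("-inf")
    return sum(expression[gene] for gene in present) / len(present) - background


def label_cell(expression, marker_sets):
    """Return the cell type whose marker list gets the highest score."""
    scores = {ctype: score_cell(expression, genes) for ctype, genes in marker_sets.items()}
    return max(scores, key=scores.get)


# Illustrative marker lists and one toy cell.
marker_sets = {
    "T cell": ["CD3E", "CD2"],
    "B cell": ["MS4A1", "CD79A"],
}
cell = {"CD3E": 5.0, "CD2": 3.0, "MS4A1": 0.0, "CD79A": 1.0}
label = label_cell(cell, marker_sets)
```

Subtracting a background term keeps cells with globally high counts from scoring high for every cell type, which matters with the small gene panels typical of imaging-based assays.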

In_situ_ST-niches_and_domain

Transcriptomics Spatial Statistical analysis

Based on cell neighborhood and transcriptome proximity, several packages help identify niches and spatial domains to highlight the structural organization of cells within the tissue. Niches are particularly useful for describing, for instance, the heterogeneous organization of immune cells in cancer samples, while the notion of spatial domain relies more on anatomical regions. We focus here on two packages (i.e., CellCharter and Novae) that allow both aspects to be defined and that add the resulting metadata to the AnnData .obs table. We also explain how to import anatomical regions from a Xenium Explorer lasso shape .csv export.

scRNAseq-cell_annotation

Transcriptomics Single-cell Statistical analysis

Cell annotation is the process of assigning labels to cell clusters by linking their transcriptional signatures to the cell type, state, function, location, or lineage they reflect. The main challenges of cell annotation are first to extract a relevant list of marker genes that specifically characterize the expression profiles, and then to associate those markers with a biological meaning based on empirical knowledge. As this task can be quite laborious, label transfer methods that leverage already annotated datasets can be used to facilitate the process.

scRNAseq-differential_abundance_analysis

Transcriptomics Single-cell Statistical analysis

The aim of differential abundance (DA) analysis is to statistically compare the distribution of cell populations defined by clustering between conditions. Several methods are available, some inspired by flow cytometry analysis, others designed for scRNA-seq data. The DA analysis provided in this module is based on the negative binomial generalized linear model (NB GLM) methods implemented in the edgeR package.
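
As an illustration of the input such an analysis works on, the Python sketch below builds a cluster-by-sample table of cell counts and the corresponding per-sample proportions. The model fitting itself is not reproduced here. The function names, sample names, and cluster labels are hypothetical.

```python
from collections import Counter


def abundance_table(cells):
    """Build a cluster x sample table of cell counts.

    cells: iterable of (sample, cluster) pairs, one per cell.
    Returns the sorted sample names and a dict cluster -> counts per sample.
    """
    counts = Counter(cells)
    samples = sorted({sample for sample, _ in counts})
    clusters = sorted({cluster for _, cluster in counts})
    table = {c: [counts[(s, c)] for s in samples] for c in clusters}
    return samples, table


def proportions(samples, table):
    """Convert counts to the fraction of each sample's cells in each cluster."""
    totals = [sum(table[c][j] for c in table) for j in range(len(samples))]
    return {c: [n / t for n, t in zip(row, totals)] for c, row in table.items()}


# Toy data: one (sample, cluster) pair per cell.
cells = [
    ("ctrl", "A"), ("ctrl", "A"), ("ctrl", "B"),
    ("treated", "A"), ("treated", "B"), ("treated", "B"), ("treated", "B"),
]
samples, table = abundance_table(cells)
fractions = proportions(samples, table)
```

A count-based model is fitted on the counts rather than on the proportions, with the total number of cells per sample playing the role of a library size.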

scRNAseq-differential_expression_analysis

Transcriptomics Single-cell Statistical analysis

Differential expression analysis (DEA) aims to quantify variations in gene expression between cell populations or between conditions. For each gene, it consists of measuring the difference in mean abundance between two groups and assessing, by means of a hypothesis test, whether this difference is statistically credible. The DEA provided in this module is based on two different approaches, the choice of which depends on the number of sequenced samples that can be considered biological replicates.