Index

03-In-situ-spatial-cell_annotation

Statistical analysis

Crucial step of the statistical analysis process, cell labeling allow to associate each cell to a cell type. This process can be done manually using leiden or louvain clustering and gene markers definition using standard single-cell labeling process as such described within seurat or scanpy standard statistical workflows. Labeling can also be performed by cell scoring using the very specific gene markers selected within the spatial experiment gene panel using standard scoring functions based on gene lists. Here we describe automatic procedure that relies on the grouping into a same latent space of the spatial cells and cells coming from an home-made or external single-cell reference, based on the transcriptome of the common genes.

Module specifications

Input data	Computational environnement	Output data
SpatialData, AnnData	spatial in-situ imaging-based Python	SpatialData, AnnData

Code

Here we describe an automatic labeling procedure using scanvi and/or scMusketeers that relies on mixing spatial cells and a home-made or external single-cell reference cells into a unique latent space to predict spatial cells cell type. Obviously the single-cell reference needs to be as close as possible as the spatial dataset in term of cell type occurence.

scanvi automatic labeling

Home-made or external single-cell reference used for transfert labeling needs to be prepared before automatic labeling could be performed. Reference AnnData object needs to have a layer 'counts' filled with either raw or log-normalized counts and a cell type field (ClusterName for zeisel et al., 2018) in the .obs metadata. On bego, some references are already available for mouse brain dataset based on 2 references publications:
* Yao et al. (2023)
* Zeisel et al. (2018)

import scanpy as sc
import scispy as scis

# Yao et al. (2023) 10xGenomics  mouse brain 247 genes panel
scref_yao2023 = sc.read_h5ad('~/000-Atlas-references/mousebrain_247g_raw_4M_10p.h5ad')
scref_yao2023.layers['counts'] = scref_yao2023.X.copy()

# Zeisel et al. (2018) - whole CNS
scref_zeisel2018 = sc.read_h5ad('~/000-Atlas-references/zeisel_2018_CNS.h5ad')
# Zeisel et al. (2018) - coronal section
scref_zeisel2018 = sc.read_h5ad('~/000-Atlas-references/zeisel_2018_coronal.h5ad')
scref_zeisel2018.layers['counts'] = scref_zeisel2018.raw.X.copy()

# read your spatial dataset
ad6_harmony = sc.read_h5ad('000-outs/ad6.h5ad')
# transfert annotation
scis.pp.scvi_annotate(ad6_harmony, scref_zeisel2018, batch_size=127, label_ref='ClusterName', label_key='scanvi', metaref2add=['TaxonomyRank1','TaxonomyRank2','TaxonomyRank3','TaxonomyRank4','Region'])
ad6_harmony.var_names = ad6_harmony.raw.var_names

# save your annotate dataset
ad6_harmony.write('000-outs/ad6_scanvi.h5ad')

scMusketeers automatic labeling

import scanpy as sc

# parameters
class_key = "scmusk"
unlabeled_category = "Unknown"
batch_key = "sample"
h5ad_ref = "~/000-Atlas-references/zeisel_2018_coronal.h5ad"
h5ad_query = '000-outs/ad6_scanvi.h5ad'

# prepare ref
ref = sc.read_h5ad(h5ad_ref)
ref.obs = ref.obs.reset_index(drop=True)
ref.obs.index = ref.obs.index.map(str)
ref.X = ref.raw.X.copy()
ref.obs[class_key] = ref.obs['ClusterName']
# arrange a random batch
s = [random.randrange(1, 4) for _ in range(0, ref.shape[0])]
ref.obs[batch_key] = s
ref.obs[batch_key] = ref.obs[batch_key].map(str)
ref.var_names_make_unique()
ref.obs_names_make_unique()
ref.write('ref.h5ad')

# prepare query
query = sc.read_h5ad(h5ad_query)
query.obs = query.obs.reset_index(drop=True)
query.obs.index = query.obs.index.map(str)
query.X = query.layers['counts'].copy()
query.obs[class_key] = unlabeled_category
query.var_names_make_unique()
query.obs_names_make_unique()
query.write('query.h5ad')

scMusketeers need to be run as command line within a specific conda environnement on bego

conda activate /data/analysis/ML_models/conda_env/cb_scmusketeers
sc-musketeers transfer ref.h5ad --query_path query.h5ad --class_key=scmusk --unlabeled_category=Unknown --batch_key=sample --out_dir=. --out_name=scmusk.pred

Then get back to python notebook

import anndata as ad

pred = sc.read_h5ad("scmusk.pred.h5ad")
pred = pred[pred.obs.train_split == 'test']

pred.obs.index = query.obs.index

query.obs['scmusk'] = adata_pred.obs['scmusk_pred']
query.obs['scmusk'] = query.obs['scmusk'].astype("category")

query.write('000-outs/ad6_scanvi_scmusk.h5ad')

Visualize UMAP by cell type

sc.pl.embedding(query, "umap", color=['scmusk','scanvi'], legend_loc='on data', legend_fontsize=4)

Re-export metadata for xenium explorer

samples = pd.read_csv('samples.csv', sep=',')
for index, row in samples.iterrows():
    sdata = SpatialData.read("000-outs/SD_" + row['sample'])
    sub = query[query.obs['sample'] == row['sample']]

    sub.uns["spatialdata_attrs"] = sdata.table.uns["spatialdata_attrs"]
    sub.obs.cell_id = sub.obs.cell_id.astype(str)
    del sdata.tables["table"]
    sdata.tables["table"] = sub

    # xenium case
    spatialdata_xenium_explorer.write("000-outs/XE_" + row['sample'], sdata, shapes_key='cell_boundaries', points_key='transcripts', image_key='morphology_focus', polygon_max_vertices=40, mode='+o')