Slide-SeqV2 MOB Clustering Analysis
4:Slide-seqV2_Clustering (MOB)¶
HiSTaR can process Slide-SeqV2 data to extract low-dimension representation. In this tutorial, we use MOB dataset to introduce the analysis.
The raw MOB dataset can be downloaded from:https://singlecell.broadinstitute.org/single_cell/study/SCP815/highly-sensitive-spatial-transcriptomics-at-near-cellular-resolution-with-slide-seqv2#study-summary
barcodes data comes from STAGATE https://drive.google.com/drive/folders/10lhz5VY7YfvHrtV40MwaqLmWz56U9eBP?usp=sharing.
The complete experimental dataset is available here https://zenodo.org/records/15599070
Preparation¶
In [1]:
import pandas as pd
import scanpy as sc
import matplotlib.pyplot as plt
import os
In [2]:
import torch
import HiSTaR
HiSTaR.fix_seed(2023)
In [3]:
input_dir = '../data/Slide-seq'
counts_file = os.path.join(input_dir, 'Puck_200127_15.digital_expression.txt')
coor_file = os.path.join(input_dir, 'Puck_200127_15_bead_locations.csv')
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
In [4]:
counts = pd.read_csv(counts_file, sep='\t', index_col=0)
coor_df = pd.read_csv(coor_file, index_col="barcode")
In [5]:
adata = sc.AnnData(counts.T)
adata.var_names_make_unique()
coor_df = coor_df.loc[adata.obs_names, ['xcoord', 'ycoord']]
adata.obsm["spatial"] = coor_df.to_numpy()
In [6]:
sc.pp.calculate_qc_metrics(adata, inplace=True)
In [7]:
used_barcode = pd.read_csv('../data/Slide-seq/used_barcodes.txt', sep='\t', header=None)
used_barcode = used_barcode[0]
adata = adata[used_barcode,]
In [8]:
sc.pp.filter_genes(adata, min_cells=50)
D:\anaonda3\envs\ST_pytorch\Lib\site-packages\scanpy\preprocessing\_simple.py:287: ImplicitModificationWarning: Trying to modify attribute `.var` of view, initializing view as actual. adata.var["n_cells"] = number
In [9]:
#Normalization
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.highly_variable_genes(adata, flavor="seurat_v3", n_top_genes=2000)
sc.pp.log1p(adata)
from sklearn.decomposition import PCA
adata_X = PCA(n_components=200, random_state=42).fit_transform(adata.X)
adata.obsm['X_pca'] = adata_X
D:\anaonda3\envs\ST_pytorch\Lib\site-packages\scanpy\preprocessing\_highly_variable_genes.py:74: UserWarning: `flavor='seurat_v3'` expects raw count data, but non-integers were found. warnings.warn(
Constructing the spatial network¶
In [10]:
# construct graph
graph_dict = HiSTaR.graph_construction(adata, 12)
E:\HDVAE\HiSTaR\graph_func.py:37: UserWarning: torch.sparse.SparseTensor(indices, values, shape, *, device=) is deprecated. Please use torch.sparse_coo_tensor(indices, values, shape, dtype=, device=). (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\pytorch\torch\csrc\utils\tensor_new.cpp:646.) return torch.sparse.FloatTensor(indices, values, shape)
In [11]:
adata
Out[11]:
AnnData object with n_obs × n_vars = 20139 × 11750
obs: 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes'
var: 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'n_cells', 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm'
uns: 'hvg', 'log1p'
obsm: 'spatial', 'X_pca'
Running HiSTaR¶
In [12]:
histar_net = HiSTaR.histar(adata.obsm['X_pca'], graph_dict, device=device, gcn_hidden2=12, lambda_sim=0.38)
histar_net.train()
histar_feat, _, _, _ = histar_net.process()
adata.obsm['HiSTaR'] = histar_feat
E:\HDVAE\HiSTaR\HiSTaR_model.py:75: UserWarning: torch.range is deprecated and will be removed in a future release because its behavior is inconsistent with Python's range builtin. Instead, use torch.arange, which produces values in [start, end). total_idx = torch.range(0, self.cell_num - 1, dtype=torch.float64).to(self.device) E:\HDVAE\HiSTaR\HiSTaR_model.py:192: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). loss_kl = F.kl_div(out_q.log(), torch.tensor(tmp_p).to(self.device)).to(self.device) D:\anaonda3\envs\ST_pytorch\Lib\site-packages\torch\nn\functional.py:3384: UserWarning: reduction: 'mean' divides the total loss by both the batch size and the support size.'batchmean' divides only by the batch size, and aligns with the KL div math definition.'mean' will be changed to behave the same as 'batchmean' in the next major release. warnings.warn(
Clustering¶
In [13]:
#HiSTaR.configure_r_environment() # If you encounter problems loading R packages, you can manually configure your path in this function.
In [14]:
n_clusters=11
HiSTaR.mclust_R(adata, n_clusters=n_clusters, use_rep='HiSTaR', key_added='HiSTaR_cluster')
R[write to console]: __ __
____ ___ _____/ /_ _______/ /_
/ __ `__ \/ ___/ / / / / ___/ __/
/ / / / / / /__/ / /_/ (__ ) /_
/_/ /_/ /_/\___/_/\__,_/____/\__/ version 6.1.1
Type 'citation("mclust")' for citing this R package in publications.
fitting ... |======================================================================| 100%
Out[14]:
AnnData object with n_obs × n_vars = 20139 × 11750
obs: 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'HiSTaR_cluster'
var: 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'n_cells', 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm'
uns: 'hvg', 'log1p'
obsm: 'spatial', 'X_pca', 'HiSTaR'
Visualization¶
In [15]:
adata.obsm["spatial"] = adata.obsm["spatial"] * (-1)
In [16]:
import matplotlib as mpl
mpl.rcParams.update({
'font.family': 'Arial',
'axes.labelweight': 'bold',
'axes.titleweight': 'bold',
'axes.titlesize': 12,
'figure.constrained_layout.use': True
})
mclust_palette = [
"#F3766E", "#5BB300", "#2E96FF", "#C655D9", "#FFB549",
"#00C6EA", "#9B8500", "#009E73", "#FF6EB4", "#7AFF33",
"#A837D8", "#FFD700", "#00FF99", "#FF7F50", "#8A2BE2",
"#4B0082", "#FF1493", "#32CD32", "#8B4513"
]
In [17]:
plt.figure(figsize=(4, 3), facecolor='white')
sc.pl.embedding(
adata,
basis="spatial",
color="HiSTaR_cluster",
palette=mclust_palette,
s=12,
title=f"HiSTaR",
show=False,
frameon=False
)
ax = plt.gca()
ax.set_facecolor('white')
for spine in ax.spines.values():
spine.set_visible(False)
ax.set_xticks([])
ax.set_yticks([])
plt.tight_layout()
# plt.savefig('Slide_seq_spatial.png',dpi=300, bbox_inches='tight')
C:\Users\Angle\AppData\Local\Temp\ipykernel_1680\428656984.py:18: UserWarning: The figure layout has changed to tight plt.tight_layout()
<Figure size 400x300 with 0 Axes>