DLPFC Clustering Analysis
1: DLPFC_Clustering¶
HiSTaR can process 10X Visium data to extract low-dimension representation. In this tutorial, we use sample #151673 from Dorsolateral Prefrontal Cortex (DLPFC) dataset to introduce the analysis.
The raw DLPFC data can be downloaded from http://spatial.libd.org/spatialLIBD/
metadata comes from SEDR https://github.com/JinmiaoChenLab/SEDR_analyses
The complete experimental dataset is available here https://zenodo.org/records/15599070
In [1]:
import scanpy as sc
import pandas as pd
from sklearn import metrics
import torch
import matplotlib.pyplot as plt
from pathlib import Path
import warnings
import numpy as np
warnings.filterwarnings('ignore')
torch.set_default_dtype(torch.float64)
In [2]:
import HiSTaR
In [3]:
import matplotlib as mpl
mpl.rcParams.update({
'font.family': 'Arial',
'axes.labelweight': 'bold',
'axes.titleweight': 'bold',
'axes.titlesize': 12,
'axes.titlelocation': 'left',
'figure.constrained_layout.use': True,
'figure.dpi': 300,
'savefig.dpi': 300,
})
mclust_palette = [
"#F3766E", "#5BB300", "#2E96FF", "#C655D9", "#FFB549",
"#00C6EA", "#9B8500", "#009E73", "#FF6EB4", "#7AFF33",
"#A837D8", "#FFD700", "#00FF99", "#FF7F50", "#8A2BE2",
"#4B0082", "#FF1493", "#32CD32", "#8B4513"
]
In [4]:
random_seed = 2023
HiSTaR.fix_seed(random_seed)
In [5]:
# gpu
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
# path
data_root = Path('../data/DLPFC')
# sample name
sample_name = '151673'
n_clusters = 5 if sample_name in ['151669', '151670', '151671', '151672'] else 7
Loading data¶
In [6]:
adata = sc.read_visium(data_root / sample_name)
adata.var_names_make_unique()
df_meta = pd.read_csv(data_root / sample_name / 'metadata.tsv', sep='\t')
adata.obs['layer_guess'] = df_meta['layer_guess']
data Preprocessing¶
In [7]:
adata.layers['count'] = adata.X.toarray()
sc.pp.filter_genes(adata, min_cells=50)
sc.pp.filter_genes(adata, min_counts=10)
sc.pp.normalize_total(adata, target_sum=1e6)
sc.pp.highly_variable_genes(adata, flavor="seurat_v3", layer='count', n_top_genes=2000)
adata = adata[:, adata.var['highly_variable'] == True]
sc.pp.scale(adata)
from sklearn.decomposition import PCA
adata_X = PCA(n_components=200, random_state=42).fit_transform(adata.X)
adata.obsm['X_pca'] = adata_X
Constructing neighborhood graph¶
In [8]:
graph_dict = HiSTaR.graph_construction(adata, 12)
print(graph_dict)
{'adj_norm': tensor(indices=tensor([[ 0, 0, 0, ..., 3638, 3638, 3638],
[ 0, 397, 485, ..., 3402, 3564, 3638]]),
values=tensor([0.0769, 0.0769, 0.0769, ..., 0.0769, 0.0769, 0.0769]),
size=(3639, 3639), nnz=48467, dtype=torch.float32,
layout=torch.sparse_coo), 'adj_label': tensor(indices=tensor([[ 0, 0, 0, ..., 3638, 3638, 3638],
[ 0, 397, 485, ..., 3402, 3564, 3638]]),
values=tensor([1., 1., 1., ..., 1., 1., 1.]),
size=(3639, 3639), nnz=48467, layout=torch.sparse_coo), 'norm_value': 0.5018367264030662}
Training HiSTaR¶
In [9]:
histar_net = HiSTaR.histar(adata.obsm['X_pca'], graph_dict, device=device, gcn_hidden2=12, lambda_sim=0.619)
histar_net.train()
histar_feat, _, _, _ = histar_net.process()
adata.obsm['HiSTaR'] = histar_feat
Set up the R environment and the mclust package.¶
In [10]:
HiSTaR.configure_r_environment() # If you encounter problems loading R packages, you can manually configure your path in this function.
Clustering¶
In [11]:
HiSTaR.mclust_R(adata, n_clusters, use_rep='HiSTaR', key_added='HiSTaR_clust')
R[write to console]: __ __
____ ___ _____/ /_ _______/ /_
/ __ `__ \/ ___/ / / / / ___/ __/
/ / / / / / /__/ / /_/ (__ ) /_
/_/ /_/ /_/\___/_/\__,_/____/\__/ version 6.1.1
Type 'citation("mclust")' for citing this R package in publications.
fitting ... |======================================================================| 100%
Out[11]:
AnnData object with n_obs × n_vars = 3639 × 2000
obs: 'in_tissue', 'array_row', 'array_col', 'layer_guess', 'HiSTaR_clust'
var: 'gene_ids', 'feature_types', 'genome', 'n_cells', 'n_counts', 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm', 'mean', 'std'
uns: 'spatial', 'hvg'
obsm: 'spatial', 'X_pca', 'HiSTaR'
layers: 'count'
Visualization¶
Spatial domain¶
In [12]:
sub_adata = adata[~pd.isnull(adata.obs['layer_guess'])]
ARI = metrics.adjusted_rand_score(sub_adata.obs['layer_guess'], sub_adata.obs['HiSTaR_clust'])
NMI = metrics.normalized_mutual_info_score(sub_adata.obs['layer_guess'], sub_adata.obs['HiSTaR_clust'])
FMS = metrics.fowlkes_mallows_score(sub_adata.obs['layer_guess'], sub_adata.obs['HiSTaR_clust'])
print("ARI:", ARI , "NMI:", NMI, "FMS:", FMS)
ARI: 0.7181548924986987 NMI: 0.753457100987869 FMS: 0.7735671814491102
In [13]:
fig3, ax3 = plt.subplots(1, 2, figsize=(10, 4))
sc.pl.spatial(
adata,
color="layer_guess",
title="Manual Annotation",
palette=mclust_palette,
ax=ax3[0],
spot_size=100,
frameon=False,
img_key=None, # 禁用原始切片
show=False,
s=70
)
sc.pl.spatial(
adata,
color="HiSTaR_clust",
title=f"HiSTaR (ARI={ARI:.4f})",
palette=mclust_palette,
ax=ax3[1],
spot_size=100,
frameon=False,
img_key=None, # 禁用原始切片
show=False,
s=70
)
fig3.tight_layout()
plt.show()
UMAP¶
In [14]:
sc.pp.neighbors(adata, use_rep='HiSTaR', metric='cosine')
sc.tl.umap(adata)
In [15]:
sc.pl.umap(
adata,
color="HiSTaR_clust",
title=f"HiSTaR (ARI={ARI:.4f})",
palette=mclust_palette[:7], # 使用前7个颜色
frameon=False,
show=False
)
plt.show()
PAGA¶
In [16]:
clusters = sorted(adata.obs['HiSTaR_clust'].unique())
color_dict = {cluster: mclust_palette[i] for i, cluster in enumerate(clusters)}
sc.tl.paga(adata, groups='HiSTaR_clust')
mean_positions = np.zeros((len(clusters), 2))
for i, cluster in enumerate(clusters):
mask = adata.obs['HiSTaR_clust'] == cluster
mean_positions[i] = np.mean(adata.obsm['X_umap'][mask], axis=0)
fig, ax = plt.subplots(figsize=(10, 8), facecolor='none', edgecolor='none')
ax.set_facecolor('none')
sc.pl.paga(
adata,
color='HiSTaR_clust',
show=False,
frameon=False,
edge_width_scale=2,
node_size_scale=5,
pos=mean_positions,
ax=ax,
)
# plt.savefig('paga_plot.png',
# transparent=True,
# bbox_inches='tight',
# dpi=300)
plt.show()
# 打印调试信息
print("Cluster order:", clusters)
print("Color mapping:", color_dict)
Cluster order: [1, 2, 3, 4, 5, 6, 7]
Color mapping: {1: '#F3766E', 2: '#5BB300', 3: '#2E96FF', 4: '#C655D9', 5: '#FFB549', 6: '#00C6EA', 7: '#9B8500'}