DLPFC Clustering Analysis

1_DLPFC_Clustering

1: DLPFC_Clustering¶

HiSTaR can process 10X Visium data to extract low-dimension representation. In this tutorial, we use sample #151673 from Dorsolateral Prefrontal Cortex (DLPFC) dataset to introduce the analysis.
The raw DLPFC data can be downloaded from http://spatial.libd.org/spatialLIBD/
metadata comes from SEDR https://github.com/JinmiaoChenLab/SEDR_analyses
The complete experimental dataset is available here https://zenodo.org/records/15599070

In [1]:

import scanpy as sc
import pandas as pd
from sklearn import metrics
import torch
import matplotlib.pyplot as plt
from pathlib import Path
import warnings
import numpy as np
warnings.filterwarnings('ignore')
torch.set_default_dtype(torch.float64)

In [2]:

import HiSTaR

In [3]:

import matplotlib as mpl
mpl.rcParams.update({
    'font.family': 'Arial',              
    'axes.labelweight': 'bold',          
    'axes.titleweight': 'bold',          
    'axes.titlesize': 12,                
    'axes.titlelocation': 'left',        
    'figure.constrained_layout.use': True,
    'figure.dpi': 300,                   
    'savefig.dpi': 300,
})

mclust_palette = [
    "#F3766E", "#5BB300", "#2E96FF", "#C655D9", "#FFB549",
    "#00C6EA", "#9B8500", "#009E73", "#FF6EB4", "#7AFF33",
    "#A837D8", "#FFD700", "#00FF99", "#FF7F50", "#8A2BE2",
    "#4B0082", "#FF1493", "#32CD32", "#8B4513"
]

In [4]:

random_seed = 2023
HiSTaR.fix_seed(random_seed)

In [5]:

# gpu
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'

# path
data_root = Path('../data/DLPFC')

# sample name
sample_name = '151673'
n_clusters = 5 if sample_name in ['151669', '151670', '151671', '151672'] else 7

Loading data¶

In [6]:

adata = sc.read_visium(data_root / sample_name)
adata.var_names_make_unique()

df_meta = pd.read_csv(data_root / sample_name / 'metadata.tsv', sep='\t')
adata.obs['layer_guess'] = df_meta['layer_guess']

data Preprocessing¶

In [7]:

adata.layers['count'] = adata.X.toarray()
sc.pp.filter_genes(adata, min_cells=50)
sc.pp.filter_genes(adata, min_counts=10)
sc.pp.normalize_total(adata, target_sum=1e6)
sc.pp.highly_variable_genes(adata, flavor="seurat_v3", layer='count', n_top_genes=2000)
adata = adata[:, adata.var['highly_variable'] == True]
sc.pp.scale(adata)

from sklearn.decomposition import PCA
adata_X = PCA(n_components=200, random_state=42).fit_transform(adata.X)
adata.obsm['X_pca'] = adata_X

Constructing neighborhood graph¶

In [8]:

graph_dict = HiSTaR.graph_construction(adata, 12)
print(graph_dict)

{'adj_norm': tensor(indices=tensor([[   0,    0,    0,  ..., 3638, 3638, 3638],
                       [   0,  397,  485,  ..., 3402, 3564, 3638]]),
       values=tensor([0.0769, 0.0769, 0.0769,  ..., 0.0769, 0.0769, 0.0769]),
       size=(3639, 3639), nnz=48467, dtype=torch.float32,
       layout=torch.sparse_coo), 'adj_label': tensor(indices=tensor([[   0,    0,    0,  ..., 3638, 3638, 3638],
                       [   0,  397,  485,  ..., 3402, 3564, 3638]]),
       values=tensor([1., 1., 1.,  ..., 1., 1., 1.]),
       size=(3639, 3639), nnz=48467, layout=torch.sparse_coo), 'norm_value': 0.5018367264030662}

Training HiSTaR¶

In [9]:

histar_net = HiSTaR.histar(adata.obsm['X_pca'], graph_dict, device=device, gcn_hidden2=12, lambda_sim=0.619)
histar_net.train()
histar_feat, _, _, _ = histar_net.process()
adata.obsm['HiSTaR'] = histar_feat

Set up the R environment and the mclust package.¶

In [10]:

HiSTaR.configure_r_environment()   # If you encounter problems loading R packages, you can manually configure your path in this function.

Clustering¶

In [11]:

HiSTaR.mclust_R(adata, n_clusters, use_rep='HiSTaR', key_added='HiSTaR_clust')

R[write to console]:                    __           __ 
   ____ ___  _____/ /_  _______/ /_
  / __ `__ \/ ___/ / / / / ___/ __/
 / / / / / / /__/ / /_/ (__  ) /_  
/_/ /_/ /_/\___/_/\__,_/____/\__/   version 6.1.1
Type 'citation("mclust")' for citing this R package in publications.

fitting ...
  |======================================================================| 100%

Out[11]:

AnnData object with n_obs × n_vars = 3639 × 2000
    obs: 'in_tissue', 'array_row', 'array_col', 'layer_guess', 'HiSTaR_clust'
    var: 'gene_ids', 'feature_types', 'genome', 'n_cells', 'n_counts', 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm', 'mean', 'std'
    uns: 'spatial', 'hvg'
    obsm: 'spatial', 'X_pca', 'HiSTaR'
    layers: 'count'

Visualization¶

Spatial domain¶

In [12]:

sub_adata = adata[~pd.isnull(adata.obs['layer_guess'])]
ARI = metrics.adjusted_rand_score(sub_adata.obs['layer_guess'], sub_adata.obs['HiSTaR_clust'])
NMI = metrics.normalized_mutual_info_score(sub_adata.obs['layer_guess'], sub_adata.obs['HiSTaR_clust'])
FMS = metrics.fowlkes_mallows_score(sub_adata.obs['layer_guess'], sub_adata.obs['HiSTaR_clust'])
print("ARI:", ARI , "NMI:", NMI, "FMS:", FMS)

ARI: 0.7181548924986987 NMI: 0.753457100987869 FMS: 0.7735671814491102

In [13]:

fig3, ax3 = plt.subplots(1, 2, figsize=(10, 4))
sc.pl.spatial(
    adata,
    color="layer_guess",
    title="Manual Annotation",
    palette=mclust_palette,
    ax=ax3[0],
    spot_size=100,
    frameon=False,
    img_key=None,  # 禁用原始切片
    show=False,
    s=70
)
sc.pl.spatial(
    adata,
    color="HiSTaR_clust",
    title=f"HiSTaR (ARI={ARI:.4f})",
    palette=mclust_palette,
    ax=ax3[1],
    spot_size=100,
    frameon=False,
    img_key=None,  # 禁用原始切片
    show=False,
    s=70
)
fig3.tight_layout()
plt.show()

No description has been provided for this image

UMAP¶

In [14]:

sc.pp.neighbors(adata, use_rep='HiSTaR', metric='cosine')
sc.tl.umap(adata)

In [15]:

sc.pl.umap(
    adata,
    color="HiSTaR_clust",
    title=f"HiSTaR (ARI={ARI:.4f})",
    palette=mclust_palette[:7],  # 使用前7个颜色
    frameon=False,
    show=False
)
plt.show()

PAGA¶

In [16]:

clusters = sorted(adata.obs['HiSTaR_clust'].unique())  
 
color_dict = {cluster: mclust_palette[i] for i, cluster in enumerate(clusters)}  

sc.tl.paga(adata, groups='HiSTaR_clust')  
  
mean_positions = np.zeros((len(clusters), 2))  
for i, cluster in enumerate(clusters):
    mask = adata.obs['HiSTaR_clust'] == cluster  
    mean_positions[i] = np.mean(adata.obsm['X_umap'][mask], axis=0)  

fig, ax = plt.subplots(figsize=(10, 8), facecolor='none', edgecolor='none')  
ax.set_facecolor('none')  

sc.pl.paga(  
    adata,  
    color='HiSTaR_clust',  
    show=False,  
    frameon=False,  
    edge_width_scale=2,  
    node_size_scale=5,  
    pos=mean_positions,  
    ax=ax,  
)  

# plt.savefig('paga_plot.png',   
#             transparent=True,   
#             bbox_inches='tight',   
#             dpi=300)  

plt.show()  

# 打印调试信息  
print("Cluster order:", clusters)  
print("Color mapping:", color_dict)

Cluster order: [1, 2, 3, 4, 5, 6, 7]
Color mapping: {1: '#F3766E', 2: '#5BB300', 3: '#2E96FF', 4: '#C655D9', 5: '#FFB549', 6: '#00C6EA', 7: '#9B8500'}