Mouse Brain Anterior-Posterior batch effect correction
9: Multi-sample integration task for mouse brain data anterior-posterior sections¶
HiSTaR can correct batch effects when integrating multiple tissue sections. In this tutorial, we use the mouse brain anterior-posterior dataset to walk through the analysis.
The mouse brain data comes from STGMVA: https://zenodo.org/records/8141084
The complete experimental dataset is available here: https://zenodo.org/records/15599070
Loading package¶
In [1]:
import scanpy as sc
import torch
import squidpy as sq
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
torch.set_default_dtype(torch.float64)
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
In [2]:
import HiSTaR
random_seed = 2023
HiSTaR.fix_seed(random_seed)
In [3]:
import matplotlib as mpl
mpl.rcParams.update({
    'font.family': 'Arial',
    'axes.labelweight': 'bold',
    'axes.titleweight': 'bold',
    'axes.titlesize': 12,
    'axes.titlelocation': 'left',
    'figure.constrained_layout.use': True,
    'figure.dpi': 300,
    'savefig.dpi': 300,
})
mclust_palette = [
    "#F3766E", "#5BB300", "#2E96FF", "#C655D9", "#FFB549",
    "#00C6EA", "#9B8500", "#009E73", "#FF6EB4", "#7AFF33",
    "#A837D8", "#FFD700", "#00FF99", "#FF7F50", "#8A2BE2",
    "#4B0082", "#FF1493", "#32CD32", "#8B4513", "#6D4C41",
    "#FF80ED", "#00CED1", "#BA55D3", "#FF4500", "#ADFF2F",
    "#2E8B57", "#DA70D6", "#FFA500", "#800080", "#40E0D0"
]
Reading ST data¶
In [4]:
adata = sc.read_h5ad("../data/mouse_brain_anterior_posterior_sections.h5ad")
adata.var_names_make_unique()
In [5]:
adata
Out[5]:
AnnData object with n_obs × n_vars = 6114 × 32285
obs: 'in_tissue', 'array_row', 'array_col', 'data'
obsm: 'spatial'
In [6]:
plt.rcParams["figure.facecolor"] = "white"
plt.rcParams["axes.facecolor"] = "white"
plt.rcParams["figure.figsize"] = (10, 5)
sc.pl.embedding(
    adata,
    basis='spatial',
    size=35,
    color='data',
    frameon=False,
    show=False,
    cmap='viridis'
)
ax = plt.gca()
ax.grid(False)
ax.spines['bottom'].set_color('black')
ax.spines['left'].set_color('black')
# plt.savefig("aligned.png", dpi=300, bbox_inches='tight', facecolor='white')
In [7]:
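from sklearn.decomposition import PCA

# Keep the raw counts: the seurat_v3 HVG flavor expects unnormalized count data
adata.layers['count'] = adata.X.toarray()
sc.pp.filter_genes(adata, min_cells=50)
sc.pp.filter_genes(adata, min_counts=10)
sc.pp.normalize_total(adata, target_sum=1e6)
sc.pp.highly_variable_genes(adata, flavor="seurat_v3", layer='count', n_top_genes=2000)
# Copy the subset so that scaling does not operate on a view of the full object
adata = adata[:, adata.var['highly_variable']].copy()
sc.pp.scale(adata)
# Reduce the scaled expression matrix to 200 principal components as input for HiSTaR
adata_X = PCA(n_components=200, random_state=42).fit_transform(adata.X)
adata.obsm['X_pca'] = adata_X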
Constructing the spatial network¶
In [8]:
graph_dict = HiSTaR.graph_construction(adata, 12)
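The tutorial calls `HiSTaR.graph_construction(adata, 12)` without describing the graph it returns. As a rough illustration of what a spatial neighbour graph looks like, the sketch below builds a symmetric k-nearest-neighbour adjacency matrix from 2D spot coordinates with scikit-learn. It assumes that the second argument (12) is a neighbour count, which the tutorial does not state. `knn_adjacency` is a hypothetical helper and not part of HiSTaR, whose actual graph construction may differ.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph


def knn_adjacency(coords, k=12):
    """Hypothetical helper (not part of HiSTaR): symmetric binary kNN graph."""
    # Directed graph: row i has a 1 for each of the k nearest neighbours of spot i
    adj = kneighbors_graph(coords, n_neighbors=k, mode="connectivity", include_self=False)
    # Symmetrise: keep an edge if either endpoint lists the other as a neighbour
    return adj.maximum(adj.T)


# Synthetic coordinates standing in for adata.obsm['spatial']
rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(200, 2))
adj = knn_adjacency(coords, k=12)
print(adj.shape, adj.getnnz(axis=1).min())
```

On the real data, the same helper could be called as `knn_adjacency(adata.obsm['spatial'], k=12)`. Because both sections share one coordinate frame here, such a graph would also connect spots across the section boundary.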
Running HiSTaR¶
In [9]:
histar_net = HiSTaR.histar(adata.obsm['X_pca'], graph_dict, device=device, gcn_hidden2=12, lambda_sim=0.69)
histar_net.train()
histar_feat, _, _, _ = histar_net.process()
adata.obsm['HiSTaR'] = histar_feat
In [10]:
adata
Out[10]:
AnnData object with n_obs × n_vars = 6114 × 2000
obs: 'in_tissue', 'array_row', 'array_col', 'data'
var: 'n_cells', 'n_counts', 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm', 'mean', 'std'
uns: 'data_colors', 'hvg'
obsm: 'spatial', 'X_pca', 'HiSTaR'
layers: 'count'
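Since the goal of this tutorial is batch effect correction, it can help to quantify how well the two sections mix in the learned embedding. The tutorial does not report such a metric. The following is a minimal sketch of one simple option: the mean fraction of each spot's nearest neighbours that belong to a different batch. `neighbour_batch_mixing` is a hypothetical helper, and the synthetic arrays stand in for `adata.obsm['HiSTaR']` and `adata.obs['data']`.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def neighbour_batch_mixing(embedding, batches, k=15):
    """Hypothetical helper: mean fraction of k nearest neighbours from another batch."""
    batches = np.asarray(batches)
    nn = NearestNeighbors(n_neighbors=k).fit(embedding)
    # Calling kneighbors() without a query excludes each point from its own neighbours
    _, idx = nn.kneighbors()
    other_batch = batches[idx] != batches[:, None]
    return float(other_batch.mean())


# Synthetic stand-ins: two batches of 300 spots in a 5-dimensional embedding
rng = np.random.default_rng(0)
batches = np.repeat(["A", "B"], 300)
emb_mixed = rng.normal(size=(600, 5))   # both batches drawn from one distribution
emb_separated = emb_mixed.copy()
emb_separated[300:] += 10.0             # batch B shifted far away from batch A

mixed_score = neighbour_batch_mixing(emb_mixed, batches)
separated_score = neighbour_batch_mixing(emb_separated, batches)
print(mixed_score, separated_score)
```

Values near 0 indicate that the batches are separated, while values near the proportion of the other batch indicate thorough mixing. For anterior and posterior sections, which cover different anatomy, complete mixing is not expected, so the score is best read as a comparison between embeddings (for example `X_pca` versus `HiSTaR`) rather than as an absolute target.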
Clustering¶
In [11]:
# HiSTaR.configure_r_environment() # If you encounter problems loading R packages, you can manually configure your path in this function.
In [12]:
HiSTaR.mclust_R(adata, n_clusters=26, use_rep='HiSTaR', key_added='HiSTaR_clust')
R[write to console]:
                   __           __
   ____ ___  _____/ /_  _______/ /_
  / __ `__ \/ ___/ / / / / ___/ __/
 / / / / / / /__/ / /_/ (__  ) /_
/_/ /_/ /_/\___/_/\__,_/____/\__/   version 6.1.1
Type 'citation("mclust")' for citing this R package in publications.
fitting ... |======================================================================| 100%
Out[12]:
AnnData object with n_obs × n_vars = 6114 × 2000
obs: 'in_tissue', 'array_row', 'array_col', 'data', 'HiSTaR_clust'
var: 'n_cells', 'n_counts', 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm', 'mean', 'std'
uns: 'data_colors', 'hvg'
obsm: 'spatial', 'X_pca', 'HiSTaR'
layers: 'count'
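`HiSTaR.mclust_R` relies on the R package mclust, as the console output above shows. If R is not available, a Gaussian mixture model from scikit-learn is a comparable but not identical substitute. The sketch below is not part of HiSTaR: `gmm_cluster` is a hypothetical helper, the synthetic data stands in for `adata.obsm['HiSTaR']`, and the resulting labels will generally differ from those produced by mclust.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def gmm_cluster(embedding, n_clusters, seed=2023):
    """Hypothetical helper: cluster an embedding with a full-covariance GMM."""
    gmm = GaussianMixture(
        n_components=n_clusters,
        covariance_type="full",
        n_init=5,              # several restarts to reduce sensitivity to initialisation
        random_state=seed,
    )
    return gmm.fit_predict(embedding)


# Synthetic stand-in: three well-separated groups of 100 points each
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(size=(100, 2)) for c in centres])
labels = gmm_cluster(X, n_clusters=3)
print(np.bincount(labels))
```

On the real data, the labels could be stored under a key of your choice, for example `adata.obs['HiSTaR_gmm'] = gmm_cluster(adata.obsm['HiSTaR'], 26).astype(str)`, where `HiSTaR_gmm` is an illustrative name only.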
Visualization¶
In [13]:
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
plt.rcParams["figure.facecolor"] = "white"
plt.rcParams["axes.facecolor"] = "white"
mclust_cmap = ListedColormap(mclust_palette)
sq.pl.spatial_scatter(adata, color="HiSTaR_clust", size=15, figsize=(8,6), shape=None, palette=mclust_cmap)
# plt.savefig("aligned_clusters.png", dpi=300, bbox_inches='tight', facecolor='white')
WARNING: Please specify a valid `library_id` or set it permanently in `adata.uns['spatial']`
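Beyond the spatial plot, a cluster-by-section table gives a quick view of which clusters are shared between the two sections and which are specific to one of them. The sketch below uses `pandas.crosstab` on a small synthetic table. The column names mirror `adata.obs['HiSTaR_clust']` and `adata.obs['data']` from this tutorial, while the batch labels `anterior` and `posterior` are placeholders, since the actual values stored in `data` are not shown here.

```python
import pandas as pd

# Synthetic stand-in for adata.obs; the batch labels are placeholder assumptions
obs = pd.DataFrame({
    "data": ["anterior"] * 4 + ["posterior"] * 4,
    "HiSTaR_clust": [1, 1, 2, 3, 2, 2, 3, 3],
})

# Number of spots per cluster (rows) and section (columns)
counts = pd.crosstab(obs["HiSTaR_clust"], obs["data"])
# Row-normalised version: share of each cluster contributed by each section
fractions = counts.div(counts.sum(axis=1), axis=0)
print(counts)
print(fractions.round(2))
```

With the real object, `obs` would simply be replaced by `adata.obs`. Clusters that draw spots from both sections are candidates for regions spanning the section boundary.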