Overview

This notebook describes how to construct GRN models in CellOracle. Please read our paper first to know about the CellOracle’s GRN modeling algorithm.

Last update: 4/13/2022, Kenji Kamimoto

Notebook file

Notebook file is available on CellOracle’s GitHub page. https://github.com/morris-lab/CellOracle/blob/master/docs/notebooks/04_Network_analysis/Network_analysis_with_Paul_etal_2015_data.ipynb

Data

CellOracle uses two types of input data during the GRN model construction.

  • Input data 1: scRNA-seq data. Please look at the previous section to learn about the scRNA-seq data preprocessing. https://morris-lab.github.io/CellOracle.documentation/tutorials/scrnaprocess.html

  • Input data 2: Base-GRN. The base GRN represents the TF-target gene connections. The data structure is a binary 2D matrix or linklist. Please look at our paper to know the concept of base GRN.

  • CellOracle typically uses a base GRN constructed from scATAC-seq data. If you would like to create a custom base GRN using your scATAC-seq data, please refer to the following notebook. https://morris-lab.github.io/CellOracle.documentation/tutorials/base_grn.html

  • If you do not have any sample-specific scATAC-seq data that correspond to the same/similar cell type of the scRNA-seq data, please use pre-built base GRN.
  • We provide multiple options for pre-built base GRN. For analyses in mice, we recommend using a base GRN built from the mouse sciATAC-seq Atlas dataset. It includes various tissues and various cell types. Another option is using base GRN constructed from promoter database. We provide promoter base GRNs for ten species.

What you can do

After constructing the CellOracle GRN models, you can do two analyses.

  1. in silico TF perturbation analysis. CellOracle uses the GRN models to simulate cell identity shifts in response to TF perturbation. For this analysis, you must first construct your GRN models in this notebook.

  2. Network analysis. You can analyze the GRN models using graph theory. We provide several functions.

  • CellOracle construct cluster-wise GRN models. You can compare the GRN model structure between clusters, which allows you to investigate the cell type-specific GRN configurations and explore structual changes as GRNs are rewired along the cell differentiation trajectory.

  • You can also export the network models and analyze the GRN models using external packages or softwares.

Custom data classes / objects

In this notebook, CellOracle uses two custom classes, Oracle and Links.

  • Oracle is the main class in the CellOracle package. It is responsible for almost all the GRN model construction and TF perturbation simulation. Oracle does the following calculations sequentially.

  1. Import scRNA-sequence data. Please refer to the data preparation notebooks to learn about data preparation procedures.

  2. Import base GRN data.

  3. scRNA-seq data processing.

  4. GRN model construction.

  5. in silico petrurbation. We will describe how to do this in the following notebook.

  • Links is a class to store GRN data. It also conteins many functions for network analysis and visualization.

0. Import libraries

[1]:
# 0. Import

import os
import sys

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scanpy as sc
import seaborn as sns

[5]:
import celloracle as co
co.__version__
[5]:
'0.10.14'
[6]:
# visualization settings
%config InlineBackend.figure_format = 'retina'
%matplotlib inline

plt.rcParams['figure.figsize'] = [6, 4.5]
plt.rcParams["savefig.dpi"] = 300
[7]:
save_folder = "figures"
os.makedirs(save_folder, exist_ok=True)

1. Load data

Please refer to the previous notebook to learn about scRNA-seq data processing. https://morris-lab.github.io/CellOracle.documentation/tutorials/scrnaprocess.html

The scRNA-seq data must be in the anndata format.

This CellOracle tutorial notebook assumes the user has basic knowledge and experience with scRNA-seq analysis using Scanpy and Anndata. This notebook is not intended to be an introduction to Scanpy or Anndata. If you are not familiar with them, please learn them using the documentation and tutorials for Scanpy (https://scanpy.readthedocs.io/en/stable/) and annata (https://anndata.readthedocs.io/en/stable/).

[6]:
# Load data.
# Here, we will use a hematopoiesis data by Paul 2015.
# You can load preprocessed data using a celloracle function as follows.
adata = co.data.load_Paul2015_data()
adata

# Attention: Replace the data path below when using your data.
# adata = sc.read_h5ad("DATAPATH")
[6]:
AnnData object with n_obs × n_vars = 2671 × 1999
    obs: 'paul15_clusters', 'n_counts_all', 'n_counts', 'louvain', 'cell_type', 'louvain_annot', 'dpt_pseudotime'
    var: 'n_counts'
    uns: 'cell_type_colors', 'diffmap_evals', 'draw_graph', 'iroot', 'louvain', 'louvain_annot_colors', 'louvain_colors', 'louvain_sizes', 'neighbors', 'paga', 'paul15_clusters_colors', 'pca'
    obsm: 'X_diffmap', 'X_draw_graph_fa', 'X_pca'
    varm: 'PCs'
    layers: 'raw_count'
    obsp: 'connectivities', 'distances'

If your scRNA-seq data includes more than 20-30K cells, we recommend downsampling your data. If you do not, please note the perturbation simulations may require large amounts of memory in the next notebook.

Also, please pay attention to the number of genes. If you are following the instruction from the previous tutorial notebook, the scRNA-seq data should include only top 2~3K variable genes. If you have more than 3K genes, it may cause issues downstream.

[7]:
print(f"Cell number is :{adata.shape[0]}")
print(f"Gene number is :{adata.shape[1]}")
Cell number is :2671
Gene number is :1999
[8]:
# Random downsampling into 30K cells if the anndata object include more than 30 K cells.
n_cells_downsample = 30000
if adata.shape[0] > n_cells_downsample:
    # Let's dowmsample into 30K cells
    sc.pp.subsample(adata, n_obs=n_cells_downsample, random_state=123)
[9]:
print(f"Cell number is :{adata.shape[0]}")
Cell number is :2671

To infer cluster-specific GRNs, CellOracle requires a base GRN. - There are several ways to make a base GRN. We can typically generate base GRN from scATAC-seq data or bulk ATAC-seq data. Please refer to the first step of the tutorial to learn more about this process. https://morris-lab.github.io/CellOracle.documentation/tutorials/base_grn.html

  • If you do not have your scATAC-seq data, you can use one of the built-in base GRNs options below.

  • Base GRNs made from mouse sci-ATAC-seq atlas : The built-in base GRN was made from various tissue/cell-types found in the atlas (http://atlas.gs.washington.edu/mouse-atac/). We recommend using this for mouse scRNA-seq data. Please load this data as follows.

base_GRN = co.data.load_mouse_scATAC_atlas_base_GRN()

  • Promoter base GRN: We also provide base GRN made from promoter DNA-sequences for ten species. You can load this data as follows.

  • e.g. for Human: base_GRN = co.data.load_human_promoter_base_GRN()

[10]:
# Load TF info which was made from mouse cell atlas dataset.
base_GRN = co.data.load_mouse_scATAC_atlas_base_GRN()

# Check data
base_GRN.head()
[10]:
peak_id gene_short_name 9430076c15rik Ac002126.6 Ac012531.1 Ac226150.2 Afp Ahr Ahrr Aire ... Znf784 Znf8 Znf816 Znf85 Zscan10 Zscan16 Zscan22 Zscan26 Zscan31 Zscan4
0 chr10_100050979_100052296 4930430F08Rik 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 chr10_101006922_101007748 SNORA17 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
2 chr10_101144061_101145000 Mgat4c 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
3 chr10_10148873_10149183 9130014G24Rik 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
4 chr10_10149425_10149815 9130014G24Rik 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

5 rows × 1095 columns

2. Make Oracle object

We will use an Oracle object during the data preprocessing and GRN inference steps. Using its internal functions, the Oracle object computes and stores all of the necessary information for these calculations. To begin, we will instantiate a new Oracle object and input our gene expression data (anndata) and TF info (base GRN).

[11]:
# Instantiate Oracle object
oracle = co.Oracle()

For the CellOracle analysis, your anndata should include (1) gene expression counts, (2) clustering information, and (3) trajectory (dimensional reduction embedding) data. Please refer to a previous notebook for more information on anndata preprocessing.

When you load the scRNA-seq data, please enter the name of the clustering data and the name of the dimensionality reduction. - The clustering data should be to be stored in the obs the attribute of anndata. > You can check it using the following command. > > adata.obs.columns

  • Dimensional reduction data is stored in the obsm the attribute of anndata. > You can check it with the following command. > > adata.obsm.keys()

[12]:
# Check data in anndata
print("Metadata columns :", list(adata.obs.columns))
print("Dimensional reduction: ", list(adata.obsm.keys()))
metadata columns : ['paul15_clusters', 'n_counts_all', 'n_counts', 'louvain', 'cell_type', 'louvain_annot', 'dpt_pseudotime']
dimensional reduction:  ['X_diffmap', 'X_draw_graph_fa', 'X_pca']
[13]:
# In this notebook, we use the unscaled mRNA count for the nput of Oracle object.
adata.X = adata.layers["raw_count"].copy()

# Instantiate Oracle object.
oracle.import_anndata_as_raw_count(adata=adata,
                                   cluster_column_name="louvain_annot",
                                   embedding_name="X_draw_graph_fa")
[14]:
# You can load TF info dataframe with the following code.
oracle.import_TF_data(TF_info_matrix=base_GRN)

# Alternatively, if you saved the informmation as a dictionary, you can use the code below.
# oracle.import_TF_data(TFdict=TFinfo_dictionary)

We can add additional TF-target gene pairs manually.

For example, if there is a study or database that includes specific TF-target pairs, you can include this information using a python dictionary.

2.3.1. Make dictionary

Here, we will introduce how to manually add TF-target gene pair data.

As an example, we will use TF binding data from supplemental table 4 in the Paul et al., (2015) paper. (http://doi.org/10.1016/j.cell.2015.11.013).

You can download this file by running the command below. Or download it manually here. https://raw.githubusercontent.com/morris-lab/CellOracle/master/docs/demo_data/TF_data_in_Paul15.csv

In order to import the TF data into the Oracle object, we need to convert it into a python dictionary. The dictionary keys will be the target gene, and dictionary values will be a list of regulatory candidate TFs.

[15]:
# Download file.
!wget https://raw.githubusercontent.com/morris-lab/CellOracle/master/docs/demo_data/TF_data_in_Paul15.csv

# If you are using macOS, please try the following command.
#!curl -O https://raw.githubusercontent.com/morris-lab/CellOracle/master/docs/demo_data/TF_data_in_Paul15.csv
--2022-05-07 18:28:33--  https://raw.githubusercontent.com/morris-lab/CellOracle/master/docs/demo_data/TF_data_in_Paul15.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.108.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1771 (1.7K) [text/plain]
Saving to: ‘TF_data_in_Paul15.csv.2’

TF_data_in_Paul15.c 100%[===================>]   1.73K  --.-KB/s    in 0s

2022-05-07 18:28:33 (38.3 MB/s) - ‘TF_data_in_Paul15.csv.2’ saved [1771/1771]

[16]:
# Load the TF and target gene information from Paul et al. (2015).
Paul_15_data = pd.read_csv("TF_data_in_Paul15.csv")
Paul_15_data

[16]:
TF Target_genes
0 Cebpa Abcb1b, Acot1, C3, Cnpy3, Dhrs7, Dtx4, Edem2, ...
1 Irf8 Abcd1, Aif1, BC017643, Cbl, Ccdc109b, Ccl6, d6...
2 Irf8 1100001G20Rik, 4732418C07Rik, 9230105E10Rik, A...
3 Klf1 2010011I20Rik, 5730469M10Rik, Acsl6, Add2, Ank...
4 Spi1 0910001L09Rik, 2310014H01Rik, 4632428N05Rik, A...
[17]:
# Make dictionary: dictionary key is TF and dictionary value is list of target genes.
TF_to_TG_dictionary = {}

for TF, TGs in zip(Paul_15_data.TF, Paul_15_data.Target_genes):
    # convert target gene to list
    TG_list = TGs.replace(" ", "").split(",")
    # store target gene list in a dictionary
    TF_to_TG_dictionary[TF] = TG_list

# We invert the dictionary above using a utility function in celloracle.
TG_to_TF_dictionary = co.utility.inverse_dictionary(TF_to_TG_dictionary)

2.3.2. Add TF information dictionary into the oracle object

[18]:
# Add TF information
oracle.addTFinfo_dictionary(TG_to_TF_dictionary)

3. KNN imputation

CellOracle uses the same strategy as velocyto for visualizing cell transitions. This process requires KNN imputation in advance.

For the KNN imputation, we first need to calculate and select PCs.

[19]:
# Perform PCA
oracle.perform_PCA()

# Select important PCs
plt.plot(np.cumsum(oracle.pca.explained_variance_ratio_)[:100])
n_comps = np.where(np.diff(np.diff(np.cumsum(oracle.pca.explained_variance_ratio_))>0.002))[0][0]
plt.axvline(n_comps, c="k")
plt.show()
print(n_comps)
n_comps = min(n_comps, 50)
../../_images/notebooks_04_Network_analysis_Network_analysis_with_Paul_etal_2015_data_31_0.png

Estimate the optimal number of nearest neighbors for KNN imputation.

[20]:
n_cell = oracle.adata.shape[0]
print(f"cell number is :{n_cell}")
cell number is :2671
[21]:
k = int(0.025*n_cell)
print(f"Auto-selected k is :{k}")
Auto-selected k is :66
[22]:
oracle.knn_imputation(n_pca_dims=n_comps, k=k, balanced=True, b_sight=k*8,
                      b_maxl=k*4, n_jobs=4)

4. Save and Load.

You can save your Oracle object using Oracle.to_hdf5(FILE_NAME.celloracle.oracle).

Pleasae use co.load_hdf5(FILE_NAME.celloracle.oracle) to load the saved file.

[23]:
# Save oracle object.
oracle.to_hdf5("Paul_15_data.celloracle.oracle")
[24]:
# Load file.
oracle = co.load_hdf5("Paul_15_data.celloracle.oracle")

5. GRN calculation

The next step constructs a cluster-specific GRN for all clusters.

  • You can calculate GRNs with the get_links function, and it will return the results as a Links object. The Links object stores the inferred GRNs and the corresponding metadata. Most network structure analysis is performed with the Links object.

  • A GRN will be calculated for each cluster/sub-group. In the example below, we construct GRN for each unit of the “louvain_annot” clustering.

[25]:
# Check clustering data
sc.pl.draw_graph(oracle.adata, color="louvain_annot")
../../_images/notebooks_04_Network_analysis_Network_analysis_with_Paul_etal_2015_data_41_0.png
[26]:
%%time
# Calculate GRN for each population in "louvain_annot" clustering unit.
# This step may take some time.(~30 minutes)
links = oracle.get_links(cluster_name_for_GRN_unit="louvain_annot", alpha=10,
                         verbose_level=10)

Although CellOracle has many functions for network analysis, you can export and analyze GRNs using another software if you chose. The raw GRN data is stored as a dictionary of dataframe in the links_dict attribute.

For example, you can get the GRN for the “Ery_0” cluster with the following commands.

[27]:
links.links_dict.keys()
[27]:
dict_keys(['Ery_0', 'Ery_1', 'Ery_2', 'Ery_3', 'Ery_4', 'Ery_5', 'Ery_6', 'Ery_7', 'Ery_8', 'Ery_9', 'GMP_0', 'GMP_1', 'GMP_2', 'GMPl_0', 'GMPl_1', 'Gran_0', 'Gran_1', 'Gran_2', 'Gran_3', 'MEP_0', 'Mk_0', 'Mo_0', 'Mo_1', 'Mo_2'])
[28]:
links.links_dict["Ery_0"]
[28]:
source target coef_mean coef_abs p -logp
0 Elf1 0610007L01Rik 0.003746 0.003746 1.440512e-03 2.841483
1 Mef2c 0610007L01Rik -0.007328 0.007328 1.365241e-04 3.864791
2 Chd2 0610007L01Rik -0.010170 0.010170 4.160059e-06 5.380900
3 Irf7 0610007L01Rik -0.009342 0.009342 2.462105e-06 5.608693
4 Stat3 0610007L01Rik -0.008464 0.008464 1.636075e-04 3.786197
... ... ... ... ... ... ...
74932 Nfkb1 Zyx 0.015513 0.015513 6.787753e-06 5.168274
74933 Zbtb4 Zyx 0.000247 0.000247 8.890848e-01 0.051057
74934 Klf4 Zyx -0.005566 0.005566 9.731835e-08 7.011805
74935 Nfia Zyx -0.026013 0.026013 2.981702e-10 9.525536
74936 Rel Zyx -0.002867 0.002867 1.196048e-01 0.922251

74937 rows × 6 columns

You can export the file as follows.

[29]:
# Set cluster name
cluster = "Ery_0"

# Save as csv
#links.links_dict[cluster].to_csv(f"raw_GRN_for_{cluster}.csv")

The links object stores color information in the palette attribute. This information is used when visualizing the clusters.

The sample will be visualized in that order. Here we can change both the cluster colors and order.

[30]:
# Show the contents of pallete
links.palette
[30]:
palette
Ery_0 #9CDED6
Ery_1 #D5EAE7
Ery_2 #F3E1EB
Ery_3 #F6C4E1
Ery_4 #F79CD4
Ery_5 #E6AFB9
Ery_6 #E07B91
Ery_7 #D33F6A
Ery_8 #BB7784
Ery_9 #8E063B
GMP_0 #11C638
GMP_1 #8DD593
GMP_2 #C6DEC7
GMPl_0 #B5BBE3
GMPl_1 #7D87B9
Gran_0 #1CE6FF
Gran_1 #8595E1
Gran_2 #4A6FE3
Gran_3 #023FA5
MEP_0 #0FCFC0
Mk_0 #C7C7C7
Mo_0 #EAD3C6
Mo_1 #F0B98D
Mo_2 #EF9708
[31]:
# Change the order of pallete
order = ['MEP_0', 'Mk_0', 'Ery_0',
         'Ery_1', 'Ery_2', 'Ery_3', 'Ery_4', 'Ery_5', 'Ery_6', 'Ery_7', 'Ery_8', 'Ery_9',
         'GMP_0', 'GMP_1', 'GMP_2', 'GMPl_0', 'GMPl_1',
         'Mo_0', 'Mo_1', 'Mo_2',
         'Gran_0', 'Gran_1', 'Gran_2', 'Gran_3']
links.palette = links.palette.loc[order]
links.palette
[31]:
palette
MEP_0 #0FCFC0
Mk_0 #C7C7C7
Ery_0 #9CDED6
Ery_1 #D5EAE7
Ery_2 #F3E1EB
Ery_3 #F6C4E1
Ery_4 #F79CD4
Ery_5 #E6AFB9
Ery_6 #E07B91
Ery_7 #D33F6A
Ery_8 #BB7784
Ery_9 #8E063B
GMP_0 #11C638
GMP_1 #8DD593
GMP_2 #C6DEC7
GMPl_0 #B5BBE3
GMPl_1 #7D87B9
Mo_0 #EAD3C6
Mo_1 #F0B98D
Mo_2 #EF9708
Gran_0 #1CE6FF
Gran_1 #8595E1
Gran_2 #4A6FE3
Gran_3 #023FA5
[32]:
# Save Links object.
links.to_hdf5(file_path="links.celloracle.links")

6. Network preprocessing

Using the base GRN, CellOracle constructs the GRN models as a lits of directed edges between a TF and its target genes. We need to remove the weak edges or insignificant edges before doing network structure analysis.

We filter the network edges as follows.

  1. Remove uncertain network edges based on the p-value.

  2. Remove weak network edge. In this tutorial, we keep the top 2000 edges ranked by edge strength.

The raw network data is stored in the links_dict attribute, while the filtered network data is stored in the filtered_links attribute.

[33]:
links.filter_links(p=0.001, weight="coef_abs", threshold_number=2000)

In the first step, we examine the network degree distribution.

Network degree, which is the number of edges for each node, is one of the important metrics used to investigate the network structure (https://en.wikipedia.org/wiki/Degree_distribution).

Please keep in mind that the degree distribution may change depending on the filtering threshold.

[34]:
plt.rcParams["figure.figsize"] = [9, 4.5]
[35]:
links.plot_degree_distributions(plot_model=True,
                                               #save=f"{save_folder}/degree_distribution/",
                                               )
../../_images/notebooks_04_Network_analysis_Network_analysis_with_Paul_etal_2015_data_58_0.png
../../_images/notebooks_04_Network_analysis_Network_analysis_with_Paul_etal_2015_data_58_1.png
../../_images/notebooks_04_Network_analysis_Network_analysis_with_Paul_etal_2015_data_58_2.png
../../_images/notebooks_04_Network_analysis_Network_analysis_with_Paul_etal_2015_data_58_3.png
../../_images/notebooks_04_Network_analysis_Network_analysis_with_Paul_etal_2015_data_58_4.png
../../_images/notebooks_04_Network_analysis_Network_analysis_with_Paul_etal_2015_data_58_5.png
../../_images/notebooks_04_Network_analysis_Network_analysis_with_Paul_etal_2015_data_58_6.png
../../_images/notebooks_04_Network_analysis_Network_analysis_with_Paul_etal_2015_data_58_7.png
../../_images/notebooks_04_Network_analysis_Network_analysis_with_Paul_etal_2015_data_58_8.png
../../_images/notebooks_04_Network_analysis_Network_analysis_with_Paul_etal_2015_data_58_9.png
../../_images/notebooks_04_Network_analysis_Network_analysis_with_Paul_etal_2015_data_58_10.png
../../_images/notebooks_04_Network_analysis_Network_analysis_with_Paul_etal_2015_data_58_11.png
../../_images/notebooks_04_Network_analysis_Network_analysis_with_Paul_etal_2015_data_58_12.png
../../_images/notebooks_04_Network_analysis_Network_analysis_with_Paul_etal_2015_data_58_13.png
../../_images/notebooks_04_Network_analysis_Network_analysis_with_Paul_etal_2015_data_58_14.png
../../_images/notebooks_04_Network_analysis_Network_analysis_with_Paul_etal_2015_data_58_15.png
../../_images/notebooks_04_Network_analysis_Network_analysis_with_Paul_etal_2015_data_58_16.png
../../_images/notebooks_04_Network_analysis_Network_analysis_with_Paul_etal_2015_data_58_17.png
../../_images/notebooks_04_Network_analysis_Network_analysis_with_Paul_etal_2015_data_58_18.png
../../_images/notebooks_04_Network_analysis_Network_analysis_with_Paul_etal_2015_data_58_19.png
../../_images/notebooks_04_Network_analysis_Network_analysis_with_Paul_etal_2015_data_58_20.png
../../_images/notebooks_04_Network_analysis_Network_analysis_with_Paul_etal_2015_data_58_21.png
../../_images/notebooks_04_Network_analysis_Network_analysis_with_Paul_etal_2015_data_58_22.png
../../_images/notebooks_04_Network_analysis_Network_analysis_with_Paul_etal_2015_data_58_23.png
[36]:
plt.rcParams["figure.figsize"] = [6, 4.5]

Next, we calculate several network scores.

Although previous version of celloracle, links.get_score() required R packages for the netowrk score calculation, the function was replaced with new function, links.get_network_scores(), which does NOT require any R package. This new function can be available with celloracle >= 0.10.0.

The old function, links.get_score() is still available, but it will be removed in the future version.

[37]:
# Calculate network scores.
links.get_network_score()

The score is stored as a attribute merged_score.

[38]:
links.merged_score.head()
[38]:
degree_all degree_centrality_all degree_in degree_centrality_in degree_out degree_centrality_out betweenness_centrality eigenvector_centrality cluster
0610010K14Rik 3 0.005435 3 0.005435 0 0.0 0.0 0.026116 Ery_0
1300001I01Rik 2 0.003623 2 0.003623 0 0.0 0.0 0.054690 Ery_0
1500001M20Rik 1 0.001812 1 0.001812 0 0.0 0.0 0.012897 Ery_0
1500012F01Rik 9 0.016304 9 0.016304 0 0.0 0.0 0.128980 Ery_0
1700017B05Rik 2 0.003623 2 0.003623 0 0.0 0.0 0.006488 Ery_0

Save processed GRNs. We will use this file during the in in silico TF perturbation analysis.

[39]:
# Save Links object.
links.to_hdf5(file_path="links.celloracle.links")
[40]:
# You can load files with the following command.
links = co.load_hdf5(file_path="links.celloracle.links")

If you are not interested in network analysis you can skip the steps described below. Please go to the next step: in silico gene perturbation with GRNs

https://morris-lab.github.io/CellOracle.documentation/tutorials/simulation.html

7. Network analysis; Network score for each gene

The Links class has many functions to visualize network score. See the web documentation to learn more about these functions.

We have calculated several network scores using different centrality metrics. >Centrality matrics are is one of the important indicators of network structure (https://en.wikipedia.org/wiki/Centrality).

Let’s visualize genes with high network centrality.

[41]:
# Check cluster name
links.cluster
[41]:
['Ery_0',
 'Ery_1',
 'Ery_2',
 'Ery_3',
 'Ery_4',
 'Ery_5',
 'Ery_6',
 'Ery_7',
 'Ery_8',
 'Ery_9',
 'GMP_0',
 'GMP_1',
 'GMP_2',
 'GMPl_0',
 'GMPl_1',
 'Gran_0',
 'Gran_1',
 'Gran_2',
 'Gran_3',
 'MEP_0',
 'Mk_0',
 'Mo_0',
 'Mo_1',
 'Mo_2']
[42]:
# Visualize top n-th genes with high scores.
links.plot_scores_as_rank(cluster="MEP_0", n_gene=30, save=f"{save_folder}/ranked_score")
../../_images/notebooks_04_Network_analysis_Network_analysis_with_Paul_etal_2015_data_71_0.png
../../_images/notebooks_04_Network_analysis_Network_analysis_with_Paul_etal_2015_data_71_1.png
../../_images/notebooks_04_Network_analysis_Network_analysis_with_Paul_etal_2015_data_71_2.png
../../_images/notebooks_04_Network_analysis_Network_analysis_with_Paul_etal_2015_data_71_3.png
../../_images/notebooks_04_Network_analysis_Network_analysis_with_Paul_etal_2015_data_71_4.png

By comparing network scores between two clusters, we can analyze differences in GRN structure.

[43]:
# Compare GRN score between two clusters
links.plot_score_comparison_2D(value="eigenvector_centrality",
                               cluster1="MEP_0", cluster2="GMPl_0",
                               percentile=98,
                               save=f"{save_folder}/score_comparison")
../../_images/notebooks_04_Network_analysis_Network_analysis_with_Paul_etal_2015_data_74_0.png
[44]:
# Compare GRN score between two clusters
links.plot_score_comparison_2D(value="betweenness_centrality",
                               cluster1="MEP_0", cluster2="GMPl_0",
                               percentile=98,
                               save=f"{save_folder}/score_comparison")
../../_images/notebooks_04_Network_analysis_Network_analysis_with_Paul_etal_2015_data_75_0.png
[45]:
# Compare GRN score between two clusters
links.plot_score_comparison_2D(value="degree_centrality_all",
                               cluster1="MEP_0", cluster2="GMPl_0",
                               percentile=98, save=f"{save_folder}/score_comparison")
../../_images/notebooks_04_Network_analysis_Network_analysis_with_Paul_etal_2015_data_76_0.png

In the following section, we focus on how a gene’s network score changes during the differentiation.

We will introduce how to visualize networks scores dynamics using Gata2 as an example.

Gata2 is known to play an essential role in the early MEP and GMP populations. .

[46]:
# Visualize Gata2 network score dynamics
links.plot_score_per_cluster(goi="Gata2", save=f"{save_folder}/network_score_per_gene/")
../../_images/notebooks_04_Network_analysis_Network_analysis_with_Paul_etal_2015_data_78_0.png

If a gene does not have network edge in a cluster, the network scores cannot be calculated and the scores will not be shown. For example, Cebpa have no network edge in the erythloids cluster GRNs. Thus these clusters do not have centrality scores in the plot below.

[47]:
# Visualize Cebpa network score dynamics
links.plot_score_per_cluster(goi="Cebpa")
../../_images/notebooks_04_Network_analysis_Network_analysis_with_Paul_etal_2015_data_80_0.png

You can check the filtered network edge as follows.

[48]:
cluster_name = "Ery_1"
filtered_links_df = links.filtered_links[cluster_name]
filtered_links_df.head()
[48]:
source target coef_mean coef_abs p -logp
5506 Gata2 Apoe 0.109033 0.109033 1.107226e-11 10.955764
56740 Gata2 Rhd -0.096525 0.096525 4.285978e-16 15.367950
5519 Nfatc3 Apoe -0.095612 0.095612 1.146930e-16 15.940463
69218 Elf1 Top2a 0.093966 0.093966 3.595857e-08 7.444198
5550 Zfhx3 Apoe 0.093560 0.093560 1.515266e-15 14.819511

You can confirm that there is no Cebpa connection in Ery_0 cluster.

[49]:
filtered_links_df[filtered_links_df.source == "Cebpa"]
[49]:
source target coef_mean coef_abs p -logp
45587 Cebpa Nrgn 0.023446 0.023446 2.765830e-13 12.558175

8. Network analysis; network score distribution

Next, we visualize the network score distributions to get insight into the global network trends.

[50]:
plt.rcParams["figure.figsize"] = [6, 4.5]
[51]:
# Plot degree_centrality
plt.subplots_adjust(left=0.15, bottom=0.3)
plt.ylim([0,0.040])
links.plot_score_discributions(values=["degree_centrality_all"],
                               method="boxplot",
                               save=f"{save_folder}",
                              )


../../_images/notebooks_04_Network_analysis_Network_analysis_with_Paul_etal_2015_data_88_0.png
[52]:
# Plot eigenvector_centrality
plt.subplots_adjust(left=0.15, bottom=0.3)
plt.ylim([0, 0.28])
links.plot_score_discributions(values=["eigenvector_centrality"],
                               method="boxplot",
                               save=f"{save_folder}")



../../_images/notebooks_04_Network_analysis_Network_analysis_with_Paul_etal_2015_data_89_0.png
[53]:
plt.subplots_adjust(left=0.15, bottom=0.3)
links.plot_network_entropy_distributions(save=f"{save_folder}")


../../_images/notebooks_04_Network_analysis_Network_analysis_with_Paul_etal_2015_data_91_0.png

Please go to the next step: in silico gene perturbation with GRNs

https://morris-lab.github.io/CellOracle.documentation/tutorials/simulation.html

[ ]: