celloracle.network_analysis module

The network_analysis module implements network analysis for gene regulatory networks (GRNs).

Bases: object

This is a class for the processing and visualization of GRNs. A Links object stores cluster-specific GRNs and metadata. Use the “get_links” function of the Oracle object to generate a Links object.

Dictionary that stores unprocessed network data.

Type

dictionary

Dictionary that stores filtered network data.

Type

dictionary

merged_score

Network scores.

Type

pandas.DataFrame

cluster

List of cluster names.

Type

list of str

name

Name of clustering unit.

Type

str

palette

DataFrame that stores color information.

Type

pandas.DataFrame

Filter network edges. In most cases, an inferred GRN contains non-significant, random edges. These edges have to be removed before analyzing the network structure. You can filter in any of the following ways.

  1. Filter based on the p-value of the network edge. Enter a p-value to use as the threshold.

  2. Filter based on the number of network edges. If you set a number, edges are ranked by a network score; the top n edges are kept and the rest are removed. The network data has several types of network weight, so you have to select which weight to use for the ranking.

  3. Filter based on an arbitrary gene list. You can set a gene list for source nodes or target nodes.

Parameters
  • p (float) – Threshold for the p-value of the network edge.

  • weight (str) – Name of the network weight to use for the filtering.

  • genelist_source (list of str) – Genes to keep as regulatory (source) nodes. Default is None.

  • genelist_target (list of str) – Genes to keep as target nodes. Default is None.
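The three filtering modes described above can be sketched in plain Python. This is a minimal illustration, not celloracle's implementation: the function `filter_edges`, the argument `threshold_number`, the edge keys (`source`, `target`, `coef_abs`, `p`), and the order in which the filters are applied are all assumptions made for the example.

```python
def filter_edges(edges, p=0.001, weight="coef_abs", threshold_number=None,
                 genelist_source=None, genelist_target=None):
    """Illustrative edge filter (hypothetical helper, not the library code)."""
    # Mode 1: keep edges whose p-value is at or below the threshold.
    kept = [e for e in edges if e["p"] <= p]

    # Mode 3: optionally restrict source and/or target nodes to a gene list.
    if genelist_source is not None:
        allowed = set(genelist_source)
        kept = [e for e in kept if e["source"] in allowed]
    if genelist_target is not None:
        allowed = set(genelist_target)
        kept = [e for e in kept if e["target"] in allowed]

    # Mode 2: rank by the chosen weight and keep only the top n edges.
    kept.sort(key=lambda e: e[weight], reverse=True)
    if threshold_number is not None:
        kept = kept[:threshold_number]
    return kept


# Toy edge list; gene names and values are made up for the example.
edges = [
    {"source": "Gata1", "target": "Klf1", "coef_abs": 0.8, "p": 1e-6},
    {"source": "Gata1", "target": "Spi1", "coef_abs": 0.1, "p": 0.2},
    {"source": "Spi1", "target": "Gata1", "coef_abs": 0.5, "p": 1e-4},
    {"source": "Cebpa", "target": "Spi1", "coef_abs": 0.3, "p": 5e-4},
]
top2 = filter_edges(edges, p=0.001, threshold_number=2)
only_gata1 = filter_edges(edges, p=0.001, genelist_source=["Gata1"])
```

Applying the p-value cut before the top-n cut means the edge budget is spent only on edges that already pass the significance threshold.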

get_network_entropy(value='coef_abs')

Calculate network entropy scores.

Parameters

value (str) – Default is “coef_abs”.
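The text does not say how the entropy is defined. One plausible reading, sketched below, is the Shannon entropy of each gene's normalized edge weights. Grouping edges by target gene, the natural logarithm, and the helper name `edge_weight_entropy` are assumptions for illustration, not a description of the library's code.

```python
import math
from collections import defaultdict


def edge_weight_entropy(edges, value="coef_abs"):
    """Shannon entropy of the edge-weight distribution per target gene.

    Hypothetical helper: the grouping key and log base are assumptions.
    """
    weights = defaultdict(list)
    for e in edges:
        weights[e["target"]].append(abs(e[value]))

    entropy = {}
    for gene, ws in weights.items():
        total = sum(ws)
        # Normalize the weights into a probability distribution; zero
        # weights contribute nothing to the entropy and are skipped.
        probs = [w / total for w in ws if w > 0]
        entropy[gene] = -sum(q * math.log(q) for q in probs)
    return entropy


# Toy edges: "Klf1" has two equally weighted regulators, "Spi1" has one.
toy_edges = [
    {"source": "Gata1", "target": "Klf1", "coef_abs": 0.4},
    {"source": "Tal1", "target": "Klf1", "coef_abs": 0.4},
    {"source": "Cebpa", "target": "Spi1", "coef_abs": 0.9},
]
toy_entropy = edge_weight_entropy(toy_edges)
```

Under this definition a gene whose incoming weight is spread evenly over many regulators scores high, while a gene dominated by a single regulator scores close to zero.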

get_network_score()

Get several network scores using the igraph library. The following scores are calculated: [‘degree_all’, ‘degree_centrality_all’, ‘degree_in’, ‘degree_centrality_in’, ‘degree_out’, ‘degree_centrality_out’, ‘betweenness_centrality’, ‘eigenvector_centrality’]
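To make the degree-based scores concrete, here is a small pure-Python sketch for a directed edge list. It assumes the common normalization in which degree centrality is the degree divided by (number of nodes - 1); whether the library normalizes the same way is not stated in the text. Betweenness and eigenvector centrality are left out, and `degree_scores` is a hypothetical helper name.

```python
from collections import Counter


def degree_scores(edges):
    """Degree and degree centrality per node for directed (source, target) pairs."""
    out_deg = Counter(source for source, _ in edges)
    in_deg = Counter(target for _, target in edges)
    nodes = set(out_deg) | set(in_deg)
    # Assumed normalization: divide by the number of other nodes.
    denom = max(len(nodes) - 1, 1)

    scores = {}
    for gene in nodes:
        d_in, d_out = in_deg[gene], out_deg[gene]
        scores[gene] = {
            "degree_all": d_in + d_out,
            "degree_centrality_all": (d_in + d_out) / denom,
            "degree_in": d_in,
            "degree_centrality_in": d_in / denom,
            "degree_out": d_out,
            "degree_centrality_out": d_out / denom,
        }
    return scores


# Toy network: A regulates B and C, B regulates C.
toy_scores = degree_scores([("A", "B"), ("A", "C"), ("B", "C")])
```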

get_score(test_mode=False, n_jobs=- 1)

Get several network scores using R-igraph, linkcomm, and rnetcarto. This requires the corresponding R packages.

plot_cartography_scatter_per_cluster(gois=None, clusters=None, scatter=True, kde=False, auto_gene_annot=False, percentile=98, args_dot={'n_levels': 105}, args_line={'c': 'gray'}, args_annot={}, save=None)

Make a gene network cartography plot. Please read the original paper describing gene network cartography for more information. https://www.nature.com/articles/nature03288

Parameters
  • links (Links) – See network_analysis.Links class for detail.

  • gois (list of str) – List of gene names to highlight.

  • clusters (list of str) – List of cluster names to analyze. If None, all clusters in the Links object will be analyzed.

  • scatter (bool) – Whether to make a scatter plot.

  • auto_gene_annot (bool) – Whether to pick up genes to make an annotation.

  • percentile (float) – Genes with a network score above the percentile will be shown with annotation. Default is 98.

  • args_dot (dictionary) – Arguments for scatter plot.

  • args_line (dictionary) – Arguments for lines in cartography plot.

  • args_annot (dictionary) – Arguments for annotation in plots.

  • save (str) – Folder path to save plots. If the folder does not exist in the path, the function creates the folder. Plots will not be saved if [save=None]. Default is None.
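The cartography paper linked above places each node on a plane defined by two quantities: the within-module degree z-score and the participation coefficient P = 1 - Σ_s (k_s / k)², where k_s is the number of a node's links into module s and k is its total degree. The sketch below computes both for an undirected toy network with hand-assigned modules. It is illustrative only: the R-based pipeline detects modules itself and may differ in details, and `cartography_metrics` is a hypothetical helper.

```python
import math
from collections import defaultdict


def cartography_metrics(edges, module_of):
    """Return {node: (within_module_z, participation)} for an undirected graph."""
    # Count, for every node, how many of its links go into each module.
    links_to_module = defaultdict(lambda: defaultdict(int))
    for a, b in edges:
        links_to_module[a][module_of[b]] += 1
        links_to_module[b][module_of[a]] += 1

    # Within-module degree: links to nodes of the node's own module.
    within = {g: links_to_module[g][m] for g, m in module_of.items()}
    by_module = defaultdict(list)
    for g, m in module_of.items():
        by_module[m].append(within[g])

    metrics = {}
    for g, m in module_of.items():
        ks = by_module[m]
        mean = sum(ks) / len(ks)
        sd = math.sqrt(sum((k - mean) ** 2 for k in ks) / len(ks))
        z = (within[g] - mean) / sd if sd > 0 else 0.0
        degree = sum(links_to_module[g].values())
        participation = (
            1.0 - sum((k / degree) ** 2 for k in links_to_module[g].values())
            if degree else 0.0
        )
        metrics[g] = (z, participation)
    return metrics


# Toy graph with two hand-assigned modules (0 and 1).
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "F"), ("C", "D"), ("D", "E")]
toy_modules = {"A": 0, "B": 0, "C": 0, "F": 0, "D": 1, "E": 1}
toy_metrics = cartography_metrics(toy_edges, toy_modules)
```

Nodes with a high z-score act as hubs inside their module, while a participation coefficient near zero means almost all of a node's links stay within its own module.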

plot_cartography_term(goi, save=None, plt_show=True)

Plot the gene network cartography term like a heatmap. Please read the original paper of gene network cartography for the principle of gene network cartography. https://www.nature.com/articles/nature03288

Parameters
  • links (Links) – See network_analysis.Links class for detail.

  • goi (str) – Name of the gene to plot.

  • save (str) – Folder path to save plots. If the folder does not exist in the path, the function creates the folder. Plots will not be saved if [save=None]. Default is None.

plot_degree_distributions(plot_model=False, save=None)

Plot the network degree distributions (the number of edges per gene). The network degree will be visualized on both a linear scale and a log scale.

Parameters
  • links (Links) – See network_analysis.Links class for detail.

  • plot_model (bool) – Whether to plot linear approximation line.

  • save (str) – Folder path to save plots. If the folder does not exist in the path, the function creates the folder. Plots will not be saved if [save=None]. Default is None.
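A linear approximation of a degree distribution is usually fitted on log-log axes, where a power law P(k) ∝ k^γ appears as a straight line with slope γ. The sketch below does this with an ordinary least-squares fit. That `plot_model` fits the line this way is an assumption, and `loglog_fit` is a hypothetical helper.

```python
import math
from collections import Counter


def loglog_fit(degrees):
    """Least-squares line through (log10 k, log10 P(k)); returns (slope, intercept)."""
    counts = Counter(d for d in degrees if d > 0)  # log10(0) is undefined
    if len(counts) < 2:
        raise ValueError("need at least two distinct positive degrees")
    total = sum(counts.values())

    ks = sorted(counts)
    xs = [math.log10(k) for k in ks]
    ys = [math.log10(counts[k] / total) for k in ks]

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy degrees following P(k) proportional to k**-2:
# 16 genes with degree 1, 4 genes with degree 2, 1 gene with degree 4.
toy_degrees = [1] * 16 + [2] * 4 + [4]
toy_slope, toy_intercept = loglog_fit(toy_degrees)
```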

plot_network_entropy_distributions(update_network_entropy=False, save=None)

Plot the distribution for network entropy. See the CellOracle paper for more detail.

Parameters
  • links (Links object) – See network_analysis.Links class for detail.

  • values (list of str) – The list of scores to visualize. If None, all network scores (listed above) will be used.

  • update_network_entropy (bool) – Whether to recalculate network entropy.

  • save (str) – Folder path to save plots. If the folder does not exist in the path, the function creates the folder. Plots will not be saved if [save=None]. Default is None.

plot_score_comparison_2D(value, cluster1, cluster2, percentile=99, annot_shifts=None, save=None, plt_show=True, interactive=False)

Make a scatter plot that compares specific network scores in two groups.

Parameters
  • links (Links) – See network_analysis.Links class for detail.

  • value (str) – The network score type.

  • cluster1 (str) – Cluster name. Network scores in cluster1 will be visualized in the x-axis.

  • cluster2 (str) – Cluster name. Network scores in cluster2 will be visualized in the y-axis.

  • percentile (float) – Genes with a network score above the percentile will be shown with annotation. Default is 99.

  • annot_shifts ((float, float)) – Annotation visualization setting.

  • save (str) – Folder path to save plots. If the folder does not exist in the path, the function creates the folder. Plots will not be saved if [save=None]. Default is None.
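The percentile rule for annotation can be illustrated as follows. The sketch pairs one score per gene from two clusters, computes a percentile threshold on each axis, and returns the genes that exceed either threshold. Filling missing genes with 0, using linear interpolation for the percentile, and annotating a gene when it is extreme on either axis are assumptions for the example; `percentile` and `genes_to_annotate` are hypothetical helpers.

```python
def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between ranks."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def genes_to_annotate(scores1, scores2, q=99):
    """Genes whose score exceeds the q-th percentile in cluster1 or cluster2."""
    genes = sorted(set(scores1) | set(scores2))
    # Assumption: a gene absent from one cluster's network scores 0 there.
    xs = {g: scores1.get(g, 0.0) for g in genes}
    ys = {g: scores2.get(g, 0.0) for g in genes}
    x_cut = percentile(list(xs.values()), q)
    y_cut = percentile(list(ys.values()), q)
    return [g for g in genes if xs[g] > x_cut or ys[g] > y_cut]


# Toy scores for two clusters; values are made up for the example.
cluster1_scores = {"A": 0.1, "B": 0.2, "C": 0.9, "D": 0.3}
cluster2_scores = {"A": 0.8, "B": 0.1, "C": 0.2, "E": 0.3}
annotated = genes_to_annotate(cluster1_scores, cluster2_scores, q=75)
```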

plot_score_discributions(values=None, method='boxplot', save=None)

Plot the distribution of network scores. Each data point is a network node (gene).

Parameters
  • links (Links) – See Links class for details.

  • values (list of str) – The list of scores to visualize. If None, all of the network scores will be used.

  • method (str) – Plotting method. Select either “boxplot” or “barplot”.

  • save (str) – Folder path to save plots. If the folder does not exist in the path, the function creates the folder. Plots will not be saved if [save=None]. Default is None.

plot_score_per_cluster(goi, save=None, plt_show=True)

Plot network score for a gene. This function visualizes the network score for a specific gene between clusters to get an insight into the dynamics of the gene.

Parameters
  • links (Links) – See network_analysis.Links class for detail.

  • goi (str) – Gene name.

  • save (str) – Folder path to save plots. If the folder does not exist in the path, the function creates the folder. Plots will not be saved if [save=None]. Default is None.

plot_scores_as_rank(cluster, n_gene=50, save=None)

Pick the top n genes with the highest network scores and plot them.

Parameters
  • links (Links) – See network_analysis.Links class for detail.

  • cluster (str) – Cluster name to analyze.

  • n_gene (int) – Number of genes to plot. Default is 50.

  • save (str) – Folder path to save plots. If the folder does not exist in the path, the function creates the folder. Plots will not be saved if [save=None]. Default is None.

to_hdf5(file_path)

Save object as hdf5.

Parameters

file_path (str) – Path of the file to save. The filename needs to end with ‘.celloracle.links’.

celloracle.network_analysis.draw_network(linkList, return_graph=False)

Plot network graph.

Parameters
  • linkList (pandas.DataFrame) – GRN saved as linkList.

  • return_graph (bool) – Whether to return graph object.

Returns

NetworkX graph object.

Return type

Graph object

celloracle.network_analysis.get_R_path()

Make a GRN for each cluster and return the results as a Links object. Several preprocessing steps must be completed before using this function.

Parameters
  • oracle_object (Oracle) – See Oracle module for detail.

  • cluster_name_for_GRN_unit (str) – Cluster name for GRN calculation. The cluster information should be stored in Oracle.adata.obs.

  • alpha (float or int) – The strength of regularization. If you set a lower value, the sensitivity increases, and you can detect weaker network connections. However, there may be more noise. If you select a higher value, it will reduce the chance of overfitting.

  • bagging_number (int) – The number used in bagging calculation.

  • verbose_level (int) – If [verbose_level > 1], the most detailed progress information will be shown. If [1 >= verbose_level > 0], one progress bar will be shown. If [verbose_level == 0], no progress bar will be shown.

  • test_mode (bool) – If test_mode is True, GRN calculation will be done for only one cluster rather than all clusters.

  • model_method (str) – Choose the modeling algorithm: “bagging_ridge” or “bayesian_ridge”.

  • n_jobs (int) – Number of cpu cores for parallel calculation. -1 means using all available cores. Default is -1.

Convert linkList into Graph object in NetworkX.

Parameters

filteredlinkList (pandas.DataFrame) – GRN saved as linkList.

Returns

NetworkX graph object.

Return type

Graph object

Load links object saved as a hdf5 file.

Parameters

file_path (str) – file path.

Returns

loaded links object.

Return type

Links

celloracle.network_analysis.set_R_path(R_path)

Transfer the summary of network scores (median or mean) per group from Links object into adata.

Parameters
  • adata (anndata) – anndata

  • links (Links) – Links object

  • method (str) – The method to summarize data.
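Summarizing scores per group amounts to a group-by followed by a median or mean. The sketch below shows that step on (cluster, score) pairs using only the standard library. The input layout and the helper name `summarize_scores` are assumptions; how the summary is then written into adata is not shown.

```python
import statistics
from collections import defaultdict


def summarize_scores(scores, method="median"):
    """Collapse (cluster, score) pairs into one summary value per cluster."""
    summarizers = {"median": statistics.median, "mean": statistics.fmean}
    if method not in summarizers:
        raise ValueError(f"method must be one of {sorted(summarizers)}")

    grouped = defaultdict(list)
    for cluster, value in scores:
        grouped[cluster].append(value)
    return {cluster: summarizers[method](values)
            for cluster, values in grouped.items()}


# Toy per-gene scores tagged with their cluster of origin.
toy_scores = [("c0", 1.0), ("c0", 2.0), ("c0", 6.0), ("c1", 4.0)]
median_by_cluster = summarize_scores(toy_scores, method="median")
mean_by_cluster = summarize_scores(toy_scores, method="mean")
```

The median is less sensitive than the mean to a few hub genes with very large scores, which is one reason to offer both options.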