celloracle.network package

The network module implements GRN inference.

class celloracle.network.Net(gene_expression_matrix, gem_standerdized=None, TFinfo_matrix=None, cellstate=None, TFinfo_dic=None, annotation=None, verbose=True)

Bases: object

Net is a custom class for inferring sample-specific GRNs from scRNA-seq data. It is used inside the Oracle class for GRN inference and requires the following two types of information.

  1. Single-cell RNA-seq data: The Net class needs processed scRNA-seq data. Gene and cell filtering, quality checks, normalization, and log-transformation (but not scaling or centering) must be done before starting the GRN calculation with this class. You can also use arbitrary metadata (e.g., mRNA count, cell-cycle phase) as GRN input.

  2. Potential regulatory connections (base GRN): This class takes a list of potential regulatory TFs as input. This information can be calculated from ATAC-seq data using the motif-analysis module. If sample-specific ATAC-seq data are not available, you can use general TF-binding information derived from public ATAC-seq datasets of various tissues and cell types.
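The two inputs described above can be sketched with plain Python structures. This is only an illustration of the expected preprocessing state (normalized and log-transformed, but not scaled or centered) and of a base GRN expressed as target gene to candidate TFs. The helper log_normalize, the target_sum of 10,000, and the toy gene names are assumptions made for this example and are not part of celloracle.

```python
import math

def log_normalize(counts, target_sum=10_000):
    """Library-size normalize each cell, then apply log(1 + x).

    No scaling or centering is applied, matching the stated input requirement.
    target_sum is a common convention and an assumption of this sketch.
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            raise ValueError("cell with zero total counts; filter such cells first")
        normalized.append([math.log1p(c / total * target_sum) for c in cell])
    return normalized

# Toy data: rows are cells, columns are genes (hypothetical values).
genes = ["Gata1", "Spi1", "Klf1"]
raw_counts = [
    [10, 0, 30],
    [0, 25, 5],
]
gem = log_normalize(raw_counts)

# Base GRN as a dictionary: target gene -> list of potential regulatory TFs.
base_grn = {"Klf1": ["Gata1"], "Gata1": ["Spi1", "Gata1"]}
```

In practice the expression matrix would be a pandas.DataFrame (cells by genes) produced by a standard scRNA-seq preprocessing pipeline.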

linkList

The results of the GRN inference.

Type

pandas.DataFrame

all_genes

An array of all genes that exist in the input gene expression matrix

Type

numpy.array

embedding_name

The key name in adata.obsm containing dimensional reduction coordinates.

Type

str

annotation

Annotation. You can add custom annotations.

Type

dictionary

coefs_dict

Coefficients of the linear regression.

Type

dictionary

stats_dict

Statistical values for the coefficients.

Type

dictionary

fitted_genes

List of genes where the regression model was successfully calculated.

Type

list of str

failed_genes

List of genes that were not assigned coefficients.

Type

list of str

cellstate

Metadata for GRN input.

Type

pandas.DataFrame

TFinfo

Information about potential regulatory TFs.

Type

pandas.DataFrame

gem

Merged matrix made with gene_expression_matrix and cellstate matrix.

Type

pandas.DataFrame

gem_standerdized

Almost the same as gem, but the gene_expression_matrix was standardized.

Type

pandas.DataFrame

library_last_update_date

Last update date of this code. This information is for code development and may be deprecated in the future.

Type

str

object_initiation_date

The date when this object was made.

Type

str

addAnnotation(annotation_dictionary)

Add a new annotation.

Parameters

annotation_dictionary (dictionary) – e.g. {"sample_name": "NIH 3T3 cell"}

addTFinfo_dictionary(TFdict)

Add new TF info to the pre-existing TFdict.

Parameters

TFdict (dictionary) – python dictionary of TF info.

addTFinfo_matrix(TFinfo_matrix)

Load TF info dataframe.

Parameters

TFinfo_matrix (pandas.DataFrame) – Information about potential regulatory TFs.

copy()

Return a deep copy of this object.

fit_All_genes(bagging_number=200, scaling=True, model_method='bagging_ridge', command_line_mode=False, log=None, alpha=1, verbose=True, n_jobs=-1)

Make ML models for all genes. The calculation is performed in parallel using scikit-learn's bagging function. You can select a modeling method (bagging_ridge or bayesian_ridge). This calculation usually takes a long time.

Parameters
  • bagging_number (int) – The number of estimators for bagging.

  • scaling (bool) – Whether or not to scale regulatory gene expression values.

  • model_method (str) – ML model name. Please select either “bagging_ridge” or “bayesian_ridge”

  • command_line_mode (bool) – Please select False if the calculation is performed in a Jupyter notebook.

  • log (logging object) – log object to output log

  • alpha (int) – Strength of regularization.

  • verbose (bool) – Whether or not to show a progress bar.

  • n_jobs (int) – Number of CPU cores for parallel calculation. -1 means using all available cores.
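To make the roles of bagging_number and alpha concrete, the following is a minimal, dependency-free sketch of bagged ridge regression: each bootstrap resample of the cells yields one ridge fit, and the collection of coefficients gives a distribution per regulator. It is not celloracle's implementation, which according to the description above relies on scikit-learn's bagging function. The helper names (solve, ridge, bagging_ridge), the seed argument, and the toy data are assumptions for illustration, and the scaling option is omitted.

```python
import random

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ridge(X, y, alpha):
    """Ridge coefficients on centered data: (X^T X + alpha * I) w = X^T y."""
    n, p = len(X), len(X[0])
    mx = [sum(row[j] for row in X) / n for j in range(p)]
    my = sum(y) / n
    Xc = [[row[j] - mx[j] for j in range(p)] for row in X]
    yc = [v - my for v in y]
    gram = [[sum(r[i] * r[j] for r in Xc) + (alpha if i == j else 0.0)
             for j in range(p)] for i in range(p)]
    rhs = [sum(r[i] * t for r, t in zip(Xc, yc)) for i in range(p)]
    return solve(gram, rhs)

def bagging_ridge(X, y, bagging_number=200, alpha=1.0, seed=0):
    """Fit one ridge model per bootstrap resample; return all coefficient vectors."""
    rng = random.Random(seed)
    n = len(X)
    coefs = []
    for _ in range(bagging_number):
        idx = [rng.randrange(n) for _ in range(n)]  # sample cells with replacement
        coefs.append(ridge([X[i] for i in idx], [y[i] for i in idx], alpha))
    return coefs

# Toy example: one target gene regulated by two TFs (true weights 2.0 and -1.0).
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(60)]
y = [2.0 * a - 1.0 * b + data_rng.gauss(0, 0.1) for a, b in X]

coefs = bagging_ridge(X, y, bagging_number=50, alpha=1.0)
mean_coefs = [sum(c[j] for c in coefs) / len(coefs) for j in range(2)]
```

A larger bagging_number gives a smoother coefficient distribution at the cost of proportionally more model fits, which is one reason the full calculation can take a long time.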

fit_All_genes_parallel(bagging_number=200, scaling=True, log=None, verbose=10)

IMPORTANT: This function is being debugged and is currently unavailable.

Make ML models for all genes. The calculation is performed in parallel using the joblib Parallel module.

Parameters
  • bagging_number (int) – The number of estimators for bagging.

  • scaling (bool) – Whether or not to scale regulatory gene expression values.

  • log (logging object) – log object to output log

  • verbose (int) – Verbosity level passed to joblib Parallel.

fit_genes(target_genes, bagging_number=200, scaling=True, model_method='bagging_ridge', save_coefs=False, command_line_mode=False, log=None, alpha=1, verbose=True, n_jobs=-1)

Make ML models for genes of interest. This calculation is performed in parallel using scikit-learn's bagging function. You can select a modeling method: please choose either bagging_ridge or bayesian_ridge.

Parameters
  • target_genes (list of str) – gene list

  • bagging_number (int) – The number of estimators for bagging.

  • scaling (bool) – Whether or not to scale regulatory gene expression values.

  • model_method (str) – ML model name. Please select either “bagging_ridge” or “bayesian_ridge”

  • save_coefs (bool) – Whether or not to store details of coef values in bagging model.

  • command_line_mode (bool) – Please select False if the calculation is performed in a Jupyter notebook.

  • log (logging object) – log object to output log

  • alpha (int) – Strength of regularization.

  • verbose (bool) – Whether or not to show a progress bar.

  • n_jobs (int) – Number of CPU cores for parallel calculation. -1 means using all available cores.

plotCoefs(target_gene, sort=True, threshold_p=None)

Plot the distribution of Coef values (network edge weights).

Parameters
  • target_gene (str) – gene name

  • sort (bool) – Whether or not to sort genes by their strength.

  • threshold_p (float) – The threshold for p-values. TFs will be filtered based on their p-values. If None, no filtering is applied.
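The effect of sort and threshold_p can be sketched as a selection step applied before plotting. The function select_tfs below is hypothetical and is not celloracle code. It assumes that "strength" means the absolute value of the mean coefficient and that a TF is kept when its p-value is less than or equal to threshold_p; neither detail is stated in this reference.

```python
def select_tfs(coef_mean, p_values, sort=True, threshold_p=None):
    """Return TF names to display for one target gene.

    coef_mean and p_values map TF name -> value (hypothetical inputs).
    """
    tfs = list(coef_mean)
    if threshold_p is not None:
        # Assumption: keep TFs whose p-value does not exceed the threshold.
        tfs = [tf for tf in tfs if p_values[tf] <= threshold_p]
    if sort:
        # Assumption: "strength" is the absolute mean coefficient.
        tfs.sort(key=lambda tf: abs(coef_mean[tf]), reverse=True)
    return tfs

# Toy values for illustration only.
coef_mean = {"Gata1": 0.8, "Spi1": -1.2, "Tal1": 0.05}
p_values = {"Gata1": 0.001, "Spi1": 0.0001, "Tal1": 0.4}

filtered = select_tfs(coef_mean, p_values, threshold_p=0.01)
unfiltered = select_tfs(coef_mean, p_values, threshold_p=None)
```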

to_hdf5(file_path)

Save object as hdf5.

Parameters

file_path (str) – Path of the file to save. The filename needs to end with ‘.celloracle.net’.

updateLinkList(verbose=True)

Update LinkList. LinkList is a data frame that stores information about inferred GRNs.

Parameters

verbose (bool) – Whether or not to show a progress bar

updateTFinfo_dictionary(TFdict)

Update TF info matrix

Parameters

TFdict (dictionary) – A Python dictionary in which each key is a target gene and each value is the list of potential regulatory genes for that target gene.
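As a rough illustration of how the dictionary form relates to a matrix form, the sketch below converts a target-to-regulators dictionary into a binary target-by-TF table. The function tfdict_to_matrix and the toy gene names are hypothetical. The actual layout of the TF info matrix is not described in this reference, so the row and column arrangement here is an assumption.

```python
def tfdict_to_matrix(tf_dict):
    """Convert {target: [regulators]} into (targets, tfs, rows).

    rows[i][j] is 1 if tfs[j] is a potential regulator of targets[i], else 0.
    """
    tfs = sorted({tf for regs in tf_dict.values() for tf in regs})
    targets = sorted(tf_dict)
    rows = []
    for tg in targets:
        regs = set(tf_dict[tg])
        rows.append([1 if tf in regs else 0 for tf in tfs])
    return targets, tfs, rows

# Toy dictionary: key is a target gene, value lists its candidate regulators.
tf_dict = {"Klf1": ["Gata1", "Tal1"], "Gata1": ["Spi1"]}
targets, tfs, rows = tfdict_to_matrix(tf_dict)
```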

celloracle.network.getDF_TGxTF(net_object, value_of_interest)

Extract inferred GRN information and return it as a pandas.DataFrame. The results are converted into a Target gene x TF table.

Parameters
  • net_object (Net) – Net object. The GRN calculation has to be completed in this object beforehand.

  • value_of_interest (str) – Kind of information to extract.
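The reshaping into a Target gene x TF table can be sketched as a pivot over a list of inferred links. This is a plain-Python illustration and not the library's code. The field names "source", "target", and "coef_mean", the function links_to_target_by_tf, and the choice to fill missing links with 0.0 are all assumptions made for this example.

```python
def links_to_target_by_tf(links, value_of_interest):
    """Pivot a list of link records into {target: {tf: value}}.

    Each link is a dict with hypothetical keys "source" (TF), "target",
    and one or more value fields such as "coef_mean".
    """
    tfs = sorted({link["source"] for link in links})
    targets = sorted({link["target"] for link in links})
    # Assumption: TF/target pairs without an inferred link get 0.0.
    table = {tg: {tf: 0.0 for tf in tfs} for tg in targets}
    for link in links:
        table[link["target"]][link["source"]] = link[value_of_interest]
    return table

# Toy link list for illustration only.
links = [
    {"source": "Gata1", "target": "Klf1", "coef_mean": 0.8},
    {"source": "Spi1", "target": "Klf1", "coef_mean": -0.3},
    {"source": "Spi1", "target": "Gata1", "coef_mean": -1.2},
]
table = links_to_target_by_tf(links, "coef_mean")
```

With pandas available, the same reshaping is typically expressed as a pivot of the link data frame with targets as the index and TFs as the columns.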

celloracle.network.load_net_from_patquets(folder_path)

Load a Net object that was saved with the “save_as_compressed” function.

Parameters

folder_path (str) – Folder path