celloracle.network package¶

The network module implements GRN inference.

class celloracle.network.Net(gene_expression_matrix, gem_standerdized=None, TFinfo_matrix=None, cellstate=None, TFinfo_dic=None, annotation=None, verbose=True)¶

Bases: object

Net is a custom class for inferring sample-specific GRNs from scRNA-seq data. It is used inside the Oracle class for GRN inference and requires the two types of input described below.

Single-cell RNA-seq data: The Net class needs processed scRNA-seq data. Gene and cell filtering, quality checks, normalization, and log-transformation (but not scaling or centering) have to be done before starting the GRN calculation with this class. You can also use arbitrary metadata (e.g., mRNA count, cell-cycle phase) as GRN input.
Potential regulatory connections (or base GRN): This method uses a list of potential regulatory TFs as input. This information can be calculated from ATAC-seq data using the motif-analysis module. If sample-specific ATAC-seq data are not available, you can use general TF-binding information derived from public ATAC-seq datasets of various tissues and cell types.
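The preprocessing expected for the scRNA-seq input can be illustrated with a minimal sketch in plain Python. The function name `normalize_and_log` and the target sum of 10,000 counts per cell are assumptions chosen for illustration; they are not part of celloracle, and in practice this step is usually done with a dedicated single-cell toolkit.

```python
import math

def normalize_and_log(counts, target_sum=10_000):
    """Normalize each cell to `target_sum` total counts, then apply log(1 + x).

    `counts` is a list of rows (one row per cell, one column per gene).
    No centering or scaling is applied here.
    """
    processed = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # Such cells would normally be removed during cell filtering.
            raise ValueError("cell with zero total counts; filter it out first")
        processed.append([math.log1p(value / total * target_sum) for value in row])
    return processed

# Hypothetical raw counts: 2 cells x 3 genes.
raw_counts = [
    [10, 0, 90],  # cell 1
    [5, 5, 0],    # cell 2
]
log_normalized = normalize_and_log(raw_counts)
```

The output stays unscaled and uncentered, which matches the requirement stated above.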

linkList¶

The results of the GRN inference.

Type: pandas.DataFrame

all_genes¶

An array of all genes that exist in the input gene expression matrix.

Type: numpy.array

embedding_name¶

The key name in adata.obsm containing dimensional reduction coordinates.

Type: str

annotation¶

Annotation. You can add custom annotations.

Type: dictionary

coefs_dict¶

Coefficients of the linear regression models.

Type: dictionary

stats_dict¶

Statistics about the coefficients.

Type: dictionary

fitted_genes¶

List of genes where the regression model was successfully calculated.

Type: list of str

failed_genes¶

List of genes that were not assigned coefficients.

Type: list of str

cellstate¶

Metadata for GRN input.

Type: pandas.DataFrame

TFinfo¶

Information about potential regulatory TFs.

Type: pandas.DataFrame

gem¶

Merged matrix made with gene_expression_matrix and cellstate matrix.

Type: pandas.DataFrame

gem_standerdized¶

Almost the same as gem, but the gene_expression_matrix was standardized.

Type: pandas.DataFrame

library_last_update_date¶

Last update date of this code. This information is for code development. It may be deprecated in the future.

Type: str

object_initiation_date¶

The date when this object was made.

Type: str

addAnnotation(annotation_dictionary)¶

Add a new annotation.

Parameters: annotation_dictionary (dictionary) – e.g. {"sample_name": "NIH 3T3 cell"}

addTFinfo_dictionary(TFdict)¶

Add new TF info to the pre-existing TFdict.

Parameters: TFdict (dictionary) – Python dictionary of TF info.

addTFinfo_matrix(TFinfo_matrix)¶

Load a TF info data frame.

Parameters: TFinfo_matrix (pandas.DataFrame) – Information about potential regulatory TFs.

copy()¶

Return a deep copy of the object.

fit_All_genes(bagging_number=200, scaling=True, model_method='bagging_ridge', command_line_mode=False, log=None, alpha=1, verbose=True, n_jobs=-1)¶

Make ML models for all genes. The calculation is performed in parallel using scikit-learn’s bagging function. You can select a modeling method (bagging_ridge or bayesian_ridge). This calculation usually takes a long time.

Parameters

bagging_number (int) – The number of estimators for bagging.
scaling (bool) – Whether or not to scale regulatory gene expression values.
model_method (str) – ML model name. Please select either “bagging_ridge” or “bayesian_ridge”
command_line_mode (bool) – Please select False if the calculation is performed in a Jupyter notebook.
log (logging object) – Logger object used to write log output.
alpha (int) – Strength of regularization.
verbose (bool) – Whether or not to show a progress bar.
n_jobs (int) – Number of CPU cores for parallel calculation. -1 means using all available cores.
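To make the idea of a bagging ridge model concrete, here is a minimal sketch for a single target gene. It is not celloracle's implementation: it uses only the standard library instead of scikit-learn, omits the intercept, and the helper names (`solve_linear_system`, `ridge_fit`, `bagging_ridge`) and the toy data are hypothetical. Summarizing the bagged coefficients by their mean is likewise an assumption for illustration.

```python
import random

def solve_linear_system(matrix, vector):
    """Solve matrix @ x = vector by Gauss-Jordan elimination with partial pivoting."""
    n = len(vector)
    aug = [list(row) + [value] for row, value in zip(matrix, vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def ridge_fit(X, y, alpha=1.0):
    """Ridge coefficients (no intercept): solve (X^T X + alpha * I) w = X^T y."""
    n_features = len(X[0])
    gram = [[sum(row[i] * row[j] for row in X) + (alpha if i == j else 0.0)
             for j in range(n_features)] for i in range(n_features)]
    rhs = [sum(row[i] * target for row, target in zip(X, y))
           for i in range(n_features)]
    return solve_linear_system(gram, rhs)

def bagging_ridge(X, y, bagging_number=20, alpha=1.0, seed=0):
    """Fit one ridge model per bootstrap resample of the cells."""
    rng = random.Random(seed)
    n_cells = len(X)
    coefs = []
    for _ in range(bagging_number):
        idx = [rng.randrange(n_cells) for _ in range(n_cells)]
        coefs.append(ridge_fit([X[i] for i in idx], [y[i] for i in idx], alpha))
    return coefs

# Toy data: the target gene depends on TF_A (column 0) but not on TF_B (column 1).
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
y = [2.0 * tf_a + data_rng.gauss(0, 0.1) for tf_a, _ in X]

coefs = bagging_ridge(X, y)
# Average each TF's coefficient over the bagged models.
coef_mean = [sum(c[j] for c in coefs) / len(coefs) for j in range(2)]
```

Each bootstrap model yields one coefficient per candidate TF, so the spread across models gives a rough sense of how stable each edge weight is.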

fit_All_genes_parallel(bagging_number=200, scaling=True, log=None, verbose=10)¶

IMPORTANT: this function is being debugged and is currently unavailable.

Make ML models for all genes. The calculation is performed in parallel using the joblib parallel module.

Parameters

bagging_number (int) – The number of estimators for bagging.
scaling (bool) – Whether or not to scale regulatory gene expression values.
log (logging object) – Logger object used to write log output.
verbose (int) – Verbosity level passed to joblib parallel.

fit_genes(target_genes, bagging_number=200, scaling=True, model_method='bagging_ridge', save_coefs=False, command_line_mode=False, log=None, alpha=1, verbose=True, n_jobs=-1)¶

Make ML models for genes of interest. The calculation is performed in parallel using scikit-learn’s bagging function. Choose either bagging_ridge or bayesian_ridge as the modeling method.

Parameters

target_genes (list of str) – List of target gene names.
bagging_number (int) – The number of estimators for bagging.
scaling (bool) – Whether or not to scale regulatory gene expression values.
model_method (str) – ML model name. Please select either “bagging_ridge” or “bayesian_ridge”
save_coefs (bool) – Whether or not to store the individual coefficient values of the bagging model.
command_line_mode (bool) – Please select False if the calculation is performed in a Jupyter notebook.
log (logging object) – Logger object used to write log output.
alpha (int) – Strength of regularization.
verbose (bool) – Whether or not to show a progress bar.
n_jobs (int) – Number of CPU cores for parallel calculation. -1 means using all available cores.
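The scaling option refers to scaling the expression values of the regulatory genes before regression. A minimal sketch of per-gene standardization (z-scoring) follows; the function name `standardize_columns`, the use of the population standard deviation, and the handling of constant genes are assumptions for illustration, not a description of what celloracle does internally.

```python
import statistics

def standardize_columns(matrix):
    """Z-score each column (gene) across rows (cells): (x - mean) / std."""
    columns = list(zip(*matrix))
    scaled_columns = []
    for column in columns:
        mean = statistics.fmean(column)
        std = statistics.pstdev(column)
        # A constant gene has zero variance; map it to zeros instead of dividing by 0.
        scaled_columns.append([(x - mean) / std if std > 0 else 0.0 for x in column])
    return [list(row) for row in zip(*scaled_columns)]

# Hypothetical log-normalized expression: 3 cells x 2 regulatory genes.
expression = [
    [1.0, 5.0],
    [2.0, 5.0],
    [3.0, 5.0],
]
scaled = standardize_columns(expression)
```

After scaling, every varying gene has zero mean and unit variance, so the regularization penalty treats all candidate TFs on the same scale.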

plotCoefs(target_gene, sort=True, threshold_p=None)¶

Plot the distribution of Coef values (network edge weights).

Parameters

target_gene (str) – gene name
sort (bool) – Whether or not to sort genes by their strength.
threshold_p (float) – The threshold for p-values. TFs will be filtered based on the p-value. If None, no filtering is applied.
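The effect of sort and threshold_p can be sketched on a small per-TF summary table. The data, the function name `select_tfs`, the use of the absolute coefficient as "strength", and the inclusive comparison with the threshold are all assumptions made for this illustration.

```python
# Hypothetical per-TF summary for one target gene: (TF name, coefficient, p-value).
tf_stats = [
    ("Tal1", 0.42, 0.001),
    ("Klf1", -0.65, 0.0004),
    ("Cebpa", 0.03, 0.48),
]

def select_tfs(stats, sort=True, threshold_p=None):
    """Optionally drop TFs above a p-value threshold, then sort by |coefficient|."""
    kept = [s for s in stats if threshold_p is None or s[2] <= threshold_p]
    if sort:
        kept = sorted(kept, key=lambda s: abs(s[1]), reverse=True)
    return kept

selected = select_tfs(tf_stats, threshold_p=0.01)
```

With threshold_p left at None, all three TFs are kept and only the ordering changes.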

to_hdf5(file_path)¶

Save the object as an HDF5 file.

Parameters: file_path (str) – Path of the file to save. The filename needs to end with ‘.celloracle.net’
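Because the filename has to end with '.celloracle.net', it can help to validate the path before saving. The helper below is a hypothetical convenience function, not part of celloracle.

```python
from pathlib import Path

REQUIRED_SUFFIX = ".celloracle.net"

def checked_net_path(file_path):
    """Return file_path unchanged if its name ends with the required suffix, else raise."""
    if not Path(file_path).name.endswith(REQUIRED_SUFFIX):
        raise ValueError(f"file name must end with {REQUIRED_SUFFIX!r}: {file_path}")
    return file_path

# Example path chosen for illustration only.
path = checked_net_path("results/sample1.celloracle.net")
```

The returned string could then be passed as file_path to to_hdf5.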

updateLinkList(verbose=True)¶

Update LinkList. LinkList is a data frame that stores information about inferred GRNs.

Parameters: verbose (bool) – Whether or not to show a progress bar.

updateTFinfo_dictionary(TFdict)¶

Update the TF info matrix.

Parameters: TFdict (dictionary) – A Python dictionary in which each key is a target gene and each value lists the potential regulatory genes for that target gene.
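The dictionary layout described above can be sketched as follows. The gene names are placeholders, and `targets_by_tf` is a hypothetical helper that inverts the mapping, which can be handy for checking which targets a given TF may regulate.

```python
# Hypothetical base GRN: each key is a target gene, each value lists the
# TFs that may regulate it. Gene names are for illustration only.
TFdict = {
    "Gata1": ["Tal1", "Klf1"],
    "Klf1": ["Gata1"],
    "Spi1": ["Cebpa", "Gata1"],
}

def targets_by_tf(tf_dict):
    """Invert a target -> TFs dictionary into a TF -> targets dictionary."""
    inverted = {}
    for target, regulators in tf_dict.items():
        for tf in regulators:
            inverted.setdefault(tf, []).append(target)
    return {tf: sorted(targets) for tf, targets in inverted.items()}

by_tf = targets_by_tf(TFdict)
```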

celloracle.network.getDF_TGxTF(net_object, value_of_interest)¶

Extract inferred GRN information and return it as a pandas.DataFrame. The result is arranged as a Target gene x TF matrix.

Parameters

net_object (Net) – Net object. The GRN calculation has to be done in this object beforehand.
value_of_interest (str) – Kind of information to extract.
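The Target gene x TF layout can be illustrated by pivoting a long-format list of links into a table. This is a sketch of the general idea only: the row format, the fill value of 0.0 for missing links, and the function name `to_target_by_tf` are assumptions, and a nested dictionary stands in for the pandas.DataFrame that the real function returns.

```python
# Hypothetical long-format link list rows: (source TF, target gene, value).
links = [
    ("Tal1", "Gata1", 0.42),
    ("Klf1", "Gata1", -0.65),
    ("Gata1", "Klf1", 0.80),
]

def to_target_by_tf(rows, fill_value=0.0):
    """Pivot (TF, target, value) rows into a nested dict: table[target][tf]."""
    tfs = sorted({tf for tf, _, _ in rows})
    targets = sorted({target for _, target, _ in rows})
    # Start from a full table of fill values, then write in the observed links.
    table = {target: {tf: fill_value for tf in tfs} for target in targets}
    for tf, target, value in rows:
        table[target][tf] = value
    return table

tg_by_tf = to_target_by_tf(links)
```

Rows correspond to target genes and columns to TFs, so `tg_by_tf["Gata1"]["Klf1"]` reads as the value of the link from Klf1 to Gata1.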

celloracle.network.load_net_from_patquets(folder_path)¶

Load a Net object that was saved with the “save_as_compressed” function.

Parameters: folder_path (str) – Folder path