celloracle.network package¶
The network
module implements GRN inference.
-
class
celloracle.network.
Net
(gene_expression_matrix, gem_standerdized=None, TFinfo_matrix=None, cellstate=None, TFinfo_dic=None, annotation=None, verbose=True)¶ Bases:
object
Net is a custom class for inferring sample-specific GRNs from scRNA-seq data. This class is used inside the Oracle class for GRN inference. It requires the two types of information described below.
Single-cell RNA-seq data: The Net class needs processed scRNA-seq data. Gene and cell filtering, quality checks, normalization, and log-transformation (but not scaling or centering) have to be done before starting the GRN calculation with this class. You can also use arbitrary metadata (e.g., mRNA count, cell-cycle phase) as GRN input.
Potential regulatory connections (or base GRN): This method uses a list of potential regulatory TFs as input. This information can be calculated from ATAC-seq data using the motif-analysis module. If sample-specific ATAC-seq data are not available, you can use general TF-binding information derived from public ATAC-seq datasets of various tissues and cell types.
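As a rough illustration of the preprocessing described above, the sketch below normalizes a toy count matrix and log-transforms it without scaling or centering. The counts, the target sum of 10,000, and all variable names are assumptions made for this example; it is not celloracle code, and a dedicated single-cell toolkit would normally handle this step.

```python
import numpy as np

# Toy raw counts: 3 cells (rows) x 4 genes (columns). Values are made up.
counts = np.array([
    [10, 0, 5, 85],
    [2, 8, 0, 40],
    [0, 3, 7, 10],
], dtype=float)

# Library-size normalization: rescale every cell to the same total count.
# The target of 10,000 is a common convention, assumed here for illustration.
target_sum = 1e4
per_cell_total = counts.sum(axis=1, keepdims=True)
normalized = counts / per_cell_total * target_sum

# Log-transformation. No scaling or centering is applied afterwards.
log_normalized = np.log1p(normalized)
```

The resulting matrix keeps zeros at zero and is not centered per gene, which matches the kind of input the class description asks for.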
-
linkList
¶ The results of the GRN inference.
- Type
pandas.DataFrame
-
all_genes
¶ An array of all genes that exist in the input gene expression matrix
- Type
numpy.array
-
embedding_name
¶ The key name in adata.obsm containing dimensional reduction coordinates.
- Type
str
-
annotation
¶ Annotation. You can add custom annotations.
- Type
dictionary
-
coefs_dict
¶ Coefs of linear regression.
- Type
dictionary
-
stats_dict
¶ Statistic values about coefs.
- Type
dictionary
-
fitted_genes
¶ List of genes where the regression model was successfully calculated.
- Type
list of str
-
failed_genes
¶ List of genes that were not assigned coefs
- Type
list of str
-
cellstate
¶ Metadata for GRN input.
- Type
pandas.DataFrame
-
TFinfo
¶ Information about potential regulatory TFs.
- Type
pandas.DataFrame
-
gem
¶ Merged matrix made with gene_expression_matrix and cellstate matrix.
- Type
pandas.DataFrame
-
gem_standerdized
¶ Almost the same as gem, but the gene_expression_matrix was standardized.
- Type
pandas.DataFrame
-
library_last_update_date
¶ Last update date of this code. This info is for code development. It may be deprecated in the future.
- Type
str
-
object_initiation_date
¶ The date when this object was made.
- Type
str
-
addAnnotation
(annotation_dictionary)¶ Add a new annotation.
- Parameters
annotation_dictionary (dictionary) – e.g. {“sample_name”: “NIH 3T3 cell”}
-
addTFinfo_dictionary
(TFdict)¶ Add new TF info to the pre-existing TFdict.
- Parameters
TFdict (dictionary) – python dictionary of TF info.
-
addTFinfo_matrix
(TFinfo_matrix)¶ Load TF info dataframe.
- Parameters
TFinfo_matrix (pandas.DataFrame) – Information about potential regulatory TFs.
-
copy
()¶ Make a deep copy of this object.
-
fit_All_genes
(bagging_number=200, scaling=True, model_method='bagging_ridge', command_line_mode=False, log=None, alpha=1, verbose=True, n_jobs=-1)¶ Make ML models for all genes. The calculation will be performed in parallel using scikit-learn’s bagging function. You can select a modeling method (bagging_ridge or bayesian_ridge). This calculation usually takes a long time.
- Parameters
bagging_number (int) – The number of estimators for bagging.
scaling (bool) – Whether or not to scale regulatory gene expression values.
model_method (str) – ML model name. Please select either “bagging_ridge” or “bayesian_ridge”
command_line_mode (bool) – Please select False if the calculation is performed in a Jupyter notebook.
log (logging object) – log object to output log
alpha (int) – Strength of regularization.
verbose (bool) – Whether or not to show a progress bar.
n_jobs (int) – Number of cpu cores for parallel calculation. -1 means using all available cores.
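To make the idea of a bagged ridge model more concrete, here is a minimal sketch of bootstrap-aggregated ridge regression for a single target gene, written with NumPy only. It is an illustration under stated assumptions, not celloracle's implementation: the helper names `ridge_coefs` and `bagging_ridge`, the synthetic data, and the choice to omit an intercept are all hypothetical. The arguments `bagging_number` and `alpha` are named after the parameters above only to show the roles they play.

```python
import numpy as np

def ridge_coefs(X, y, alpha=1.0):
    """Closed-form ridge solution without intercept: (X'X + alpha*I)^-1 X'y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

def bagging_ridge(X, y, bagging_number=200, alpha=1.0, seed=0):
    """Fit one ridge model per bootstrap resample of the cells.

    Returns an array of shape (bagging_number, n_features) with one
    coefficient vector per bootstrap model.
    """
    rng = np.random.default_rng(seed)
    n_cells = X.shape[0]
    coefs = np.empty((bagging_number, X.shape[1]))
    for i in range(bagging_number):
        # Sample cells with replacement (bootstrap).
        idx = rng.integers(0, n_cells, size=n_cells)
        coefs[i] = ridge_coefs(X[idx], y[idx], alpha=alpha)
    return coefs

# Synthetic example: 300 cells, 3 candidate TFs; only the first two
# actually influence the target gene.
rng = np.random.default_rng(42)
tf_expression = rng.normal(size=(300, 3))
target = (2.0 * tf_expression[:, 0]
          - 1.0 * tf_expression[:, 1]
          + rng.normal(scale=0.1, size=300))

coefs = bagging_ridge(tf_expression, target, bagging_number=50)
mean_coefs = coefs.mean(axis=0)  # average edge weight per TF
```

The spread of each column of `coefs` across bootstrap models gives a distribution of coefficient values per TF, which is the kind of quantity a coefficient plot or a significance statistic could be built on.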
-
fit_All_genes_parallel
(bagging_number=200, scaling=True, log=None, verbose=10)¶ IMPORTANT: this function is being debugged and is currently unavailable.
Make ML models for all genes. The calculation will be performed in parallel using joblib parallel module.
- Parameters
bagging_number (int) – The number of estimators for bagging.
scaling (bool) – Whether or not to scale regulatory gene expression values.
log (logging object) – log object to output log
verbose (int) – verbose for joblib parallel
-
fit_genes
(target_genes, bagging_number=200, scaling=True, model_method='bagging_ridge', save_coefs=False, command_line_mode=False, log=None, alpha=1, verbose=True, n_jobs=-1)¶ Make ML models for genes of interest. This calculation will be performed in parallel using scikit-learn’s bagging function. You can select a modeling method; please choose either bagging_ridge or bayesian_ridge.
- Parameters
target_genes (list of str) – gene list
bagging_number (int) – The number of estimators for bagging.
scaling (bool) – Whether or not to scale regulatory gene expression values.
model_method (str) – ML model name. Please select either “bagging_ridge” or “bayesian_ridge”
save_coefs (bool) – Whether or not to store details of coef values in bagging model.
command_line_mode (bool) – Please select False if the calculation is performed in a Jupyter notebook.
log (logging object) – log object to output log
alpha (int) – Strength of regularization.
verbose (bool) – Whether or not to show a progress bar.
n_jobs (int) – Number of cpu cores for parallel calculation. -1 means using all available cores.
-
plotCoefs
(target_gene, sort=True, threshold_p=None)¶ Plot the distribution of Coef values (network edge weights).
- Parameters
target_gene (str) – gene name
sort (bool) – Whether or not to sort genes by their strength.
bagging_number (int) – The number of estimators for bagging.
threshold_p (float) – The threshold for p-values. TFs will be filtered based on the p-value. If None, no filtering is applied.
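The effect of sort and threshold_p can be pictured with a small, self-contained sketch. The data, the helper name `select_tfs`, and the exact filtering rule (keep TFs with a p-value strictly below the threshold, rank by absolute coefficient) are assumptions for illustration; the source does not specify how plotCoefs applies them internally.

```python
# Hypothetical per-TF results for one target gene: (mean coef, p-value).
tf_stats = {
    "TF_A": (0.80, 0.001),
    "TF_B": (-0.35, 0.200),
    "TF_C": (-1.10, 0.010),
    "TF_D": (0.05, 0.700),
}

def select_tfs(tf_stats, sort=True, threshold_p=None):
    """Return TF names, optionally filtered by p-value and sorted by |coef|."""
    names = [
        tf for tf, (_, p) in tf_stats.items()
        if threshold_p is None or p < threshold_p
    ]
    if sort:
        # Strongest edges first, regardless of sign.
        names.sort(key=lambda tf: abs(tf_stats[tf][0]), reverse=True)
    return names

selected = select_tfs(tf_stats, threshold_p=0.05)
unfiltered = select_tfs(tf_stats)
```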
-
to_hdf5
(file_path)¶ Save object as hdf5.
- Parameters
file_path (str) – Path of the file to save. The filename needs to end with ‘.celloracle.net’.
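Because the filename has to end with a fixed suffix, it can be convenient to check the path before starting a long calculation. The helper below is a hypothetical convenience function, not part of celloracle; it only validates the string and does not write any file.

```python
REQUIRED_SUFFIX = ".celloracle.net"

def check_net_file_path(file_path):
    """Return file_path unchanged, or raise ValueError if the suffix is wrong."""
    if not file_path.endswith(REQUIRED_SUFFIX):
        raise ValueError(
            f"File name must end with '{REQUIRED_SUFFIX}': {file_path!r}"
        )
    return file_path

# Hypothetical output path used only for this example.
path = check_net_file_path("results/sample1.celloracle.net")
```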
-
updateLinkList
(verbose=True)¶ Update LinkList. LinkList is a data frame that stores information about inferred GRNs.
- Parameters
verbose (bool) – Whether or not to show a progress bar
-
updateTFinfo_dictionary
(TFdict)¶ Update TF info matrix
- Parameters
TFdict (dictionary) – A Python dictionary in which each key is a target gene and each value contains the potential regulatory genes for that target gene.
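The dictionary layout described above can be sketched as follows. The gene names are placeholders, the use of lists as values is an assumption, and `merge_tf_dicts` is a hypothetical helper showing one possible way to combine two such dictionaries; it is not claimed to reproduce what addTFinfo_dictionary or updateTFinfo_dictionary do internally.

```python
# Each key is a target gene; each value lists its potential regulatory TFs.
tf_dict = {
    "Gata1": ["Tal1", "Klf1"],
    "Spi1": ["Cebpa"],
}
new_tf_dict = {
    "Gata1": ["Gata2", "Tal1"],
    "Cebpa": ["Spi1"],
}

def merge_tf_dicts(existing, new):
    """Union the regulators per target gene without modifying the inputs."""
    merged = {target: list(tfs) for target, tfs in existing.items()}
    for target, tfs in new.items():
        current = merged.setdefault(target, [])
        for tf in tfs:
            if tf not in current:  # avoid duplicate regulators
                current.append(tf)
    return merged

merged = merge_tf_dicts(tf_dict, new_tf_dict)
```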
-
celloracle.network.
getDF_TGxTF
(net_object, value_of_interest)¶ Extract inferred GRN information and return it as a pandas.DataFrame. The results are converted to a target gene x TF matrix.
- Parameters
net_object (Net) – Net object. The GRN calculation has to be done in this object beforehand.
value_of_interest (str) – Kind of information to extract.
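The target gene x TF layout can be illustrated with a plain pandas pivot. The column names `source`, `target` and `coef_mean`, the gene names, and the values below are assumptions for this example, and filling missing edges with 0 is an illustrative choice; the sketch does not call getDF_TGxTF itself.

```python
import pandas as pd

# Hypothetical long-format table of inferred edges (one row per TF-target pair).
links = pd.DataFrame({
    "source": ["Tal1", "Klf1", "Cebpa"],      # regulatory TF
    "target": ["Gata1", "Gata1", "Spi1"],     # target gene
    "coef_mean": [0.42, 0.17, -0.30],         # value of interest
})

# Rows become target genes, columns become TFs.
# Pairs with no inferred edge are filled with 0.
tg_by_tf = links.pivot(
    index="target", columns="source", values="coef_mean"
).fillna(0.0)
```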
-
celloracle.network.
load_net_from_patquets
(folder_path)¶ Load a Net object that was saved with the “save_as_compressed” function.
- Parameters
folder_path (str) – Folder path