celloracle.network package¶
The network module implements GRN inference.
- 
class celloracle.network.Net(gene_expression_matrix, gem_standerdized=None, TFinfo_matrix=None, cellstate=None, TFinfo_dic=None, annotation=None, verbose=True)¶
- Bases: object - Net is a custom class for inferring sample-specific GRNs from scRNA-seq data. This class is used inside the Oracle class for GRN inference. It requires the two types of information below. - Single-cell RNA-seq data: The Net class needs processed scRNA-seq data. Gene and cell filtering, quality checks, normalization, and log-transformation (but not scaling or centering) must be done before starting the GRN calculation with this class. You can also use arbitrary metadata (e.g., mRNA count, cell-cycle phase) as GRN input. 
- Potential regulatory connections (or base GRN): This method uses a list of potential regulatory TFs as input. This information can be calculated from ATAC-seq data using the motif-analysis module. If sample-specific ATAC-seq data are not available, you can use general TF-binding information derived from public ATAC-seq datasets of various tissues and cell types. 
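The preprocessing requirement for the expression data can be illustrated with a small sketch. It is not part of CellOracle: the helper name `looks_unscaled`, the toy count matrix, and the 10,000-count normalization target are assumptions chosen for illustration. The check relies on the fact that log-transformed normalized counts are non-negative, whereas centered data contain negative values.

```python
import numpy as np

def looks_unscaled(expr):
    """Heuristic check (hypothetical helper): True if all values are finite and non-negative."""
    expr = np.asarray(expr, dtype=float)
    return bool(np.isfinite(expr).all() and (expr >= 0).all())

# Toy raw counts: 2 cells (rows) x 3 genes (columns).
counts = np.array([[10.0, 0.0, 5.0],
                   [2.0, 8.0, 0.0]])

# Normalize each cell to 10,000 total counts (assumed target), then log-transform.
per_cell_total = counts.sum(axis=1, keepdims=True)
log_norm = np.log1p(counts / per_cell_total * 1e4)

# Centering each gene introduces negative values; this is the step to avoid.
centered = log_norm - log_norm.mean(axis=0)

print(looks_unscaled(log_norm))  # log-normalized input
print(looks_unscaled(centered))  # centered input
```

A check like this only catches centering, not every form of scaling, so it is a sanity test rather than a guarantee.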
 - 
linkList¶
- The results of the GRN inference. - Type
- pandas.DataFrame 
 
 - 
all_genes¶
- An array of all genes that exist in the input gene expression matrix - Type
- numpy.array 
 
 - 
embedding_name¶
- The key name in adata.obsm containing dimensional reduction coordinates. - Type
- str 
 
 - 
annotation¶
- Annotation. You can add custom annotations. - Type
- dictionary 
 
 - 
coefs_dict¶
- Coefficients of the linear regression. - Type
- dictionary 
 
 - 
stats_dict¶
- Statistical values for the coefficients. - Type
- dictionary 
 
 - 
fitted_genes¶
- List of genes for which the regression model was successfully calculated. - Type
- list of str 
 
 - 
failed_genes¶
- List of genes that were not assigned coefficients. - Type
- list of str 
 
 - 
cellstate¶
- Metadata used as GRN input. - Type
- pandas.DataFrame 
 
 - 
TFinfo¶
- Information about potential regulatory TFs. - Type
- pandas.DataFrame 
 
 - 
gem¶
- Matrix made by merging gene_expression_matrix and the cellstate matrix. - Type
- pandas.DataFrame 
 
 - 
gem_standerdized¶
- Almost the same as gem, but with the gene_expression_matrix standardized. - Type
- pandas.DataFrame 
 
 - 
library_last_update_date¶
- Last update date of this code. This information is for code development and may be deprecated in the future. - Type
- str 
 
 - 
object_initiation_date¶
- The date when this object was made. - Type
- str 
 
 - 
addAnnotation(annotation_dictionary)¶
- Add a new annotation. - Parameters
- annotation_dictionary (dictionary) – e.g. {“sample_name”: “NIH 3T3 cell”} 
 
 - 
addTFinfo_dictionary(TFdict)¶
- Add new TF info to the pre-existing TFdict. - Parameters
- TFdict (dictionary) – Python dictionary of TF info. 
 
 - 
addTFinfo_matrix(TFinfo_matrix)¶
- Load TF info dataframe. - Parameters
- TFinfo_matrix (pandas.DataFrame) – Information about potential regulatory TFs. 
 
 - 
copy()¶
- Return a deep copy of this object. 
 - 
fit_All_genes(bagging_number=200, scaling=True, model_method='bagging_ridge', command_line_mode=False, log=None, alpha=1, verbose=True, n_jobs=- 1)¶
- Make ML models for all genes. The calculation is performed in parallel using scikit-learn’s bagging function. You can select the modeling method (bagging_ridge or bayesian_ridge). This calculation usually takes a long time. - Parameters
- bagging_number (int) – The number of estimators for bagging. 
- scaling (bool) – Whether or not to scale regulatory gene expression values. 
- model_method (str) – ML model name. Select either “bagging_ridge” or “bayesian_ridge”. 
- command_line_mode (bool) – Set to False if the calculation is performed in a Jupyter notebook. 
- log (logging object) – Logger object used to write log output. 
- alpha (int) – Strength of regularization. 
- verbose (bool) – Whether or not to show a progress bar. 
- n_jobs (int) – Number of CPU cores for parallel calculation. -1 means using all available cores. 
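The idea behind the bagging_ridge option can be sketched as follows. This is a minimal illustration, not CellOracle's implementation: it replaces scikit-learn's bagging function with a plain NumPy bootstrap loop and a closed-form ridge solution, and the function names `ridge_coefs` and `bagged_ridge` as well as the toy data are assumptions. Each bootstrap round resamples cells with replacement and fits one ridge model of the target gene on its candidate TFs; the spread of coefficients across rounds is what later summary statistics are computed from.

```python
import numpy as np

def ridge_coefs(X, y, alpha=1.0):
    """Closed-form ridge regression; centering handles the unpenalized intercept."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    n_features = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)

def bagged_ridge(X, y, bagging_number=200, alpha=1.0, seed=0):
    """Fit one ridge model per bootstrap sample; returns (bagging_number, n_features)."""
    rng = np.random.default_rng(seed)
    n_cells = X.shape[0]
    coefs = np.empty((bagging_number, X.shape[1]))
    for i in range(bagging_number):
        idx = rng.integers(0, n_cells, size=n_cells)  # sample cells with replacement
        coefs[i] = ridge_coefs(X[idx], y[idx], alpha=alpha)
    return coefs

# Toy data: 300 cells, 3 candidate TFs; only the first TF drives the target gene.
rng = np.random.default_rng(1)
tf_expr = rng.normal(size=(300, 3))
target = 2.0 * tf_expr[:, 0] + rng.normal(scale=0.1, size=300)

coefs = bagged_ridge(tf_expr, target, bagging_number=50)
coef_mean = coefs.mean(axis=0)  # one mean coefficient (edge weight) per candidate TF
print(coef_mean.round(2))
```

In this toy setting the mean coefficient of the first TF should be close to 2 and the others close to 0; a larger `alpha` shrinks all coefficients toward zero.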
 
 
 - 
fit_All_genes_parallel(bagging_number=200, scaling=True, log=None, verbose=10)¶
- IMPORTANT: this function is being debugged and is currently unavailable. - Make ML models for all genes. The calculation is performed in parallel using the joblib Parallel module. - Parameters
- bagging_number (int) – The number of estimators for bagging. 
- scaling (bool) – Whether or not to scale regulatory gene expression values. 
- log (logging object) – Logger object used to write log output. 
- verbose (int) – Verbosity level passed to joblib Parallel. 
 
 
 - 
fit_genes(target_genes, bagging_number=200, scaling=True, model_method='bagging_ridge', save_coefs=False, command_line_mode=False, log=None, alpha=1, verbose=True, n_jobs=- 1)¶
- Make ML models for genes of interest. The calculation is performed in parallel using scikit-learn’s bagging function. You can select the modeling method: choose either bagging_ridge or bayesian_ridge. - Parameters
- target_genes (list of str) – List of target gene names. 
- bagging_number (int) – The number of estimators for bagging. 
- scaling (bool) – Whether or not to scale regulatory gene expression values. 
- model_method (str) – ML model name. Select either “bagging_ridge” or “bayesian_ridge”. 
- save_coefs (bool) – Whether or not to store the detailed coefficient values of the bagging model. 
- command_line_mode (bool) – Set to False if the calculation is performed in a Jupyter notebook. 
- log (logging object) – Logger object used to write log output. 
- alpha (int) – Strength of regularization. 
- verbose (bool) – Whether or not to show a progress bar. 
- n_jobs (int) – Number of CPU cores for parallel calculation. -1 means using all available cores. 
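The effect of the scaling option can be sketched with NumPy. This assumes that scaling means z-scoring each regulatory gene across cells; the documentation does not state the exact transformation, and `scale_regulators` is a hypothetical helper, not a CellOracle function.

```python
import numpy as np

def scale_regulators(tf_expr):
    """Z-score each regulator column (cells in rows, regulatory genes in columns)."""
    tf_expr = np.asarray(tf_expr, dtype=float)
    mean = tf_expr.mean(axis=0)
    std = tf_expr.std(axis=0)
    std[std == 0] = 1.0  # constant columns become all zeros instead of dividing by zero
    return (tf_expr - mean) / std

# Toy data: 4 cells x 2 regulators measured on very different scales.
tf_expr = np.array([[0.1, 100.0],
                    [0.2, 300.0],
                    [0.3, 200.0],
                    [0.4, 400.0]])
scaled = scale_regulators(tf_expr)
print(scaled.mean(axis=0), scaled.std(axis=0))
```

Putting regulators on a common scale means the ridge penalty acts evenly on all of them, so the resulting coefficients are easier to compare between TFs.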
 
 
 - 
plotCoefs(target_gene, sort=True, threshold_p=None)¶
- Plot the distribution of Coef values (network edge weights). - Parameters
- target_gene (str) – gene name 
- sort (bool) – Whether or not to sort genes by their strength. 
- threshold_p (float) – The threshold for p-values. TFs will be filtered based on the p-value. If None, no filtering is applied. 
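The sort and threshold_p options can be pictured with a small pure-Python sketch. It makes two assumptions that the documentation does not confirm: that "strength" means the absolute mean coefficient, and that TFs are kept when their p-value is below the threshold. The helper `select_tfs` and the toy values are hypothetical.

```python
def select_tfs(coef_stats, sort=True, threshold_p=None):
    """Return TF names after optional p-value filtering and sorting.

    coef_stats maps TF name -> (mean coefficient, p-value).
    """
    items = list(coef_stats.items())
    if threshold_p is not None:
        # Keep only TFs whose p-value is below the threshold (assumed rule).
        items = [(tf, stats) for tf, stats in items if stats[1] < threshold_p]
    if sort:
        # Strongest edges first, judged by absolute mean coefficient (assumed rule).
        items.sort(key=lambda item: abs(item[1][0]), reverse=True)
    return [tf for tf, _ in items]

# Toy statistics for three candidate TFs of one target gene.
coef_stats = {
    "Gata1": (0.80, 0.001),
    "Spi1": (-0.95, 0.0004),
    "Cebpa": (0.05, 0.60),
}
print(select_tfs(coef_stats, threshold_p=0.01))
```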
 
 
 - 
to_hdf5(file_path)¶
- Save the object as an HDF5 file. - Parameters
- file_path (str) – Path of the file to save. The filename must end with ‘.celloracle.net’. 
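Because the filename must end with ‘.celloracle.net’, it can be convenient to validate a path before starting a long calculation. The helper below is a hypothetical sketch and not part of CellOracle; it only checks the suffix.

```python
REQUIRED_SUFFIX = ".celloracle.net"

def check_net_path(file_path):
    """Return the path as a string, or raise ValueError if the suffix is wrong."""
    file_path = str(file_path)
    if not file_path.endswith(REQUIRED_SUFFIX):
        raise ValueError(
            f"File name must end with {REQUIRED_SUFFIX!r}, got {file_path!r}"
        )
    return file_path

print(check_net_path("results/sample1.celloracle.net"))
```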
 
 - 
updateLinkList(verbose=True)¶
- Update LinkList. LinkList is a data frame that stores information about inferred GRNs. - Parameters
- verbose (bool) – Whether or not to show a progress bar. 
 
 - 
updateTFinfo_dictionary(TFdict)¶
- Update the TF info matrix. - Parameters
- TFdict (dictionary) – A Python dictionary in which each key is a target gene and each value is the list of potential regulatory genes for that target gene. 
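The dictionary layout described here (target gene as key, potential regulatory genes as value) can be sketched as follows. The gene names are toy examples, and `tfdict_to_table` is a hypothetical helper that shows one way to view such a dictionary as a binary target x TF table; it is not how CellOracle stores the information internally.

```python
def tfdict_to_table(tf_dict):
    """Convert {target: [TF, ...]} into a sorted TF list and a 0/1 table per target."""
    all_tfs = sorted({tf for tfs in tf_dict.values() for tf in tfs})
    table = {}
    for target, tfs in tf_dict.items():
        candidates = set(tfs)
        # 1 marks a potential regulatory connection, 0 marks none.
        table[target] = {tf: int(tf in candidates) for tf in all_tfs}
    return all_tfs, table

# Toy base GRN: Klf1 has two candidate regulators, Epor has one.
tf_dict = {
    "Klf1": ["Gata1", "Spi1"],
    "Epor": ["Gata1"],
}
all_tfs, table = tfdict_to_table(tf_dict)
print(all_tfs)
print(table["Epor"])
```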
 
 
- 
celloracle.network.getDF_TGxTF(net_object, value_of_interest)¶
- Extract inferred GRN information and return it as a pandas.DataFrame. The results are arranged as a target gene x TF matrix. - Parameters
- net_object (Net) – Net object. The GRN calculation must already have been done in this object. 
- value_of_interest (str) – Kind of information to extract. 
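The target gene x TF layout can be illustrated with a pandas pivot. This is a sketch of the reshaping only, not the function's implementation: the column names `source`, `target`, and `coef_mean`, the toy values, and the choice to fill missing TF-target pairs with 0 are all assumptions.

```python
import pandas as pd

# Toy link list in long format: one row per TF (source) -> target gene edge.
link_list = pd.DataFrame({
    "source": ["Gata1", "Spi1", "Gata1"],
    "target": ["Klf1", "Klf1", "Epor"],
    "coef_mean": [0.8, -0.3, 0.5],
})

# Rows become target genes, columns become TFs, cells hold the value of interest.
tg_by_tf = link_list.pivot(index="target", columns="source", values="coef_mean")
tg_by_tf = tg_by_tf.fillna(0.0)  # pairs without an inferred edge (assumed fill value)
print(tg_by_tf)
```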
 
 
- 
celloracle.network.load_net_from_patquets(folder_path)¶
- Load a Net object that was saved with the “save_as_compressed” function. - Parameters
- folder_path (str) – Folder path