Custom class in celloracle¶
We define some custom classes in celloracle.
-
class
celloracle.
Links
(name, links_dict={})¶ Bases:
object
This is a class for processing and visualizing GRNs. A Links object stores cluster-specific GRNs and metadata. Please use the “get_links” function of the Oracle object to generate a Links object.
-
links_dict
¶ Dictionary that stores unprocessed network data.
- Type
dictionary
-
filtered_links
¶ Dictionary that stores filtered network data.
- Type
dictionary
-
merged_score
¶ Network scores.
- Type
pandas.DataFrame
-
cluster
¶ List of cluster names.
- Type
list of str
-
name
¶ Name of clustering unit.
- Type
str
-
palette
¶ DataFrame that stores color information.
- Type
pandas.DataFrame
-
filter_links
(p=0.001, weight='coef_abs', threshold_number=10000, genelist_source=None, genelist_target=None, thread_number=None)¶ Filter network edges. In most cases, an inferred GRN contains non-significant, random edges, which have to be removed before analyzing the network structure. You can filter in any of the following ways.
Filter based on the p-value of the network edge. Please enter a p-value threshold.
Filter based on the number of network edges. If you set this number, edges are ranked by a network weight; the top n edges are kept and the rest are removed. The network data has several types of network weight, so you have to select which one to use.
Filter based on an arbitrary gene list. You can set a gene list for source nodes or target nodes.
- Parameters
p (float) – threshold for p-value of the network edge.
weight (str) – Please select network weight name for the filtering
genelist_source (list of str) – gene list to remain in regulatory gene nodes. Default is None.
genelist_target (list of str) – gene list to remain in target gene nodes. Default is None.
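The three filtering modes described above can be sketched in plain Python. This is a minimal illustration, not the celloracle implementation: the helper name filter_edges, the edge keys ("source", "target", "p"), the use of "<=" for the p-value test, and the order in which the filters are applied are all assumptions.

```python
def filter_edges(edges, p=0.001, weight="coef_abs", threshold_number=10000,
                 genelist_source=None, genelist_target=None):
    """Hypothetical helper, not the celloracle implementation.

    edges: list of dicts with the assumed keys "source", "target", "p",
    and the chosen weight name.
    """
    # 1. Keep only edges whose p-value passes the threshold.
    kept = [e for e in edges if e["p"] <= p]
    # 2. Optionally keep only edges from / to the listed genes.
    if genelist_source is not None:
        sources = set(genelist_source)
        kept = [e for e in kept if e["source"] in sources]
    if genelist_target is not None:
        targets = set(genelist_target)
        kept = [e for e in kept if e["target"] in targets]
    # 3. Rank by the selected weight and keep the strongest edges.
    kept = sorted(kept, key=lambda e: e[weight], reverse=True)
    return kept[:threshold_number]


edges = [
    {"source": "A", "target": "B", "p": 0.0001, "coef_abs": 0.9},
    {"source": "A", "target": "C", "p": 0.5, "coef_abs": 2.0},  # not significant
    {"source": "B", "target": "C", "p": 0.0005, "coef_abs": 0.4},
    {"source": "C", "target": "A", "p": 0.0002, "coef_abs": 0.6},
]
top_edges = filter_edges(edges, p=0.001, threshold_number=2)
```

In this toy example the A-to-C edge has the largest weight but is dropped first because its p-value is above the threshold.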
-
get_network_entropy
(value='coef_abs')¶ Calculate network entropy scores.
- Parameters
value (str) – Default is “coef_abs”.
-
get_network_score
()¶ Get several network scores using the igraph library. The following scores are calculated: [‘degree_all’, ‘degree_centrality_all’, ‘degree_in’, ‘degree_centrality_in’, ‘degree_out’, ‘degree_centrality_out’, ‘betweenness_centrality’, ‘eigenvector_centrality’]
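To make the degree-based scores concrete, here is a small sketch that computes them from a list of directed edges without igraph. The helper name degree_scores is hypothetical, and normalizing by (number of genes - 1) is the textbook definition of degree centrality, assumed here rather than taken from celloracle. Betweenness and eigenvector centrality are omitted.

```python
from collections import Counter


def degree_scores(edges):
    """Hypothetical helper: edges is an iterable of (source, target) pairs."""
    edges = list(edges)
    out_deg = Counter(source for source, _ in edges)
    in_deg = Counter(target for _, target in edges)
    genes = sorted(set(out_deg) | set(in_deg))
    # Assumed normalization: degree / (n_genes - 1).
    norm = max(len(genes) - 1, 1)
    scores = {}
    for gene in genes:
        d_in, d_out = in_deg[gene], out_deg[gene]
        scores[gene] = {
            "degree_in": d_in,
            "degree_out": d_out,
            "degree_all": d_in + d_out,
            "degree_centrality_in": d_in / norm,
            "degree_centrality_out": d_out / norm,
            "degree_centrality_all": (d_in + d_out) / norm,
        }
    return scores


toy_scores = degree_scores([("A", "B"), ("A", "C"), ("B", "C")])
```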
-
get_score
(test_mode=False, n_jobs=-1)¶ Get several network scores using R-igraph, linkcomm, and rnetcarto. This requires R packages.
-
plot_cartography_scatter_per_cluster
(gois=None, clusters=None, scatter=True, kde=False, auto_gene_annot=False, percentile=98, args_dot={'n_levels': 105}, args_line={'c': 'gray'}, args_annot={}, save=None)¶ Make a gene network cartography plot. Please read the original paper describing gene network cartography for more information. https://www.nature.com/articles/nature03288
- Parameters
links (Links) – See network_analysis.Links class for detail.
gois (list of str) – List of gene names to highlight.
clusters (list of str) – List of cluster names to analyze. If None, all clusters in the Links object will be analyzed.
scatter (bool) – Whether to make a scatter plot.
auto_gene_annot (bool) – Whether to pick up genes to make an annotation.
percentile (float) – Genes with a network score above the percentile will be shown with annotation. Default is 98.
args_dot (dictionary) – Arguments for scatter plot.
args_line (dictionary) – Arguments for lines in cartography plot.
args_annot (dictionary) – Arguments for annotation in plots.
save (str) – Folder path to save plots. If the folder does not exist in the path, the function creates the folder. Plots will not be saved if [save=None]. Default is None.
-
plot_cartography_term
(goi, save=None, plt_show=True)¶ Plot the gene network cartography term like a heatmap. Please read the original paper of gene network cartography for the principle of gene network cartography. https://www.nature.com/articles/nature03288
- Parameters
links (Links) – See network_analysis.Links class for detail.
gois (list of str) – List of gene names to highlight.
save (str) – Folder path to save plots. If the folder does not exist in the path, the function creates the folder. Plots will not be saved if [save=None]. Default is None.
-
plot_degree_distributions
(plot_model=False, save=None)¶ Plot the network degree distributions (the number of edges per gene). The network degree will be visualized on both linear and log scales.
- Parameters
links (Links) – See network_analysis.Links class for detail.
plot_model (bool) – Whether to plot linear approximation line.
save (str) – Folder path to save plots. If the folder does not exist in the path, the function creates the folder. Plots will not be saved if [save=None]. Default is None.
-
plot_network_entropy_distributions
(update_network_entropy=False, save=None)¶ Plot the distribution for network entropy. See the CellOracle paper for more detail.
- Parameters
links (Links object) – See network_analysis.Links class for detail.
values (list of str) – The list of scores to visualize. If it is None, all network scores (listed above) will be used.
update_network_entropy (bool) – Whether to recalculate network entropy.
save (str) – Folder path to save plots. If the folder does not exist in the path, the function creates the folder. Plots will not be saved if [save=None]. Default is None.
-
plot_score_comparison_2D
(value, cluster1, cluster2, percentile=99, annot_shifts=None, save=None, plt_show=True, interactive=False)¶ Make a scatter plot that compares specific network scores in two groups.
- Parameters
links (Links) – See network_analysis.Links class for detail.
value (str) – The network score type.
cluster1 (str) – Cluster name. Network scores in cluster1 will be visualized in the x-axis.
cluster2 (str) – Cluster name. Network scores in cluster2 will be visualized in the y-axis.
percentile (float) – Genes with a network score above the percentile will be shown with annotation. Default is 99.
annot_shifts ((float, float)) – Annotation visualization setting.
save (str) – Folder path to save plots. If the folder does not exist in the path, the function creates the folder. Plots will not be saved if [save=None]. Default is None.
-
plot_score_discributions
(values=None, method='boxplot', save=None)¶ Plot the distribution of network scores. An individual data point is a network node (gene).
- Parameters
links (Links) – See Links class for details.
values (list of str) – The list of scores to visualize. If it is None, all of the network scores will be used.
method (str) – Plotting method. Select either “boxplot” or “barplot”.
save (str) – Folder path to save plots. If the folder does not exist in the path, the function creates the folder. Plots will not be saved if [save=None]. Default is None.
-
plot_score_per_cluster
(goi, save=None, plt_show=True)¶ Plot network score for a gene. This function visualizes the network score for a specific gene between clusters to get an insight into the dynamics of the gene.
- Parameters
links (Links) – See network_analysis.Links class for detail.
goi (str) – Gene name.
save (str) – Folder path to save plots. If the folder does not exist in the path, the function creates the folder. Plots will not be saved if [save=None]. Default is None.
-
plot_scores_as_rank
(cluster, n_gene=50, save=None)¶ Pick the top n genes with high network scores and plot them.
- Parameters
links (Links) – See network_analysis.Links class for detail.
cluster (str) – Cluster name to analyze.
n_gene (int) – Number of genes to plot. Default is 50.
save (str) – Folder path to save plots. If the folder does not exist in the path, the function creates the folder. Plots will not be saved if [save=None]. Default is None.
-
to_hdf5
(file_path)¶ Save object as hdf5.
- Parameters
file_path (str) – file path to save file. Filename needs to end with ‘.celloracle.links’
-
-
class
celloracle.
Net
(gene_expression_matrix, gem_standerdized=None, TFinfo_matrix=None, cellstate=None, TFinfo_dic=None, annotation=None, verbose=True)¶ Bases:
object
Net is a custom class for inferring sample-specific GRNs from scRNA-seq data. This class is used inside the Oracle class for GRN inference. This class requires the two types of information below.
Single-cell RNA-seq data: The Net class needs processed scRNA-seq data. Gene and cell filtering, quality checks, normalization, and log-transformation (but not scaling and centering) have to be done before starting the GRN calculation with this class. You can also use arbitrary metadata (e.g., mRNA count, cell-cycle phase) as GRN input.
Potential regulatory connection (or base GRN): This method uses the list of potential regulatory TFs as input. This information can be calculated from ATAC-seq data using the motif-analysis module. If sample-specific ATAC-seq data is not available, you can use general TF-binding info derived from public ATAC-seq datasets of various tissues/cell types.
-
linkList
¶ The results of the GRN inference.
- Type
pandas.DataFrame
-
all_genes
¶ An array of all genes that exist in the input gene expression matrix
- Type
numpy.array
-
embedding_name
¶ The key name in adata.obsm containing dimensional reduction coordinates
- Type
str
-
annotation
¶ Annotation. You can add custom annotations.
- Type
dictionary
-
coefs_dict
¶ Coefs of linear regression.
- Type
dictionary
-
stats_dict
¶ Statistic values about coefs.
- Type
dictionary
-
fitted_genes
¶ List of genes where the regression model was successfully calculated.
- Type
list of str
-
failed_genes
¶ List of genes that were not assigned coefs
- Type
list of str
-
cellstate
¶ Metadata for GRN input
- Type
pandas.DataFrame
-
TFinfo
¶ Information about potential regulatory TFs.
- Type
pandas.DataFrame
-
gem
¶ Merged matrix made with gene_expression_matrix and cellstate matrix.
- Type
pandas.DataFrame
-
gem_standerdized
¶ Almost the same as gem, but the gene_expression_matrix was standardized.
- Type
pandas.DataFrame
-
library_last_update_date
¶ Last update date of this code. This info is for code development. It may be deprecated in the future.
- Type
str
-
object_initiation_date
¶ The date when this object was made.
- Type
str
-
addAnnotation
(annotation_dictionary)¶ Add a new annotation.
- Parameters
annotation_dictionary (dictionary) – e.g. {“sample_name”: “NIH 3T3 cell”}
-
addTFinfo_dictionary
(TFdict)¶ Add new TF info to the pre-existing TFdict.
- Parameters
TFdict (dictionary) – python dictionary of TF info.
-
addTFinfo_matrix
(TFinfo_matrix)¶ Load TF info dataframe.
- Parameters
TFinfo_matrix (pandas.DataFrame) – Information about potential regulatory TFs.
-
copy
()¶ Deepcopy itself
-
fit_All_genes
(bagging_number=200, scaling=True, model_method='bagging_ridge', command_line_mode=False, log=None, alpha=1, verbose=True, n_jobs=-1)¶ Make ML models for all genes. The calculation will be performed in parallel using scikit-learn’s bagging function. You can select a modeling method (bagging_ridge or bayesian_ridge). This calculation usually takes a long time.
- Parameters
bagging_number (int) – The number of estimators for bagging.
scaling (bool) – Whether or not to scale regulatory gene expression values.
model_method (str) – ML model name. Please select either “bagging_ridge” or “bayesian_ridge”
command_line_mode (bool) – Please select False if the calculation is performed in a Jupyter notebook.
log (logging object) – log object to output log
alpha (int) – Strength of regularization.
verbose (bool) – Whether or not to show a progress bar.
n_jobs (int) – Number of cpu cores for parallel calculation. -1 means using all available cores.
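The idea behind the "bagging_ridge" option can be illustrated with a deliberately reduced sketch: one target gene, one regulator, and ridge regression refitted on bootstrap resamples of the cells. It uses only the standard library instead of scikit-learn, and the helper names (ridge_coef, bagging_ridge) and the toy data are assumptions. The actual models regress each target gene on all of its candidate TFs.

```python
import random


def ridge_coef(x, y, alpha=1.0):
    """Ridge slope for one predictor with an unpenalized intercept:
    centered sum(x*y) / (centered sum(x*x) + alpha)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / (sxx + alpha)


def bagging_ridge(x, y, bagging_number=200, alpha=1.0, seed=0):
    """Fit the ridge model on bootstrap resamples; return one coef per fit."""
    rng = random.Random(seed)
    n = len(x)
    coefs = []
    for _ in range(bagging_number):
        # Sample cells with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        coefs.append(ridge_coef([x[i] for i in idx], [y[i] for i in idx], alpha))
    return coefs


# Toy data: the target gene follows 2 * regulator + 1 exactly.
regulator = [float(i) for i in range(10)]
target = [2.0 * v + 1.0 for v in regulator]
coefs = bagging_ridge(regulator, target, bagging_number=200, alpha=1.0)
mean_coef = sum(coefs) / len(coefs)
```

The spread of the coefficients across resamples gives a distribution per edge. The regularization term alpha shrinks every estimate toward zero, so each value here stays below the true slope of 2.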
-
fit_All_genes_parallel
(bagging_number=200, scaling=True, log=None, verbose=10)¶ IMPORTANT: this function is being debugged and is currently unavailable.
Make ML models for all genes. The calculation will be performed in parallel using joblib parallel module.
- Parameters
bagging_number (int) – The number of estimators for bagging.
scaling (bool) – Whether or not to scale regulatory gene expression values.
log (logging object) – log object to output log
verbose (int) – verbose for joblib parallel
-
fit_genes
(target_genes, bagging_number=200, scaling=True, model_method='bagging_ridge', save_coefs=False, command_line_mode=False, log=None, alpha=1, verbose=True, n_jobs=-1)¶ Make ML models for genes of interest. This calculation will be performed in parallel using scikit-learn’s bagging function. You can select a modeling method: please choose either bagging_ridge or bayesian_ridge.
- Parameters
target_genes (list of str) – gene list
bagging_number (int) – The number of estimators for bagging.
scaling (bool) – Whether or not to scale regulatory gene expression values.
model_method (str) – ML model name. Please select either “bagging_ridge” or “bayesian_ridge”
save_coefs (bool) – Whether or not to store details of coef values in bagging model.
command_line_mode (bool) – Please select False if the calculation is performed in a Jupyter notebook.
log (logging object) – log object to output log
alpha (int) – Strength of regularization.
verbose (bool) – Whether or not to show a progress bar.
n_jobs (int) – Number of cpu cores for parallel calculation. -1 means using all available cores.
-
plotCoefs
(target_gene, sort=True, threshold_p=None)¶ Plot the distribution of Coef values (network edge weights).
- Parameters
target_gene (str) – gene name
sort (bool) – Whether or not to sort genes by their strength
bagging_number (int) – The number of estimators for bagging.
threshold_p (float) – The threshold for p-values. TFs will be filtered based on the p-value. If None, no filtering is applied.
-
to_hdf5
(file_path)¶ Save object as hdf5.
- Parameters
file_path (str) – file path to save file. Filename needs to end with ‘.celloracle.net’
-
updateLinkList
(verbose=True)¶ Update LinkList. LinkList is a data frame that stores information about inferred GRNs.
- Parameters
verbose (bool) – Whether or not to show a progress bar
-
updateTFinfo_dictionary
(TFdict)¶ Update TF info matrix
- Parameters
TFdict (dictionary) – A Python dictionary in which each key is a target gene and each value holds the potential regulatory genes for that target gene.
-
class
celloracle.
Oracle
¶ Bases:
celloracle.trajectory.modified_VelocytoLoom_class.modified_VelocytoLoom
,celloracle.visualizations.oracle_object_visualization.Oracle_visualization
Oracle is the main class in CellOracle. An Oracle object imports scRNA-seq data (anndata) and TF information to infer cluster-specific GRNs. It can predict future gene expression patterns and cell state transitions in response to the perturbation of TFs. Please see the CellOracle paper for details. The code of the Oracle class is built from the three components below.
Anndata: Gene expression matrix and metadata from single-cell RNA-seq are stored in the anndata object. Processed values, such as normalized counts and simulated values, are stored as layers of anndata. Metadata (e.g., cluster info) are saved in anndata.obs. Refer to the scanpy/anndata documentation for details.
Net: Net is a custom class in celloracle. A Net object processes several types of data to infer GRNs. See the Net class documentation for details.
VelocytoLoom: Calculation of transition probability and visualization of the directed trajectory graph are performed in the same way as in VelocytoLoom. VelocytoLoom is a class from Velocyto, a Python library for RNA-velocity analysis. In celloracle, we use some functions of VelocytoLoom for the visualization.
-
adata
¶ Imported anndata object
- Type
anndata
-
cluster_column_name
¶ The column name in adata.obs containing cluster info
- Type
str
-
embedding_name
¶ The key name in adata.obsm containing dimensional reduction coordinates
- Type
str
-
addTFinfo_dictionary
(TFdict)¶ Add new TF info to pre-existing TFdict. Values in the old TF dictionary will remain.
- Parameters
TFdict (dictionary) – Python dictionary of TF info.
-
calculate_mass_filter
(min_mass=0.01, plot=False)¶
-
calculate_p_mass
(smooth=0.8, n_grid=40, n_neighbors=200, n_jobs=-1)¶
-
calculate_randomized_coef_table
(random_seed=123)¶ Calculate randomized GRN coef table.
-
change_cluster_unit
(new_cluster_column_name)¶ Change the clustering unit. If you change the clustering unit, previous GRN data and simulation data will be deleted. Please re-calculate the GRNs.
-
clip_delta_X
()¶ To avoid issues caused by out-of-distribution prediction, this function clips simulated gene expression values to the unperturbed gene expression range.
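As a rough illustration of this clipping step, the sketch below limits each gene's simulated values to the minimum and maximum observed for that gene before perturbation. The helper name clip_to_observed_range and the dict-of-lists data layout are assumptions made for the example. The real method operates on the layers of the anndata object.

```python
def clip_to_observed_range(original, simulated):
    """Hypothetical helper: both arguments map gene -> list of per-cell values."""
    clipped = {}
    for gene, sim_values in simulated.items():
        # Per-gene range observed in the unperturbed data.
        low, high = min(original[gene]), max(original[gene])
        clipped[gene] = [min(max(v, low), high) for v in sim_values]
    return clipped


original = {"Gata1": [0.0, 1.0, 2.0]}
simulated = {"Gata1": [-0.5, 1.2, 3.1]}
clipped = clip_to_observed_range(original, simulated)
```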
-
copy
()¶ Deepcopy itself.
-
count_cells_in_mc_resutls
(cluster_use, end=-1, order=None)¶ Count the simulated cells by cluster.
- Parameters
cluster_use (str) – cluster information name in anndata.obs. You can use any cluster information in anndata.obs.
end (int) – The end point of Sankey-diagram. Please select a step in the Markov simulation. If you set [end=-1], the final step of the Markov simulation will be used.
- Returns
Number of cells before / after simulation
- Return type
pandas.DataFrame
-
estimate_impact_of_perturbations_under_various_ns
(perturb_condition, order=1, n_prop_max=5, GRN_unit=None, figsize=[7, 3])¶ This function is designed to help the user estimate an appropriate n for signal propagation. For each n, the function performs the following calculations: 1. Calculate signal propagation. 2. Calculate the vector length of delta_X, which represents the simulated shift vector for each cell in the multi-dimensional gene expression space. 3. Calculate the mean of delta_X for each cluster. Steps 1 to 3 are repeated for each n, and the results are shown as a line plot.
- Parameters
perturb_condition (dictionary) – Please refer to the function ‘simulate_shift’ for detail of this.
order (int) – If order=1, this function calculates the l1 norm. If order=2, it calculates the l2 norm.
n_prop_max (int) – Max of n to try.
- Return
matplotlib figure
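Steps 2 and 3 above (vector length per cell, then the mean per cluster) can be sketched as follows. The helper name mean_shift_length_per_cluster and the list-based inputs are hypothetical. The sketch reads step 3 as the mean of the per-cell vector lengths, which is an interpretation of the description rather than a confirmed detail.

```python
def mean_shift_length_per_cluster(delta_X, clusters, order=1):
    """Hypothetical helper.

    delta_X: list of per-cell shift vectors (lists of floats).
    clusters: list of per-cell cluster labels, same length as delta_X.
    """
    totals, counts = {}, {}
    for vector, label in zip(delta_X, clusters):
        if order == 1:
            length = sum(abs(v) for v in vector)        # l1 norm
        elif order == 2:
            length = sum(v * v for v in vector) ** 0.5  # l2 norm
        else:
            raise ValueError("order must be 1 or 2")
        totals[label] = totals.get(label, 0.0) + length
        counts[label] = counts.get(label, 0) + 1
    return {label: totals[label] / counts[label] for label in totals}


delta_X = [[3.0, -4.0], [0.0, 0.0], [1.0, 1.0]]
clusters = ["a", "a", "b"]
l1_means = mean_shift_length_per_cluster(delta_X, clusters, order=1)
l2_means = mean_shift_length_per_cluster(delta_X, clusters, order=2)
```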
-
evaluate_and_plot_simulation_value_distribution
(n_genes=4, n_bins=10, alpha=0.5, figsize=[5, 3], save=None)¶ This function visualizes the distributions of the original gene expression values and the simulated values. It is built to confirm that the simulation results contain no significant out-of-distribution values.
- Parameters
n_genes (int) – Number of genes to show. This function will show the results for the top n_genes genes with the largest difference between original and simulated values.
n_bins (int) – Number of bins.
alpha (float) – Transparency.
figsize ([float, float]) – Figure size.
save (str) – Folder path to save your plots. If it is not specified, no figure is saved.
- Returns
None
-
evaluate_simulated_gene_distribution_range
()¶ CellOracle is not intended for out-of-distribution simulation. This function evaluates how far the simulated gene expression values deviate from the unperturbed gene expression distribution range.
-
extract_active_gene_lists
(return_as=None, verbose=False)¶
- Parameters
return_as (str) – If not None, it returns a dictionary or a list. Choose either “indivisual_dict” or “unified_list”.
verbose (bool) – Whether to show progress bar.
- Returns
The format depends on the argument, “return_as”.
- Return type
dictionary or list
-
fit_GRN_for_simulation
(GRN_unit='cluster', alpha=1, use_cluster_specific_TFdict=False, verbose_level=1)¶ Do GRN inference. Please see the CellOracle paper for details.
GRN can be constructed for the entire population or for each cluster. If you want to infer cluster-specific GRNs, please set [GRN_unit=”cluster”]. You can select cluster information when you import data.
If you set [GRN_unit=”whole”], GRN will be made using all cells.
- Parameters
GRN_unit (str) – Select “cluster” or “whole”
alpha (float or int) – The strength of regularization. If you set a lower value, the sensitivity increases, and you can detect weaker network connections. However, there may be more noise. If you select a higher value, it will reduce the chance of overfitting.
verbose_level (int) – If [verbose_level>1], the most detailed progress information will be shown. If [1 >= verbose_level > 0], one progress bar will be shown. If [verbose_level == 0], no progress bar will be shown.
-
get_cluster_specific_TFdict_from_Links
(links_object, ignore_warning=False)¶ Extract TF and its target gene information from Links object. This function can be used to reconstruct GRNs based on pre-existing GRNs saved in Links object.
- Parameters
links_object (Links) – Please see the explanation of Links class.
-
get_links
(cluster_name_for_GRN_unit=None, alpha=10, bagging_number=20, verbose_level=1, test_mode=False, model_method='bagging_ridge', ignore_warning=False, n_jobs=-1)¶ Make a GRN for each cluster and return the results as a Links object. Several preprocessing steps should be done before using this function.
- Parameters
cluster_name_for_GRN_unit (str) – Cluster name for GRN calculation. The cluster information should be stored in Oracle.adata.obs.
alpha (float or int) – The strength of regularization. If you set a lower value, the sensitivity increases, and you can detect weaker network connections. However, there may be more noise. If you select a higher value, it will reduce the chance of overfitting.
bagging_number (int) – The number used in bagging calculation.
verbose_level (int) – If [verbose_level>1], the most detailed progress information will be shown. If [verbose_level > 0], one progress bar will be shown. If [verbose_level == 0], no progress bar will be shown.
test_mode (bool) – If test_mode is True, GRN calculation will be done for only one cluster rather than all clusters.
model_method (str) – Choose the modeling algorithm: “bagging_ridge” or “bayesian_ridge”.
n_jobs (int) – Number of cpu cores for parallel calculation. -1 means using all available cores. Default is -1.
-
get_markov_simulation_cell_transition_table
(cluster_column_name=None, end=-1, return_df=True)¶ Calculate cell counts in the initial state and in the final state after Markov simulation. Cell counts are grouped by the cluster of interest. The result is stored as a 2D matrix.
-
get_markvov_simulation_cell_transition_table
(cluster_column_name=None, end=-1, return_df=True)¶
-
import_TF_data
(TF_info_matrix=None, TF_info_matrix_path=None, TFdict=None)¶ Load data about potential-regulatory TFs. You can import either TF_info_matrix or TFdict. For more information on how to make these files, please see the motif analysis module within the celloracle tutorial.
- Parameters
TF_info_matrix (pandas.DataFrame) – TF_info_matrix.
TF_info_matrix_path (str) – File path for TF_info_matrix (pandas.DataFrame).
TFdict (dictionary) – Python dictionary of TF info.
-
import_anndata_as_normalized_count
(adata, cluster_column_name=None, embedding_name=None, test_mode=False)¶ Load scRNA-seq data. scRNA-seq data should be prepared as an anndata object. Preprocessing (cell and gene filtering, dimensional reduction, clustering, etc.) should be done before loading data. The method will import NORMALIZED and LOG TRANSFORMED data but NOT SCALED and NOT CENTERED data. See the tutorial for more details on how to process scRNA-seq data.
- Parameters
adata (anndata) – anndata object containing scRNA-seq data.
cluster_column_name (str) – the name of column containing cluster information in anndata.obs. Clustering data should be in anndata.obs.
embedding_name (str) – the key name for dimensional reduction information in anndata.obsm. Dimensional reduction (or 2D trajectory graph) should be in anndata.obsm.
transform (str) – The method for log-transformation. Choose either “natural_log” or “log2”.
-
import_anndata_as_raw_count
(adata, cluster_column_name=None, embedding_name=None, transform='natural_log')¶ Load scRNA-seq data. scRNA-seq data should be prepared as an anndata object. Preprocessing (cell and gene filtering, dimensional reduction, clustering, etc.) should be done before loading data. The method imports RAW GENE COUNTS because unscaled and uncentered gene expression data are required for the GRN inference and simulation. See the tutorial notebook for details on how to process scRNA-seq data.
- Parameters
adata (anndata) – anndata object that stores scRNA-seq data.
cluster_column_name (str) – the name of column containing cluster information in anndata.obs. Clustering data should be in anndata.obs.
embedding_name (str) – the key name for dimensional reduction information in anndata.obsm. Dimensional reduction (or 2D trajectory graph) should be in anndata.obsm.
transform (str) – The method for log-transformation. Choose either “natural_log” or “log2”.
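The two transform options can be pictured with the following sketch, which applies a log-transformation to a small matrix of raw counts. The helper name log_transform is hypothetical, and adding a pseudocount of 1 before taking the logarithm is an assumption (a common convention for count data), not something stated in this documentation.

```python
import math


def log_transform(counts, transform="natural_log"):
    """Hypothetical helper: counts is a list of rows of raw counts."""
    if transform == "natural_log":
        func = math.log1p                      # ln(1 + x), assumed pseudocount
    elif transform == "log2":
        func = lambda v: math.log2(v + 1)      # log2(1 + x), assumed pseudocount
    else:
        raise ValueError("transform must be 'natural_log' or 'log2'")
    return [[func(v) for v in row] for row in counts]


raw_counts = [[0, 1, 3], [7, 0, 15]]
log2_counts = log_transform(raw_counts, transform="log2")
ln_counts = log_transform(raw_counts, transform="natural_log")
```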
-
plot_mc_results_as_kde
(n_time, args={})¶ Pick up one timepoint in the cell state-transition simulation and plot as a kde plot.
- Parameters
n_time (int) – The step number in the Markov simulation.
args (dictionary) – An argument for seaborn.kdeplot. See seaborn documentation for details (https://seaborn.pydata.org/generated/seaborn.kdeplot.html#seaborn.kdeplot).
-
plot_mc_results_as_sankey
(cluster_use, start=0, end=-1, order=None, font_size=10)¶ Plot the simulated cell state-transition as a Sankey-diagram after grouping by cluster.
- Parameters
cluster_use (str) – cluster information name in anndata.obs. You can use any cluster information in anndata.obs.
start (int) – The starting point of Sankey-diagram. Please select a step in the Markov simulation.
end (int) – The end point of Sankey-diagram. Please select a step in the Markov simulation. If you set [end=-1], the final step of the Markov simulation will be used.
order (list of str) – The order of cluster names in the Sankey-diagram.
font_size (int) – Font size for cluster name label in the Sankey diagram.
-
plot_mc_results_as_trajectory
(cell_name, time_range, args={})¶ Pick up several timepoints in the cell state-transition simulation and plot as a line plot. This function can be used to visualize how cell-state changes after perturbation focusing on a specific cell.
- Parameters
cell_name (str) – Cell name. Choose from adata.obs.index.
time_range (list of int) – The list of step indices in the Markov simulation.
args (dictionary) – Dictionary of arguments for matplotlib.pyplot.plot. See matplotlib documentation for details (https://matplotlib.org/api/_as_gen/matplotlib.pyplot.plot.html#matplotlib.pyplot.plot).
-
prepare_markov_simulation
(verbose=False)¶ Pick up cells for Markov simulation.
- Parameters
verbose (bool) – If True, it plots selected cells.
-
run_markov_chain_simulation
(n_steps=500, n_duplication=5, seed=123, calculate_randomized=True)¶ Do Markov simulations to predict cell transition after perturbation. The transition probability between cells has been calculated based on simulated gene expression values in the signal propagation process. The cell state transition will be simulated based on the probability. You can simulate the process multiple times to get a robust outcome.
- Parameters
n_steps (int) – steps for Markov simulation. This value is equivalent to the amount of time after perturbation.
n_duplication (int) – the number for multiple calculations.
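A generic Markov random walk over cells, with the same roles for n_steps, n_duplication, and seed, might look like the sketch below. It assumes that a cell-by-cell transition probability matrix is already available and uses a hypothetical helper name (run_markov_chain). It illustrates the technique and is not the celloracle code.

```python
import random


def run_markov_chain(transition_prob, n_steps=500, n_duplication=5, seed=123):
    """Hypothetical helper.

    transition_prob[i][j]: probability that a walker at cell i moves to cell j.
    Returns trajectories[d][i]: the n_steps + 1 cell indices visited by the
    walker that started at cell i in replicate d.
    """
    rng = random.Random(seed)
    cells = list(range(len(transition_prob)))
    trajectories = []
    for _ in range(n_duplication):
        replicate = []
        for start in cells:
            path = [start]
            for _ in range(n_steps):
                # Draw the next cell according to the current cell's row.
                weights = transition_prob[path[-1]]
                path.append(rng.choices(cells, weights=weights)[0])
            replicate.append(path)
        trajectories.append(replicate)
    return trajectories


# Toy chain: cell 0 -> cell 1 -> cell 2, where cell 2 is absorbing.
toy_prob = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
]
trajectories = run_markov_chain(toy_prob, n_steps=200, n_duplication=2)
```

Repeating the walk n_duplication times from every starting cell is what makes the summarized outcome less sensitive to any single random draw.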
-
simulate_shift
(perturb_condition=None, GRN_unit=None, n_propagation=3, ignore_warning=False, use_randomized_GRN=False, clip_delta_X=False)¶ Simulate signal propagation with GRNs. Please see the CellOracle paper for details. This function simulates a gene expression pattern in the near future. Simulated values will be stored in anndata.layers: [“simulated_count”]
The simulation uses three types of data. (1) GRN inference results (coef_matrix). (2) Perturb_condition: You can set an arbitrary perturbation condition. (3) Gene expression matrix: The simulation starts from imputed gene expression data.
- Parameters
perturb_condition (dictionary) – Condition for perturbation. If you want to simulate a knockout of GeneX, please set [perturb_condition={“GeneX”: 0.0}]. Although you can set any non-negative value for the gene condition, avoid setting biologically infeasible values for the perturb condition. It is strongly recommended to check gene expression values in your data before selecting the perturb condition.
GRN_unit (str) – GRN type. Please select either “whole” or “cluster”. See the documentation of “fit_GRN_for_simulation” for the detailed explanation.
n_propagation (int) – Calculation will be performed iteratively to simulate signal propagation in GRN. You can set the number of steps for this calculation. With a higher number, the results may recapitulate signal propagation for many genes. However, a higher number of propagation steps may cause more error/noise.
clip_delta_X (bool) – If the simulated gene expression shift would lead to a gene expression value outside the WT distribution, that value is clipped to the WT range.
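The iterative propagation can be pictured with a single-cell sketch in plain Python. Everything here is an assumption made for illustration: the helper name propagate_shift, the nested-dict layout coef[regulator][target], pinning the perturbed gene at every step, and preventing negative expression values. Please refer to the CellOracle paper for the actual procedure.

```python
def propagate_shift(expression, coef, perturb_condition, n_propagation=3):
    """Hypothetical helper for one cell.

    expression: dict gene -> expression value.
    coef: dict regulator -> dict target -> linear coefficient.
    perturb_condition: dict gene -> value to set, e.g. {"GeneX": 0.0}.
    """
    genes = list(expression)
    # Initial shift: only the perturbed genes change.
    delta_input = {g: 0.0 for g in genes}
    for gene, value in perturb_condition.items():
        delta_input[gene] = value - expression[gene]
    delta = dict(delta_input)
    for _ in range(n_propagation):
        new_delta = {}
        for target in genes:
            # Shift of a target = sum of regulator shifts times coefficients.
            new_delta[target] = sum(
                delta[reg] * coef.get(reg, {}).get(target, 0.0) for reg in genes
            )
        for gene in perturb_condition:
            new_delta[gene] = delta_input[gene]  # keep the perturbation pinned
        for gene in genes:
            # Assumed constraint: expression cannot become negative.
            new_delta[gene] = max(new_delta[gene], -expression[gene])
        delta = new_delta
    return {g: expression[g] + delta[g] for g in genes}


expression = {"A": 2.0, "B": 1.0, "C": 1.0}
coef = {"A": {"B": 0.5}, "B": {"C": -1.0}}  # A activates B, B represses C
one_step = propagate_shift(expression, coef, {"A": 0.0}, n_propagation=1)
three_steps = propagate_shift(expression, coef, {"A": 0.0}, n_propagation=3)
```

With one step only the direct target B responds to the knockout of A. The indirect effect on C appears from the second step onward, which is why the number of propagation steps matters.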
-
suggest_mass_thresholds
(n_suggestion=12, s=1, n_col=4)¶
-
summarize_mc_results_by_cluster
(cluster_use, random=False)¶ This function summarizes the simulated cell state-transition by grouping the results by cluster. It returns the summarized results as a pandas.DataFrame.
- Parameters
cluster_use (str) – cluster information name in anndata.obs. You can use any arbitrary cluster information in anndata.obs.
-
to_hdf5
(file_path)¶ Save object as hdf5.
- Parameters
file_path (str) – file path to save file. Filename needs to end with ‘.celloracle.oracle’
-
updateTFinfo_dictionary
(TFdict={})¶ Update a TF dictionary. If a key in the new TF dictionary already exists in the old TF dictionary, old values will be replaced with a new one.
- Parameters
TFdict (dictionary) – Python dictionary of TF info.
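The difference between addTFinfo_dictionary (old values remain) and updateTFinfo_dictionary (old values are replaced) can be sketched with two small functions. The helper names add_tf_info and update_tf_info are hypothetical, and the assumption that each value is a list of regulator names that gets merged without duplicates is made only for this illustration.

```python
def add_tf_info(old, new):
    """Merge: keep the old regulators and append new ones (no duplicates)."""
    merged = {target: list(tfs) for target, tfs in old.items()}
    for target, tfs in new.items():
        current = merged.setdefault(target, [])
        for tf in tfs:
            if tf not in current:
                current.append(tf)
    return merged


def update_tf_info(old, new):
    """Replace: an entry in `new` overwrites the old entry for the same key."""
    updated = {target: list(tfs) for target, tfs in old.items()}
    for target, tfs in new.items():
        updated[target] = list(tfs)
    return updated


old_tfdict = {"Gata1": ["Tal1"]}
new_tfdict = {"Gata1": ["Spi1"], "Klf1": ["Gata1"]}
added = add_tf_info(old_tfdict, new_tfdict)
updated = update_tf_info(old_tfdict, new_tfdict)
```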
-
update_cluster_colors
(palette)¶ Update color information stored in the oracle object. The color information is overwritten.
-
celloracle.
check_python_requirements
(return_detail=True, print_warning=True)¶ Check installation status and requirements of dependent libraries.
-
celloracle.
load_hdf5
(file_path, object_class_name=None)¶ Load an object of celloracle’s custom class that was saved as hdf5.
- Parameters
file_path (str) – Path of the file to load.
object_class_name (str) – Type of object. If it is None, the object class will be identified from the extension of the file name. Default is None.
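Identifying the class from the file extension can be pictured as a simple lookup. The three extensions below are the ones required by the to_hdf5 methods of Links, Net, and Oracle. The helper name guess_object_class and the error handling are assumptions, not the library's code.

```python
# Extensions documented for the to_hdf5 methods of each class.
EXTENSION_TO_CLASS = {
    ".celloracle.links": "Links",
    ".celloracle.net": "Net",
    ".celloracle.oracle": "Oracle",
}


def guess_object_class(file_path, object_class_name=None):
    """Hypothetical helper: return the class name for a saved file."""
    if object_class_name is not None:
        return object_class_name  # an explicit choice takes priority
    for extension, class_name in EXTENSION_TO_CLASS.items():
        if file_path.endswith(extension):
            return class_name
    raise ValueError(f"Cannot identify the object class from: {file_path}")


links_class = guess_object_class("results/sample.celloracle.links")
oracle_class = guess_object_class("results/sample.celloracle.oracle")
forced_class = guess_object_class("results/sample.h5", object_class_name="Net")
```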
-
celloracle.
test_R_libraries_installation
(show_all_stdout=False)¶