1. Transcription factor binding motif scan

In the previous section, we identified accessible promoter and enhancer DNA regions using ATAC-seq data. Next, we construct the base GRN by scanning these regulatory sequences for TF-binding motifs. The base GRN, a list of potential TF-to-target-gene connections, is used in the later GRN inference step.
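To make the idea of a motif scan concrete, here is a minimal, self-contained sketch in plain Python. It is not CellOracle's implementation: the motif `ToyTF`, the peak sequences, the gene names, and the score threshold are all made up for illustration, and the scan only looks at the forward strand. It converts a position frequency matrix into log-odds scores, finds the best-scoring window in each peak, and records a potential TF-to-target-gene connection whenever the score passes the threshold.

```python
import math

BASES = "ACGT"

# Toy position frequency matrix (PFM) for a hypothetical TF.
# Rows are motif positions; columns are frequencies of A, C, G, T.
TOY_PFM = [
    [0.97, 0.01, 0.01, 0.01],  # strongly prefers A
    [0.01, 0.97, 0.01, 0.01],  # strongly prefers C
    [0.01, 0.01, 0.97, 0.01],  # strongly prefers G
    [0.01, 0.01, 0.01, 0.97],  # strongly prefers T
]


def pfm_to_log_odds(pfm, background=0.25):
    """Convert frequencies to log2 odds against a uniform background."""
    return [[math.log2(p / background) for p in row] for row in pfm]


def best_hit(sequence, pwm):
    """Return (score, start) of the best forward-strand window, or None
    if the sequence is shorter than the motif. Handles A/C/G/T only."""
    width = len(pwm)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][BASES.index(base)] for i, base in enumerate(window))
        if best is None or score > best[0]:
            best = (score, start)
    return best


def build_base_grn(peaks, motifs, threshold):
    """Map each target gene to the TFs whose motif scores >= threshold
    in at least one peak annotated to that gene."""
    pwms = {tf: pfm_to_log_odds(pfm) for tf, pfm in motifs.items()}
    grn = {}
    for peak in peaks:
        for tf, pwm in pwms.items():
            hit = best_hit(peak["sequence"].upper(), pwm)
            if hit is not None and hit[0] >= threshold:
                grn.setdefault(peak["gene"], set()).add(tf)
    return {gene: sorted(tfs) for gene, tfs in grn.items()}


# Hypothetical peaks, each already annotated with a target gene.
toy_peaks = [
    {"gene": "GeneA", "sequence": "TTACGTTT"},  # contains ACGT
    {"gene": "GeneB", "sequence": "TTTTTTTT"},  # no good match
]

base_grn = build_base_grn(toy_peaks, {"ToyTF": TOY_PFM}, threshold=6.0)
print(base_grn)
```

In the real workflow the motif library, genome sequence lookup, scoring, and thresholding are handled by CellOracle and its motif-scanning backend; the linked notebook shows the actual calls.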

The Jupyter notebook files and data used in this tutorial are available here.

Scan DNA sequences searching for TF binding motifs

Python notebook

2. How to use custom motif data

CellOracle provides several default motif datasets. If you do not specify the motif data, CellOracle automatically loads the default motifs for your species. In most cases, you will not need to prepare your own TF binding motif dataset.

However, you do have the option to define or customize a motif dataset for your analysis.

gimmemotifs motif data

This notebook demonstrates how to load motif data from the gimmemotifs database: https://github.com/morris-lab/CellOracle/blob/master/docs/notebooks/02_motif_scan/motif_data_preparation/01_How_to_load_gimmemotifs_motif_data.ipynb

CellOracle motif dataset generated from the CisBP version 2 database

This notebook describes how to load motif data from the CisBP version 2 database: https://github.com/morris-lab/CellOracle/blob/master/docs/notebooks/02_motif_scan/motif_data_preparation/02_How_to_load_CisBPv2_motif_data.ipynb

How to create custom motif data

You can also create your own custom motif datasets. Here is an example notebook: https://github.com/morris-lab/CellOracle/blob/master/docs/notebooks/02_motif_scan/motif_data_preparation/03_How_to_make_custom_motif.ipynb
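As a rough illustration of what goes into a custom motif, the sketch below builds a position frequency matrix (PFM) from a handful of aligned binding sites and serializes it as text. Everything here is an assumption made for the example: the binding sites, the motif name `ToyTF_custom_motif`, the pseudocount, and the header-plus-rows text layout, which is only loosely modeled on common PFM files. Refer to the notebook above for the format and loading functions CellOracle actually expects.

```python
BASES = "ACGT"


def sites_to_pfm(sites, pseudocount=0.5):
    """Turn equal-length aligned binding sites into a PFM.
    Each row holds the frequencies of A, C, G, T at one position."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all binding sites must have the same length")
    pfm = []
    for i in range(width):
        counts = [
            pseudocount + sum(1 for site in sites if site[i].upper() == base)
            for base in BASES
        ]
        total = sum(counts)
        pfm.append([count / total for count in counts])
    return pfm


def format_pfm(name, pfm):
    """Serialize a PFM as a '>name' header followed by one
    tab-separated row of A, C, G, T frequencies per position."""
    lines = [f">{name}"]
    for row in pfm:
        lines.append("\t".join(f"{p:.4f}" for p in row))
    return "\n".join(lines) + "\n"


# Hypothetical aligned binding sites for a made-up TF.
toy_sites = ["ACGT", "ACGT", "ACGA", "TCGT"]

toy_pfm = sites_to_pfm(toy_sites)
motif_text = format_pfm("ToyTF_custom_motif", toy_pfm)

# A custom dataset also needs to say which TF(s) each motif belongs to.
motif_to_factors = {"ToyTF_custom_motif": ["ToyTF"]}

print(motif_text)
```

The pseudocount keeps every frequency above zero, which avoids infinite penalties if the matrix is later converted to log-odds scores for scanning.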