Please acknowledge RLBase in your publications by citing the following reference:
Miller HE, Montemayor D, Li J, Levy SA, Pawar R, Hartono S, Sharma K, Frost B, Chedin D, Bishop AJR. Exploration and analysis of R-loop mapping data with RLBase. bioRxiv; doi:https://doi.org/10.1101/2021.11.01.466854.
RLSeq is an R package for the downstream analysis of R-loop data sets. RLBase offers in-browser access to the RLSeq analysis workflow. The workflow is described below:
RLRanges
object with
RLSeq documentation.
The resulting RLRanges object, now containing all available results, is then saved and uploaded to a public AWS S3 bucket.
Finally, the RLRanges object is passed to the RLSeq::report() function to generate an HTML report. The report is also uploaded to an AWS S3 bucket along with all log files.
RLBase provides access to the raw and processed data sets that were generated as part of the RLSuite project. With the exception of raw .bam files, these data are stored on the publicly available RLBase-data AWS bucket (s3://rlbase-data/).
For bulk access to RLBase-data (83.5 GB), please use AWS CLI:
# conda install -c conda-forge awscli
aws s3 sync --no-sign-request s3://rlbase-data/ rlbase_data/ # Downloads all RLBase-data
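Because the full bucket is large, it can help to preview a transfer before starting it. The following is a minimal sketch, assuming the AWS CLI is installed; it relies on the standard `--dryrun` flag of `aws s3 sync`, which lists the planned operations without copying anything. The function name `rlbase_preview` and the `AWS_BIN` variable are hypothetical helpers introduced here for illustration and are not part of RLBase or RLSuite.

```shell
# AWS_BIN is a hypothetical override (not part of RLBase); setting it to
# "echo" prints the command instead of contacting AWS.
AWS_BIN="${AWS_BIN:-aws}"

# rlbase_preview PREFIX DEST
# Show what syncing s3://rlbase-data/PREFIX into DEST would copy,
# without downloading anything (--dryrun).
rlbase_preview() {
  local prefix="$1"
  local dest="$2"
  "$AWS_BIN" s3 sync --no-sign-request --dryrun "s3://rlbase-data/${prefix}" "$dest"
}

# Usage (requires the AWS CLI and network access):
#   rlbase_preview "" rlbase_data/           # preview the whole bucket
#   rlbase_preview "rlfs-beds/" rlfs-beds/   # preview a single folder
```

Removing `--dryrun` from the command turns the preview into the actual download.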
For fine-grained access to specific resources, please see the following guides:
All data in RLBase were processed using the RLPipes program (part of RLSuite). Peaks and coverage files were generated from genomic alignments, and the RLSeq analysis package (also part of RLSuite) was used to analyze the data and generate an HTML report. RLBase provides both bulk and fine-grained access to these data.
Data sets (below) can be downloaded in bulk using the AWS CLI.
.rds files. They can be loaded with the readRDS() function in R.
aws s3 sync --no-sign-request s3://rlbase-data/rlranges/ rlranges/
RLSeq::report() command). See RLSeq. *.html format.
aws s3 sync --no-sign-request s3://rlbase-data/reports/ reports/
The full list of samples in RLBase, with their corresponding download links, is given below:
Processed RData objects are provided via the RLHub R package (part of RLSuite). A full description of the data is provided in the table below.
To access these data, there are several options:
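RLHub (preferred)
Install the RLHub R package via remotes (requires Bioconductor 3.14):
if (!requireNamespace('BiocManager', quietly = TRUE))
  install.packages('BiocManager')
BiocManager::install(version = '3.14')  # select the required Bioconductor release
remotes::install_github('Bishop-Laboratory/RLHub')
# Load a data set with its accessor function (see the table below)
gssignal <- RLHub::gs_signal()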
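Direct download
All data sets are provided in .rda (RData) format and have a direct download link. For example, to load annotations_primary_hg38 in R:
tmp <- tempfile()
# mode = 'wb' keeps the binary .rda file intact on Windows
download.file('https://rlbase-data.s3.amazonaws.com/RLHub/annotations_primary_hg38.rda', destfile = tmp, mode = 'wb')
load(tmp)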
AWS CLI:
# conda install -c conda-forge awscli
aws s3 sync --no-sign-request s3://rlbase-data/RLHub RLHub/ # Downloads the entire folder
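When only part of the RLHub folder is needed, the sync can be narrowed with the standard `--exclude` and `--include` filters of `aws s3 sync`. The sketch below assumes that annotation objects share the `annotations_` file-name prefix seen in `annotations_primary_hg38.rda`; only that one file name is confirmed above, so the pattern is an assumption. `rlhub_sync_annotations` and `AWS_BIN` are hypothetical names used for illustration.

```shell
# AWS_BIN is a hypothetical override (not part of RLBase); setting it to
# "echo" prints the command instead of contacting AWS.
AWS_BIN="${AWS_BIN:-aws}"

# rlhub_sync_annotations [DEST]
# Copy only objects whose names start with "annotations_" from the RLHub
# folder into DEST (default: RLHub/). Filters are applied in order, so
# everything is excluded first and the wanted pattern is re-included.
rlhub_sync_annotations() {
  local dest="${1:-RLHub/}"
  "$AWS_BIN" s3 sync --no-sign-request \
    --exclude "*" --include "annotations_*" \
    s3://rlbase-data/RLHub "$dest"
}

# Usage (requires the AWS CLI and network access):
#   rlhub_sync_annotations RLHub/
```

The order of the two filters matters: later filters take precedence, so placing `--include` before `--exclude "*"` would exclude every file.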
Title | Description | Genome | RDataClass | Direct_Download | RLHub_Accessor |
---|---|---|---|---|---|
Primary Genomic Annotations (hg38) | Primary Human genomic annotations curated for use with RLSuite. | hg38 | list | Link | RLHub::annots_primary_hg38() |
Primary Genomic Annotations (mm10) | Primary Mouse genomic annotations curated for use with RLSuite. | mm10 | list | Link | RLHub::annots_primary_mm10() |
Full Genomic Annotations (hg38) | Full Human genomic annotations curated for use with RLSuite. | hg38 | list | Link | RLHub::annots_full_hg38() |
Full Genomic Annotations (mm10) | Full Mouse genomic annotations curated for use with RLSuite. | mm10 | list | Link | RLHub::annots_full_mm10() |
R-loop Binding Proteins | R-loop-binding proteins discovered from mass-spec studies. | hg38 | tbl | Link | RLHub::rlbps() |
Gene Expression | Gene expression count tables from matched RNA-Seq experiments corresponding to R-loop profiling. The counts, TPM, and VST-transformed counts are provided. | hg38 | SummarizedExperiment | Link | RLHub::gene_exp() |
Feature Enrichment per Sample | Genomic feature enrichment stats for each peakset in RLBase. | hg38 | tbl | Link | RLHub::feat_enrich_samples() |
Feature Enrichment per RL-Region | Genomic feature enrichment stats for the RL-Regions in RLBase. | hg38 | tbl | Link | RLHub::feat_enrich_rlregions() |
GS-Signal | Bin-level read counts for RLBase samples around R-loop sites discovered using long-read SMRF-Seq (gold-standard sites). | hg38 | tbl | Link | RLHub::gs_signal() |
FFT-Model | Stacked classifier for deciding whether samples successfully mapped R-loops. | hg38 | caretStack | Link | RLHub::fft_model() |
Feature-Prep Model | Model for transforming dataset features prior to classification. | hg38 | preProcess | Link | RLHub::prep_features() |
RLFS-Test Results | The results from RLFS (R-loop-forming sequences) analysis on all RLBase samples via the RLSeq package. | hg38 | list | Link | RLHub::rlfs_res() |
RLRegion Annotations | R-loop regions (rlregions) derived from S9.6-based and dRNH-based samples ('All' group), annotated with genomic features. | hg38 | tbl | Link | RLHub::rlregions_annot() |
RLRegion Metadata | R-loop regions (rlregions) derived from S9.6-based and dRNH-based samples ('All' group), with descriptive metadata. | hg38 | tbl | Link | RLHub::rlregions_meta() |
RLRegion Read Counts | Read count tables from RLBase samples quantified across the R-loop regions (rlregions) derived from both S9.6-based and dRNH-based samples ('All' group). | hg38 | SummarizedExperiment | Link | RLHub::rlregions_counts() |
RLBase Sample Manifest | The hand-curated manifest of all RLBase samples with descriptive metadata and some sample-level analysis results. | hg38 | tbl | Link | RLHub::rlbase_samples() |
The raw data were downloaded from SRA programmatically as part of the RLPipes processing pipeline. Raw reads were aligned to the genome using bwa-mem2 and uploaded to a publicly accessible Box folder (1.5 TB).
Note: You will be unable to download the entire contents in bulk without a paid Box account. If you need to access these *.bam files in bulk, please follow the protocol outlined in the RLBase-data repository (link). If you are unable to do so, please contact the RLBase maintainer (Henry Miller) and he will assist you in accessing the data.
Miscellaneous data which provide support to RLBase and the other software in RLSuite are also available for download if desired.
aws s3 sync --no-sign-request s3://rlbase-data/rlfs-beds/ .
https://rlbase-data.s3.amazonaws.com/rlfs-beds/<UCSC_GENOME>.rlfs.bed), where UCSC_GENOME is replaced by the genome of interest. For example, 'hg38' would be https://rlbase-data.s3.amazonaws.com/rlfs-beds/hg38.rlfs.bed.
aws s3 sync --no-sign-request s3://rlbase-data/misc/model/ .
?RLHub:::models.
*.broadPeak). ?RLHub::annotations.
aws s3 sync --no-sign-request s3://rlbase-data/misc/cohesin_peaks/ .
Note: Any other desired data will be provided upon reasonable request to the RLBase maintainer (Henry Miller).
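The direct-link pattern for the RLFS BED files given above can be wrapped in a small helper. This is a sketch under stated assumptions: `rlfs_url` and `rlfs_fetch` are hypothetical function names, `curl` is assumed to be installed, and only the 'hg38' file is confirmed by the text; whether a file exists for any other genome is not guaranteed here.

```shell
# rlfs_url GENOME
# Print the direct link to the RLFS BED file for a UCSC genome ID,
# following the pattern rlfs-beds/<UCSC_GENOME>.rlfs.bed.
rlfs_url() {
  local genome="$1"
  if [ -z "$genome" ]; then
    echo "usage: rlfs_url <UCSC_GENOME>" >&2
    return 1
  fi
  printf 'https://rlbase-data.s3.amazonaws.com/rlfs-beds/%s.rlfs.bed\n' "$genome"
}

# rlfs_fetch GENOME [DIR]
# Download the file into DIR (default: current directory).
# -f makes curl fail on HTTP errors (e.g. a genome with no file),
# -L follows redirects, -S still reports errors.
rlfs_fetch() {
  local url
  url="$(rlfs_url "$1")" || return 1
  local dir="${2:-.}"
  curl -fSL -o "${dir}/$1.rlfs.bed" "$url"
}

# Usage (requires curl and network access):
#   rlfs_fetch hg38 .
```

Keeping the URL construction separate from the download makes it easy to check the link before fetching it.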