Welcome to RLBase

A database for R-loops and R-loop mapping experiments



RLBase is free and open to all users and there is no login requirement.

Please acknowledge RLBase in your publications by citing the following reference:
Miller HE, Montemayor D, Li J, Levy SA, Pawar R, Hartono S, Sharma K, Frost B, Chedin D, Bishop AJR. Exploration and analysis of R-loop mapping data with RLBase. bioRxiv; doi:https://doi.org/10.1101/2021.11.01.466854.

RLBase Samples


The purpose of the 'RLBase Samples' page is to enable exploration of the 810 reprocessed and standardized R-loop mapping datasets profiled in our 2022 data mining study (see Miller et al., 2022; forthcoming in Nucleic Acids Research). See 'Documentation' for full usage details.
Table Controls

RLBase Samples Table

Sample summary

R-loop mapping modalities

Sample labels

Sample quality prediction


Sample Heatmap

Sample PCA

Sample annotations
The Annotation panel shows the enrichment of R-loops within various genomic features. To learn about the genomic features present in this analysis, view the descriptions here. For each sample in RLBase, the called peaks were overlapped with each genomic feature annotation and overlap statistics were calculated using Fisher's exact test. The plots show the distribution of Fisher's exact test odds ratios across the samples present in the 'RLBase Samples Table'. When a sample is selected in the 'RLBase Samples Table', the enrichment value for that sample is displayed as a diamond on the plot.
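As a rough illustration of the enrichment statistic, the sketch below computes a simple sample odds ratio from a 2x2 overlap table. The function name odds_ratio, the counts, and the table layout (sample peaks versus a background set, each split by whether they overlap the feature) are assumptions made for this example. The values plotted in RLBase come from Fisher's exact test and may differ slightly from this simple ratio.

```shell
# Sample odds ratio (a*d)/(b*c) for a 2x2 table; all counts used below are made up.
#   a: peaks overlapping the feature        b: peaks not overlapping it
#   c: background regions overlapping it    d: background regions not overlapping it
odds_ratio() {
  awk -v a="$1" -v b="$2" -v c="$3" -v d="$4" \
    'BEGIN {
       # The ratio is undefined or infinite when b*c is zero.
       if (b * c == 0) { print "NA"; exit }
       printf "%.3f\n", (a * d) / (b * c)
     }'
}

odds_ratio 30 70 10 90   # prints 3.857
```

An odds ratio above 1 suggests that peaks overlap the feature more often than the background does; a value below 1 suggests depletion.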


R-loop forming sequences (RLFS) analysis results

Z-score distribution plot

Permutation test plot

Fourier transform plot


Overlap of sample peaks and RL Regions
This panel shows the overlap of consensus R-loop regions (RL regions) and the peaks of the selected sample. Of note: the Venn diagram overlap section shows the number of merged overlapping peaks. For example, if 2 RL Regions overlap 1 peak from the selected sample, they will all be counted together as 1 when merged. This means that the totals within the sections that include the green oval will be less than the total number of RL Regions (58,340).
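The merged counting described above can be sketched with a small interval-merging example. This is only an illustration of the idea, not the code RLBase uses: the function name count_merged and the toy coordinates are hypothetical, and intervals are assumed to be half-open BED coordinates that merge only when they strictly overlap.

```shell
# Count clusters of overlapping intervals after pooling RL regions and sample peaks.
# Input: BED-like lines (chrom, start, end, ...) on stdin.
count_merged() {
  sort -k1,1 -k2,2n | awk '
    # Start a new cluster on a new chromosome, or when the interval begins at or after the cluster end.
    $1 != chrom || $2 + 0 >= end { n++; chrom = $1; end = $3 + 0; next }
    # Otherwise extend the current cluster if this interval reaches further.
    $3 + 0 > end { end = $3 + 0 }
    END { print n + 0 }'
}

# Toy data (hypothetical): two RL regions bridged by a single sample peak.
printf 'chr1\t100\t200\trlregion\nchr1\t250\t400\trlregion\nchr1\t150\t300\tpeak\n' | count_merged   # prints 1
```

Because the peak spans both RL regions, all three intervals collapse into one merged cluster, which is why the overlap sections of the Venn diagram can total fewer than the number of RL Regions.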


R-loop regions in selected sample


Sample downloads

R-Loop Regions


R-loop regions (RL regions) are regions of the human genome that display robust R-loop formation, as described in our recent work (Miller et al., 2022; forthcoming in Nucleic Acids Research). The 'R-Loop Regions' page enables exploration of these sites and their association with gene expression. See 'Documentation' for full usage details.
Table Controls

RL Regions Table

RL Region Summary

RL Region expression correlation plot

Analyze R-loop data



Enter sample info
Privacy statement: Uploaded data and analysis results will be posted to a publicly accessible AWS S3 bucket and will NOT be kept private.



Running RLSeq


RLSeq is an R package for the downstream analysis of R-loop data sets. RLBase offers in-browser access to the RLSeq analysis workflow. The workflow is described below:

Format
Peaks are uploaded in broadPeak (preferred), narrowPeak, or BED format (see the example data). Ideally, peaks are called using MACS2 or MACS3 with default settings, but any other peak caller will suffice as long as the peaks are in BED format. To generate peaks that conform to these standards, please see the RLPipes CLI tool. Of note, if the peak caller provides a p-value, peaks should be filtered to contain only significant entries (this is the default behavior in most peak callers). Furthermore, using an input control during peak calling will improve the accuracy of analysis results.
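If a peak file still contains non-significant entries, a filter along the following lines could be applied before upload. This is a minimal sketch: the function name filter_peaks, the cutoff, and the toy rows are assumptions for illustration. It relies on the broadPeak/narrowPeak convention that column 8 holds the -log10 p-value.

```shell
# Keep broadPeak/narrowPeak rows whose -log10(p-value) (column 8) meets a cutoff.
# The cutoff is an assumption for illustration; 2 corresponds to p <= 0.01.
filter_peaks() {
  awk -F'\t' -v min="$1" '$8 + 0 >= min + 0'
}

# Toy rows (hypothetical values): only peak_1 passes a cutoff of 2.
printf 'chr1\t100\t500\tpeak_1\t52\t.\t4.1\t5.2\t3.9\nchr1\t900\t1200\tpeak_2\t8\t.\t1.3\t0.8\t0.1\n' |
  filter_peaks 2
```

Rows with a column 8 value of -1 (no p-value assigned) and plain BED files without an eighth column are dropped entirely by this filter, so it only suits files that actually carry p-values.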
Analysis
RLSeq ingests the peaks, converts them to an RLRanges object, and runs its analysis workflow on that object (see the RLSeq documentation). The resulting RLRanges object, now containing all available results, is then saved and uploaded to a public AWS S3 bucket. Finally, the RLRanges object is passed to the RLSeq::report() function to generate an HTML report. The report is also uploaded to an AWS S3 bucket along with all log files.

Example results: SRX1070676
Sharing: To share results, copy and send the results URL.

RLBase Downloads


RLBase provides access to the raw and processed data sets that were generated as part of the RLSuite project. With the exception of raw .bam files, these data are stored on the publicly available RLBase-data AWS bucket (s3://rlbase-data/).

For bulk access to RLBase-data (83.5 GB), please use AWS CLI:

# conda install -c conda-forge awscli
aws s3 sync --no-sign-request s3://rlbase-data/ rlbase_data/  # Downloads all RLBase-data

For fine-grained access to specific resources, please see the following guides:

Processed data files

All data in RLBase were processed using the RLPipes program (part of RLSuite). Peaks and coverage files were generated from genomic alignments, and the RLSeq analysis package (also part of RLSuite) was used to analyze the data and generate an HTML report. RLBase provides both bulk and fine-grained access to these data.

Data details (and bulk download instructions)

Data sets (below) can be downloaded in bulk using the AWS CLI.

  • Peaks (6.7 GiB)
    • Peaks were called from genomic alignments (*.bam) using macs3. When available, an input control was used. See RLPipes.
    • Files are uncompressed, in .broadPeak (broadPeak) format.
    • AWS CLI: aws s3 sync --no-sign-request s3://rlbase-data/peaks/ peaks/
  • Coverage (66.0 GiB)
    • Coverage tracks were generated from genomic alignments (*.bam) with deepTools. See RLPipes.
    • Files are in .bw (bigWig) format.
    • AWS CLI: aws s3 sync --no-sign-request s3://rlbase-data/coverage/ coverage/
  • RLRanges (from RLSeq) (1.7 GiB)
    • The RLSeq analysis package was used to analyze the peak and coverage tracks to assess quality, genomic annotation enrichment, and other features of interest. The usage of RLSeq is found in the vignette here. See RLSeq.
    • The files are compressed .rds files. They can be loaded with the readRDS() function in R.
    • AWS CLI: aws s3 sync --no-sign-request s3://rlbase-data/rlranges/ rlranges/
  • RLSeq Reports (3.4 GiB)
    • The RLSeq analysis package also generates quality and analysis reports of samples analyzed with it. For each sample in RLBase, a report was generated (via the RLSeq::report() command). See RLSeq.
    • The files are in uncompressed *.html format.
    • AWS CLI: aws s3 sync --no-sign-request s3://rlbase-data/reports/ reports/
  • FASTQ Stats (92.8 MiB)
    • Quality statistics for the raw reads were generated via the fastp program (link). See RLPipes.
    • The files are in uncompressed *.json format.
    • AWS CLI: aws s3 sync --no-sign-request s3://rlbase-data/fastq_stats/ fastq_stats/
  • BAM Stats (411.9 KiB)
    • Quality statistics for the genomic alignments (*.bam files) were generated via the samtools program (link). See RLPipes.
    • The files are in uncompressed *.txt format.
    • AWS CLI: aws s3 sync --no-sign-request s3://rlbase-data/bam_stats/ bam_stats/
  • Quantified expression (417.5 MiB)
    • Expression samples were quantified via Salmon v1.5.2 (link). See RLPipes.
    • The files are in compressed archive (*.tar.xz) format. Each archive contains the output of Salmon as described in the Salmon documentation (link).
    • AWS CLI: aws s3 sync --no-sign-request s3://rlbase-data/quant/ quant/
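The per-folder commands above all follow one pattern, which the sketch below captures in a small helper that prints (rather than runs) the sync command. The helper name rlbase_sync_cmd is hypothetical. The optional second argument uses the AWS CLI --exclude/--include filters to restrict the sync to matching files, and it assumes that file names in the bucket contain the sample ID, which is not confirmed here.

```shell
# Print the AWS CLI sync command for one RLBase-data folder.
#   $1: folder name as listed above (peaks, coverage, rlranges, reports, fastq_stats, bam_stats, quant)
#   $2: optional glob, e.g. '*SRX1070676*' (assumes file names contain the sample ID)
rlbase_sync_cmd() {
  folder="$1"
  pattern="${2:-}"
  if [ -n "$pattern" ]; then
    # Exclude everything, then re-include only the files matching the glob.
    printf 'aws s3 sync --no-sign-request --exclude "*" --include "%s" s3://rlbase-data/%s/ %s/\n' \
      "$pattern" "$folder" "$folder"
  else
    printf 'aws s3 sync --no-sign-request s3://rlbase-data/%s/ %s/\n' "$folder" "$folder"
  fi
}

rlbase_sync_cmd peaks
rlbase_sync_cmd reports '*SRX1070676*'
```

To run a printed command, pass it to the shell, for example with eval "$(rlbase_sync_cmd peaks)". Appending --dryrun to the printed command previews the transfer without downloading anything.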

The full list of samples in RLBase and their corresponding download links are listed below:


RData objects via RLHub

Processed RData objects are provided via the RLHub R package (part of the RLSuite). A full description of the data is provided in the table below.

Data Access Details

To access these data, there are several options:

  • RLHub (preferred)

    • Download the RLHub R package via remotes (requires Bioconductor 3.14):
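    # Install BiocManager and remotes if they are missing, then install RLHub from GitHub
    if (!requireNamespace('BiocManager', quietly = TRUE))
      install.packages('BiocManager')
    if (!requireNamespace('remotes', quietly = TRUE))
      install.packages('remotes')
    
    remotes::install_github('Bishop-Laboratory/RLHub')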
    
    • Access data using the functions shown in the table below. For example, to access 'GS-Signal':
    gssignal <- RLHub::gs_signal()
    

  • Direct download

    • All files are in .rda (RData) format and have a direct download link.
    • For example, to download and load annotations_primary_hg38 in R:
    tmp <- tempfile()
    download.file('https://rlbase-data.s3.amazonaws.com/RLHub/annotations_primary_hg38.rda', destfile=tmp)
    load(tmp)
    
  • AWS CLI:

    • Files can also be synced from AWS using the AWS CLI.
    • To download the entire RLHub folder (399.6 MiB), for example:
    # conda install -c conda-forge awscli
    aws s3 sync --no-sign-request s3://rlbase-data/RLHub RLHub/  # Downloads the entire folder
    

Title | Description | Genome | RDataClass | Direct_Download | RLHub_Accessor
Primary Genomic Annotations (hg38) | Primary Human genomic annotations curated for use with RLSuite. | hg38 | list | Link | RLHub::annots_primary_hg38()
Primary Genomic Annotations (mm10) | Primary Mouse genomic annotations curated for use with RLSuite. | mm10 | list | Link | RLHub::annots_primary_mm10()
Full Genomic Annotations (hg38) | Full Human genomic annotations curated for use with RLSuite. | hg38 | list | Link | RLHub::annots_full_hg38()
Full Genomic Annotations (mm10) | Full Mouse genomic annotations curated for use with RLSuite. | mm10 | list | Link | RLHub::annots_full_mm10()
R-loop Binding Proteins | R-loop-binding proteins discovered from mass-spec studies. | hg38 | tbl | Link | RLHub::rlbps()
Gene Expression | Gene expression count tables from matched RNA-Seq experiments corresponding to R-loop profiling. The counts, TPM, and VST-transformed counts are provided. | hg38 | SummarizedExperiment | Link | RLHub::gene_exp()
Feature Enrichment per Sample | Genomic feature enrichment stats for each peakset in RLBase. | hg38 | tbl | Link | RLHub::feat_enrich_samples()
Feature Enrichment per RL-Region | Genomic feature enrichment stats for the RL-Regions in RLBase. | hg38 | tbl | Link | RLHub::feat_enrich_rlregions()
GS-Signal | Bin-level read counts for RLBase samples around R-loop sites discovered using long-read SMRF-Seq (gold-standard sites). | hg38 | tbl | Link | RLHub::gs_signal()
FFT-Model | Stacked classifier for deciding whether samples successfully mapped R-loops. | hg38 | caretStack | Link | RLHub::fft_model()
Feature-Prep Model | Model for transforming dataset features prior to classification. | hg38 | preProcess | Link | RLHub::prep_features()
RLFS-Test Results | The results from RLFS (R-loop-forming sequences) analysis on all RLBase samples via the RLSeq package. | hg38 | list | Link | RLHub::rlfs_res()
RLRegion Annotations | R-loop regions (rlregions) derived from S9.6-based and dRNH-based samples ('All' group), annotated with genomic features. | hg38 | tbl | Link | RLHub::rlregions_annot()
RLRegion Metadata | R-loop regions (rlregions) derived from S9.6-based and dRNH-based samples ('All' group), with descriptive metadata. | hg38 | tbl | Link | RLHub::rlregions_meta()
RLRegion Read Counts | Read count tables from RLBase samples quantified across the R-loop regions (rlregions) derived from both S9.6-based and dRNH-based samples ('All' group). | hg38 | SummarizedExperiment | Link | RLHub::rlregions_counts()
RLBase Sample Manifest | The hand-curated manifest of all RLBase samples with descriptive metadata and some sample-level analysis results. | hg38 | tbl | Link | RLHub::rlbase_samples()

Raw data

The raw data was downloaded from SRA programmatically as part of the RLPipes processing pipeline. Raw reads were aligned to the genome using bwa-mem2 and uploaded to a publicly-accessible Box folder (1.5 TB).

Note: You will be unable to download the entire contents in bulk without a paid Box account. If you need to access these *.bam files in bulk, please simply follow the protocol outlined in the RLBase-data repository (link). If you are unable to do so, please contact the RLBase maintainer (Henry Miller) and he will assist you in accessing the data.

Other data

Miscellaneous data which provide support to RLBase and the other software in RLSuite are also available for download if desired.

  • R-loop forming sequences (RLFS)
    • R-loop forming sequences were discovered for each genome that has gene annotations (48 in total; see available genomes) using the QmRLFS-finder program and converted to BED format.
    • They can be accessed in two main ways:
      • Bulk download: aws s3 sync --no-sign-request s3://rlbase-data/rlfs-beds/ .
      • Direct download of individual files (https://rlbase-data.s3.amazonaws.com/rlfs-beds/<UCSC_GENOME>.rlfs.bed), where <UCSC_GENOME> is replaced by the genome of interest. For example, the file for 'hg38' is https://rlbase-data.s3.amazonaws.com/rlfs-beds/hg38.rlfs.bed.
  • Quality ML Models
    • These models are used by RLSeq to predict whether a sample robustly ('POS') or poorly ('NEG') maps R-loops. The full workflow by which they are generated is found in the RLBase-data repo.
    • Download all files in bulk via aws s3 sync --no-sign-request s3://rlbase-data/misc/model/ .
    • Download RData models via RLHub (does not include HTML report). See ?RLHub::models.
    • Model-building summary HTML report is available from direct download (link).
  • Cohesin peaks
    • Manually-curated STAG2 and STAG1 ChIP-Seq data reprocessed by the RLHub authors. They are the same STAG1 and STAG2 peaks described in Pan et al., 2020.
    • The file format is uncompressed broadPeak (*.broadPeak).
    • The processed form of these data is provided within RLHub. See ?RLHub::annotations.
    • The steps used for processing are provided in the RLBase-data repo.
    • BroadPeak files can be downloaded in bulk via aws s3 sync --no-sign-request s3://rlbase-data/misc/cohesin_peaks/ .
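For the RLFS direct-download pattern described above, the URL for any genome can be built by substituting the UCSC genome name into the template. A minimal sketch follows; the helper name rlfs_url is hypothetical, and the function only builds the URL without checking that a file exists for the given genome.

```shell
# Build the direct-download URL for one genome's RLFS BED file.
#   $1: UCSC genome name, e.g. hg38
rlfs_url() {
  printf 'https://rlbase-data.s3.amazonaws.com/rlfs-beds/%s.rlfs.bed\n' "$1"
}

rlfs_url hg38   # prints https://rlbase-data.s3.amazonaws.com/rlfs-beds/hg38.rlfs.bed
```

The resulting URL can then be fetched with a standard downloader, for example curl -fLO "$(rlfs_url hg38)".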

Note: Any other desired data will be provided upon reasonable request to the RLBase maintainer (Henry Miller).