Experimental validation of in silico predicted RAD locus frequencies using genomic resources and short read data from a model marine mammal

Vendrami, David L. J.; Forcada, Jaume ORCID:; Hoffman, Joseph I.. 2019 Experimental validation of in silico predicted RAD locus frequencies using genomic resources and short read data from a model marine mammal. BMC Genomics, 20 (1), 72.

Before downloading, please read NORA policies.
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated.
Vendrami.pdf - Published Version
Available under License Creative Commons Attribution 4.0.

Download (1MB) | Preview


Background Restriction site-associated DNA sequencing (RADseq) has revolutionized the study of wild organisms by allowing cost-effective genotyping of thousands of loci. However, for species lacking reference genomes, it can be challenging to select the restriction enzyme that offers the best balance between the number of obtained RAD loci and depth of coverage, which is crucial for a successful outcome. To address this issue, PredRAD was recently developed, which uses probabilistic models to predict restriction site frequencies from a transcriptome assembly or other sequence resource based on either GC content or mono-, di- or trinucleotide composition. This program generates predictions that are broadly consistent with estimates of the true number of restriction sites obtained through in silico digestion of available reference genome assemblies. However, in practice the actual number of loci obtained could potentially differ as incomplete enzymatic digestion or patchy sequence coverage across the genome might lead to some loci not being represented in a RAD dataset, while erroneous assembly could potentially inflate the number of loci. To investigate this, we used genome and transcriptome assemblies together with RADseq data from the Antarctic fur seal (Arctocephalus gazella) to compare PredRAD predictions with empirical estimates of the number of loci obtained via in silico digestion and from de novo assemblies. Results PredRAD yielded consistently higher predicted numbers of restriction sites for the transcriptome assembly relative to the genome assembly. The trinucleotide and dinucleotide models also predicted higher frequencies than the mononucleotide or GC content models. Overall, the dinucleotide and trinucleotide models applied to the transcriptome and the genome assemblies respectively generated predictions that were closest to the number of restriction sites estimated by in silico digestion. Furthermore, the number of de novo assembled RAD loci mapping to restriction sites was similar to the expectation based on in silico digestion. Conclusions Our study reveals generally high concordance between PredRAD predictions and empirical estimates of the number of RAD loci. This further supports the utility of PredRAD, while also suggesting that it may be feasible to sequence and assemble the majority of RAD loci present in an organism’s genome.

Item Type: Publication - Article
Digital Object Identifier (DOI):
ISSN: 1471-2164
Additional Keywords: Restriction site associated DNA sequencing (RADseq, Restriction enzyme, Reference genome,PredRAD, Antarctic fur seal, Pinniped
Date made live: 29 Jan 2019 14:52 +0 (UTC)

Actions (login required)

View Item View Item

Document Downloads

Downloads for past 30 days

Downloads per month over past year

More statistics for this item...