Title: transrate: v1.0.0 alpha 1

Type: Software

Richard Smith-Unna, cboursnell (2014): transrate: v1.0.0 alpha 1. Zenodo. Software. https://zenodo.org/record/12280

Authors: Richard Smith-Unna (University of Cambridge); cboursnell

Summary

transrate v1.0.0 alpha 1

This is the first alpha release of transrate v1.

To install this pre-release, use the following command:

$ gem uninstall transrate
$ gem install --pre transrate --version v1.0.0.alpha1

New features

The Transrate score

The Transrate score is an estimate of the probability that the assembly is correct. A score is produced for the whole assembly and for each contig. The scoring process uses the reads that were used to generate the assembly as evidence, so to get a Transrate score you need to run transrate in read-metrics mode (by passing in the reads with --left and --right).

The assembly score

The assembly score allows you to compare two or more assemblies made with the same reads. The score is designed so that an increased score is very likely to correspond to an assembly that is more biologically accurate.

The score is calculated as the geometric mean of all contig scores multiplied by the proportion of input reads that provide positive support for the assembly.
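The calculation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not transrate's own code: the function name `assembly_score` and its parameters are hypothetical, and it assumes you already have the per-contig scores and the count of reads that positively support the assembly.

```python
import math


def assembly_score(contig_scores, supporting_reads, total_reads):
    """Geometric mean of contig scores times the proportion of supporting reads.

    All names here are hypothetical; this only illustrates the arithmetic.
    """
    if not contig_scores:
        raise ValueError("at least one contig score is required")
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if any(s < 0 for s in contig_scores):
        raise ValueError("contig scores must be non-negative")
    # A single zero score drives the geometric mean to zero.
    if any(s == 0 for s in contig_scores):
        return 0.0
    # Average the logs to avoid underflow when there are many contigs.
    log_mean = sum(math.log(s) for s in contig_scores) / len(contig_scores)
    geometric_mean = math.exp(log_mean)
    return geometric_mean * (supporting_reads / total_reads)


# Three contigs scoring 0.5 each, with 80 of 100 reads supporting the assembly.
example = assembly_score([0.5, 0.5, 0.5], supporting_reads=80, total_reads=100)
```

Working in log space is a design choice for the sketch: multiplying thousands of probabilities directly would underflow to zero long before the root is taken.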

Thus, the score captures how confident you can be in what was assembled, as well as how complete the assembly is.

The contig score

Contig scores can be used to filter out bad contigs from an assembly, leaving you with only the well-assembled ones. Examining the distribution of contig scores can also give more detailed insight into the differences between assemblies.
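As a rough illustration of this kind of filtering, the sketch below keeps only contigs whose score meets a cutoff. The contig names, scores, and the 0.3 threshold are made up for the example, and `filter_contigs` is a hypothetical helper rather than part of transrate; a sensible cutoff depends on the score distribution of your own assembly.

```python
def filter_contigs(contig_scores, threshold):
    """Return only the contigs whose score is at least `threshold`.

    `contig_scores` maps contig name -> contig score (hypothetical layout).
    """
    return {name: score for name, score in contig_scores.items() if score >= threshold}


# Made-up scores for three contigs.
scores = {"contig_1": 0.82, "contig_2": 0.07, "contig_3": 0.45}
kept = filter_contigs(scores, threshold=0.3)
```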

Each contig is assigned a score by measuring how well it is supported by read evidence. The contig score can be thought of as an estimate of the probability that the contig is an accurate, non-redundant representation of a transcript that was present in the sequenced sample.

There are five components to the contig score:

  • The probability that each base has been called correctly. This is estimated using the mean per-base edit distance, i.e. how many changes would have to be made to a read covering a base before the sequence of the read and the covered region of the contig agreed perfectly.
  • The probability that each base is truly part of the transcript. This is estimated by determining whether any reads provide agreeing coverage for a base.
  • The probability that each base is not contained in another contig. This is estimated by considering the root-mean-squared MAPQ score of the reads covering each base.
  • The probability that the contig is derived from a single transcript (rather than pieces of two or more transcripts). This is estimated by assuming that fragments from different transcripts are likely to be generated at different rates, and that this difference is detectable as a difference in coverage distribution. The probability is then calculated using a Bayesian sequence segmentation algorithm which models the coverage distribution as a Dirichlet distribution over a reduced set of finite coverage states.
  • The probability that the contig is structurally complete and correct. This is estimated as the proportion of mapped read pairs that agree with the structure and composition of the contig, which in turn is calculated by classifying the read pair alignments.

The score is the product of the components.
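To make the product concrete, here is a minimal Python sketch that multiplies five component probabilities together. The class and field names are hypothetical labels for the five components listed above, not identifiers from transrate, and the component values in the example are invented.

```python
import math
from dataclasses import dataclass, astuple


@dataclass
class ContigComponents:
    """Five component probabilities for one contig (hypothetical field names)."""

    p_bases_correct: float      # bases called correctly
    p_bases_covered: float      # bases truly part of the transcript
    p_not_redundant: float      # bases not contained in another contig
    p_single_transcript: float  # contig derived from a single transcript
    p_structure_good: float     # contig structurally complete and correct

    def score(self):
        """Contig score as the product of the five components."""
        parts = astuple(self)
        for p in parts:
            if not 0.0 <= p <= 1.0:
                raise ValueError(f"component {p!r} is not a probability")
        return math.prod(parts)


# Invented values: the contig is well covered but may be a chimera.
contig = ContigComponents(0.9, 1.0, 0.8, 0.5, 1.0)
contig_score = contig.score()
```

Because the score is a product, one weak component pulls the whole score down, which is why inspecting the components individually can show what is wrong with a low-scoring contig.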

The score components are useful independently of the contig score, as they can identify contigs that can be treated in different ways to improve the quality of an assembly.

Faster processing

We identified all the major bottlenecks in our code and rewrote large parts of the codebase in C++ to provide an ~20x speedup.

Faster alignment

We have moved to using the SNAP aligner for an ~20x speedup in read alignment.

Probabilistic assignment of multi-mapping reads

We have moved to using eXpress to select the most likely assignment for each multi-mapping read. This has led to a considerable increase in the usefulness of read-mapping metrics.

More information

  • DOI: 10.5281/zenodo.12280

Dates

  • Publication date: 2014
  • Issued: October 17, 2014

Rights

  • info:eu-repo/semantics/openAccess Open Access

Format

electronic resource

Related items

Relationship      URI
IsSupplementTo    https://github.com/Blahah/transrate/tree/v1.0.0.alpha.1
IsVersionOf       https://doi.org/10.5281/zenodo.591478
IsPartOf          https://zenodo.org/communities/zenodo