Title: ASTRAL-SCOPe subset 2.04 in ActivePapers format

Type: Dataset

Citation: Hinsen, Konrad (2014): ASTRAL-SCOPe subset 2.04 in ActivePapers format. Zenodo. Dataset. https://zenodo.org/record/11086

Author: Hinsen, Konrad (CNRS)

Summary

This ActivePaper contains the structures in version 2.04 of the ASTRAL SCOPe subset with less than 40% sequence identity. For more information about ASTRAL and SCOPe, see

  http://scop.berkeley.edu/astral/

Each ASTRAL entry describes a domain from a protein structure in the PDB. This ActivePaper contains these domains in the MOSAIC HDF5 format. For more information about MOSAIC, see

  http://mosaic-data-model.github.io/

The structures are arranged by their SCOPe classification. For example, ASTRAL entry d1v0aa1 is found under /data/b/18/1/30/d1v0aa1, because SCOPe classifies it as

  b: all-beta proteins
  b.18: Galactose-binding domain-like
  b.18.1: Galactose-binding domain-like
  b.18.1.30: CBM11
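The mapping from classification to path can be sketched in a few lines of Python. This is only an illustration of the layout described above, not code from the ActivePaper; the function name astral_path and the assumption that the classification is passed as a four-part dotted string such as "b.18.1.30" are hypothetical.

```python
def astral_path(entry_id: str, classification: str) -> str:
    """Build the in-file path of an ASTRAL entry from its SCOPe classification.

    Hypothetical helper: it assumes the classification is a dotted string
    with exactly four levels, as in the example "b.18.1.30".
    """
    levels = classification.split(".")
    if len(levels) != 4:
        raise ValueError(f"expected four classification levels, got {classification!r}")
    # Each classification level becomes one level of the hierarchy under /data.
    return "/data/" + "/".join(levels) + "/" + entry_id


print(astral_path("d1v0aa1", "b.18.1.30"))  # /data/b/18/1/30/d1v0aa1
```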

For each entry, the ASTRAL database provides a reference to the PDB entry with chain and residue identifiers plus a sequence. The importlet in /code/import_structures reads this information, downloads the corresponding PDB entry in mmCIF format, extracts the domain, checks that its sequence matches the one given by ASTRAL, and stores the domain in MOSAIC format.
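The steps described above can be outlined as a loop. This is a rough sketch under stated assumptions, not the actual importlet in /code/import_structures: the AstralEntry record and the three callables (fetch_mmcif, extract_domain, store) are hypothetical stand-ins for the download, extraction, and MOSAIC storage steps, whose real interfaces are not given here.

```python
from typing import Callable, Iterable, List, NamedTuple


class AstralEntry(NamedTuple):
    """Hypothetical record of what ASTRAL provides for one entry."""
    entry_id: str   # ASTRAL identifier, e.g. "d1v0aa1"
    pdb_code: str   # PDB entry the domain comes from
    region: str     # chain and residue identifiers, kept opaque here
    sequence: str   # sequence given by ASTRAL


def import_entries(
    entries: Iterable[AstralEntry],
    fetch_mmcif: Callable,      # assumed: pdb_code -> mmCIF data
    extract_domain: Callable,   # assumed: (mmCIF data, region) -> (domain, sequence)
    store: Callable,            # assumed: (entry_id, domain) -> None
) -> List[str]:
    """Import each entry and return the ids of those that could not be imported."""
    missing = []
    for entry in entries:
        try:
            mmcif = fetch_mmcif(entry.pdb_code)
            domain, sequence = extract_domain(mmcif, entry.region)
            # Store the domain only if its sequence matches the ASTRAL one.
            if sequence != entry.sequence:
                raise ValueError("sequence mismatch")
            store(entry.entry_id, domain)
        except Exception:
            # Any failure in any step is recorded rather than aborting the run.
            missing.append(entry.entry_id)
    return missing
```

Passing the steps in as callables keeps the sketch independent of any particular mmCIF parser or storage library.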

For a small number of ASTRAL entries, this process failed for one of several reasons: a mismatch between the ASTRAL reference and the PDB data, a mismatch in the sequences, a mistake in the PDB mmCIF file, or unjustified assumptions in the conversion script. The number of failures (28 out of 13042 entries) was deemed small enough not to warrant a time-consuming detailed analysis of each one. The list of missing entries, generated automatically during the import process, can be found in this ActivePaper under /documentation/missing-entries.
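To put the failure count in proportion, the arithmetic is worked out below using only the two numbers quoted above. It assumes that 13042 is the total number of entries attempted.

```python
failures = 28
total = 13042  # assumed to be the number of entries attempted

imported = total - failures
failure_rate = failures / total

print(f"imported entries: {imported}")        # imported entries: 13014
print(f"failure rate: {failure_rate:.2%}")    # failure rate: 0.21%
```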

More information

  • DOI: 10.5281/zenodo.11086

Subjects

  • ActivePapers, MOSAIC, protein structure

Dates

  • Publication date: 2014
  • Issued: July 30, 2014


Format

electronic resource

Related items

  • Relationship: IsPartOf
  • URI: https://zenodo.org/communities/zenodo