Title: EukZoo, an aquatic protistan protein database for meta-omics studies.

Type: Dataset

Citation: Liu, Zhenfeng, Hu, Sarah, Caron, David (2018): EukZoo, an aquatic protistan protein database for meta-omics studies. Zenodo. Dataset. https://zenodo.org/record/1476236

Authors: Liu, Zhenfeng (University of Southern California); Hu, Sarah (University of Southern California); Caron, David (University of Southern California)

Summary

This database contains protein sequences of aquatic microbial eukaryotes, or protists. Its purpose is to provide a database of reasonable quality that serves as a resource for both taxonomic and functional interpretation of metagenomic and metatranscriptomic studies of protists. The sequences come mainly from the Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP), supplemented with various genomes and transcriptomes of organisms that were not part of MMETSP.

To use this database, you need to understand the main function of each of its three files.

(1) The protein sequences are stored in the .faa file. You can build an alignment/search database from it and search your meta-omics sequences against it. Each sequence in the FASTA file has an ID that always consists of two parts, like this: "MMETSP0004_1234567". The text before the first underscore is the source ID of that sequence.

(2) Taxonomy information for each source ID is stored in "EukZoo_taxonomy_table_v_0.2.tsv". You can use this information together with database search results to assign taxonomy to sequences.

(3) The KEGG annotation of each sequence is stored in "EukZoo_KEGG_annotation_v_0.2.tsv". You can use this information together with database search results to assign KEGG functional annotation (KO ID) to sequences.
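
How the three files fit together can be sketched in a few lines of Python. This is only an illustration and not the EukZoo scripts themselves. It assumes that both .tsv files are tab-separated with the lookup key (source ID or sequence ID) in the first column, and that search results have already been reduced to pairs of query ID and matching EukZoo sequence ID. The function names, the column layout, and the sample rows are all hypothetical; check the actual files before relying on any of it.

```python
import csv
import io


def source_id(sequence_id):
    """Return the text before the first underscore (the source ID)."""
    return sequence_id.split("_", 1)[0]


def load_table(handle, key_column=0):
    """Read a tab-separated table into a dict keyed by one column.

    Assumption: the key (source ID or sequence ID) is in the first column.
    A header row, if present, simply becomes one more entry.
    """
    return {row[key_column]: row for row in csv.reader(handle, delimiter="\t") if row}


def annotate(hits, taxonomy, kegg):
    """Attach a taxonomy row and a KEGG row to each (query, subject) hit."""
    results = []
    for query, subject in hits:
        # Taxonomy is keyed by source ID, KEGG annotation by full sequence ID.
        results.append((query, taxonomy.get(source_id(subject)), kegg.get(subject)))
    return results


# Placeholder rows for illustration only; they are not taken from EukZoo.
taxonomy_tsv = io.StringIO("MMETSP0004\tPLACEHOLDER_LINEAGE\n")
kegg_tsv = io.StringIO("MMETSP0004_1234567\tPLACEHOLDER_KO\n")

taxonomy = load_table(taxonomy_tsv)
kegg = load_table(kegg_tsv)

# Hypothetical search hits: (query ID, best-matching EukZoo sequence ID).
hits = [("read_1", "MMETSP0004_1234567"), ("read_2", "MMETSP9999_1")]

for query, tax_row, kegg_row in annotate(hits, taxonomy, kegg):
    print(query, tax_row, kegg_row)
```

With the real files, the io.StringIO objects would be replaced by open() calls on the two .tsv files. A hit whose source ID or sequence ID is missing from a table yields None for that annotation.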

I also provide scripts to assign taxonomy and KEGG annotation from database search results. The scripts, explanations of how to use them, and details on how the database was created and curated are available on the EukZoo GitHub page.

Please contact me at zhenfeng.liu1@gmail.com if you have any questions or requests. Thank you for your interest in EukZoo.

More information

  • DOI: 10.5281/zenodo.1476236

Subjects

  • protist, metatranscriptome, metagenome, protein database

Dates

  • Publication date: 2018
  • Issued: October 31, 2018

Format

electronic resource

Related items

  • IsVersionOf: https://doi.org/10.5281/zenodo.1476235
  • IsPartOf: https://zenodo.org/communities/zenodo