Title: Curation and ISA representation of a SARS-Cov2/Covid-19 Proteomics Dataset - PXD107710 - ISA representation
Type: Dataset
Philippe Rocca-Serra, Susanna Assunta Sansone (2020): Curation and ISA representation of a SARS-Cov2/Covid-19 Proteomics Dataset - PXD107710 - ISA representation. Zenodo. Dataset. https://zenodo.org/record/3742219
Summary
Curation and ISA representation of a SARS-Cov2/Covid-19 proteomics dataset deposited in the PRIDE database under accession number PXD017710.
ISA-Tab annotation for the "SARS-CoV-2 infected host cell proteomics reveal potential therapy targets" publication.
GitHub repository: https://github.com/ISA-tools/PXD017710
This is part of an effort to (re-)annotate: https://dx.doi.org/10.21203/rs.3.rs-17218/v1
Additional work done as part of:
- https://github.com/virtual-biohackathons/covid-19-bh20
- https://github.com/virtual-biohackathons/covid-19-bh20/wiki/FairDataProteomics

Data available from PRIDE at https://www.ebi.ac.uk/pride/archive/projects/PXD017710 and as the MassIVE/CCMS Maestro+MSstats reanalysis of MSV000085096 / PXD017710.
ISA-Tab representation:
Rationale: demonstrate the suitability of the ISA format for representing MS-based protein profiling experiments with more granularity and detail, thus providing a better representation of the experimental design.

The formatting and re-annotation are based on information extracted from:
- the original publication
- the supplementary tables available from the publisher's site
- the 'filtered-results.csv' helper file as supplied to @sneumann during the HUPO-PSI meeting, March 2020
Viewing the ISA-tab formatted and re-annotated PXD017710 with ISATab-Viewer
To view the ISA-Tab formatted and re-annotated PXD017710 locally, start a web server from the directory containing `isaviewer-demo.html`:
```bash
python -m http.server 8000
```
Then point your browser to `http://0.0.0.0:8000/isaviewer-demo.html`
Curation tasks performed:
* initial structure of the study design in ISA format
* linkage of Proteome and Translatome data (supplementary material) to ISA assay tables (via Derived Data File)
* processing the Proteome and Translatome data (supplementary material) with the Python pandas library to generate the following csv files:
  - proteome_intensities_long_table_ggplot2.txt
  - proteome_diffanal_ratio_pvalue_long_table_ggplot2.txt
  - translatome_intensities_long_table_ggplot2.txt
  - translatome_diffanal_ratio_pvalue_long_table_ggplot2

  Each file is a `long table` corresponding to a `melt` of the Excel file originally generated by the users, and can be readily loaded into the R ggplot2 library for graphical representation. The statistically relevant elements have been annotated with the STATO ontology, and the tables comply with a Frictionless.io Data Package. The Jupyter notebook for the transformation is available.
* conversion of raw data to mzML format, detailed in https://github.com/ISA-tools/PXD017710:

  install docker:
  ```bash
  brew update
  brew install docker
  ```
  sign in to docker (the Docker daemon must be running):
  ```bash
  docker login
  ```
  pull the docker container for ProteoWizard:
  ```bash
  docker pull chambm/pwiz-skyline-i-agree-to-the-vendor-licenses
  ```
  :warning: be sure to sign up and log in at https://hub.docker.com/ in order to be able to reach https://hub.docker.com/r/chambm/pwiz-skyline-i-agree-to-the-vendor-licenses

  run the pwiz tool from the container over the raw data (adjust the host path to the directory holding the downloaded raw files):
  ```bash
  docker run -it --rm -e WINEDEBUG=-all -v /Users/Downloads/PXD017710/raw/:/data chambm/pwiz-skyline-i-agree-to-the-vendor-licenses wine msconvert /data/*.raw --mzML
  ```
* ontology markup for:
  * declaration of independent variables as ISA Study Factors {biological agent, dose, time point, replicate} -> OBI
  * taxonomic information (host cells and virus) -> NCBITaxonomy
  * cell line: CaCo-2 cells -> Cell Line Ontology
  * disease: Colon Cancer -> Human Phenotype Ontology
  * MS-specific aspects (TMT reagent, instrument, ...) -> PSI-MS
  * statistical tests -> STATO
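
The wide-to-long reshaping behind the `long table` files listed in the curation tasks can be illustrated with a minimal sketch. It uses only the Python standard library rather than pandas, so it mimics what a `melt` does without being the notebook's actual code. The column names (`protein`, `control_2h_rep1`, `virus_2h_rep1`), the values, and the helper name `melt` are hypothetical and are not taken from the supplementary tables.

```python
import csv
import io


def melt(rows, id_vars, var_name="variable", value_name="value"):
    """Reshape wide rows (dicts) into long rows: one row per (identifier, measured column)."""
    long_rows = []
    for row in rows:
        # Identifier columns are repeated on every long row.
        ids = {key: row[key] for key in id_vars}
        for column, value in row.items():
            if column in id_vars:
                continue
            long_rows.append({**ids, var_name: column, value_name: value})
    return long_rows


# Hypothetical wide table: one row per protein, one column per sample.
WIDE_CSV = """protein,control_2h_rep1,virus_2h_rep1
P1,10.5,11.2
P2,8.1,7.9
"""

wide_rows = list(csv.DictReader(io.StringIO(WIDE_CSV)))
long_rows = melt(wide_rows, id_vars=["protein"], var_name="sample", value_name="intensity")

for row in long_rows:
    print(row)
```

Each output row carries one protein, one sample label, and one intensity, which is the shape ggplot2 expects for mapping a column to an aesthetic. Note that this sketch emits rows protein by protein, whereas pandas orders the melted rows column by column; the content is the same.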
Unresolved curatorial issues:
1. ambiguities related to the Tandem Mass Tag labelling protocol:
   - the publication mentions TMT11 (see Figure 2 in https://www.researchsquare.com/article/rs-17218/v1)
   - the information available from PRIDE mentions TMT6 (https://www.ebi.ac.uk/pride/archive/projects/PXD017710)

   This may require another round of annotation on the TMT reagents and fractions in the ISA a_assay representation.
2. SARS-Cov2 isolate: no clear NCBI Taxonomy anchoring and unclear origin, so the markup is made to the parent class (as of 06.04.2020)
Release and packaging as a BDBAG:
The tgz file associated with this upload was produced using https://github.com/fair-research/bdbag. It contains several manifest files detailing metadata and data files, providing md5 and sha256 checksums.
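
As an illustration of how such checksum manifests can be used, the sketch below recomputes digests for the files listed in a BagIt-style manifest and reports any mismatch. It assumes the standard BagIt layout (a `manifest-<algorithm>.txt` file at the bag root, one `checksum path` pair per line); the helper names `file_digest` and `verify_manifest` and the tiny example bag are hypothetical. bdbag provides its own validation, so this only shows the principle.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path, algorithm):
    """Return the hex digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(bag_dir, algorithm):
    """Return the manifest paths whose recomputed checksum differs from the recorded one."""
    bag_dir = Path(bag_dir)
    manifest = bag_dir / f"manifest-{algorithm}.txt"
    mismatches = []
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue
        expected, relative_path = line.split(maxsplit=1)
        if file_digest(bag_dir / relative_path, algorithm) != expected:
            mismatches.append(relative_path)
    return mismatches


# Build a tiny hypothetical bag, verify it, then alter the payload and verify again.
with tempfile.TemporaryDirectory() as tmp:
    bag = Path(tmp)
    (bag / "data").mkdir()
    payload = bag / "data" / "example.txt"
    payload.write_bytes(b"hello bag\n")
    for algorithm in ("md5", "sha256"):
        line = f"{file_digest(payload, algorithm)}  data/example.txt\n"
        (bag / f"manifest-{algorithm}.txt").write_text(line)

    intact = {a: verify_manifest(bag, a) for a in ("md5", "sha256")}
    payload.write_bytes(b"tampered\n")
    tampered = {a: verify_manifest(bag, a) for a in ("md5", "sha256")}

print("intact bag mismatches:", intact)
print("tampered bag mismatches:", tampered)
```

To apply the same check to the extracted tgz file, `verify_manifest` would be pointed at the extracted bag directory instead of the temporary one.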
More information
- DOI: 10.5281/zenodo.3742219
- Language: en
Subjects
- FAIR data, Proteomics, mass spectrometry, SARS-Cov2, Covid-19, Caco2 cell line, treated versus control intervention design, ISA format, STATO ontology, bdbag, FAIRsharing
Dates
- Publication date: 2020
- Issued: April 06, 2020
Rights
- https://creativecommons.org/licenses/by/3.0/legalcode Creative Commons Attribution 3.0 Unported
- info:eu-repo/semantics/openAccess Open Access
Funding Information
| Award number | Award URI | Funder identifier | Funder identifier type | Funder name |
|---|---|---|---|---|
| 802750 | info:eu-repo/grantAgreement/EC/H2020/802750/ | 10.13039/100010661 | Crossref Funder ID | European Commission | 
Format
electronic resource
Related items
| Relationship | URI |
|---|---|
| References | https://doi.org/10.21203/rs.3.rs-17218/v1 |
| Cites | https://www.ebi.ac.uk/pride/archive/projects/PXD017710 |
| IsVersionOf | https://doi.org/10.5281/zenodo.3742218 |
| IsPartOf | https://zenodo.org/communities/covid-19 |
| IsPartOf | https://zenodo.org/communities/zenodo |