Title: Curation and ISA representation of a SARS-Cov2/Covid-19 Proteomics Dataset - PXD107710 - ISA representation

Type: Dataset

Philippe Rocca-Serra, Susanna Assunta Sansone (2020): Curation and ISA representation of a SARS-Cov2/Covid-19 Proteomics Dataset - PXD107710 - ISA representation. Zenodo. Dataset. https://zenodo.org/record/3742219

Authors: Philippe Rocca-Serra (University of Oxford); Susanna Assunta Sansone (University of Oxford); Steffen Neumann (IPB Halle)

Summary

Curation and ISA representation of a SARS-Cov2/Covid-19 Proteomics Dataset deposited in the PRIDE database under accession number PXD017710.

ISA-Tab annotation for the "SARS-CoV-2 infected host cell proteomics reveal potential therapy targets" publication.

Github repository: https://github.com/ISA-tools/PXD017710

This is part of an effort to (re-)annotate: https://dx.doi.org/10.21203/rs.3.rs-17218/v1

Additional work done as part of:

https://github.com/virtual-biohackathons/covid-19-bh20 https://github.com/virtual-biohackathons/covid-19-bh20/wiki/FairData

Proteomics data

Available from PRIDE at https://www.ebi.ac.uk/pride/archive/projects/PXD017710 and [MassIVE/CCMS Maestro+MSstats reanalysis of MSV000085096 / PXD017710]

ISA-Tab representation:

Rationale: Demonstrate the suitability of the ISA format for representing an MS-based protein profiling experiment with greater granularity and detail, thus providing a better representation of the experimental design. The formatting and re-annotation are based on information extracted from:

- the original publication
- the supplementary tables available from the publisher's site
- the 'filtered-results.csv' helper file as supplied to @sneumann during the HUPO-PSI meeting in March 2020

Viewing the ISA-tab formatted and re-annotated PXD017710 with ISATab-Viewer

To view the ISA-Tab formatted and re-annotated PXD017710 locally, start a local web server:

```bash
python -m http.server 8000
```

Then point your browser to `http://0.0.0.0:8000/isaviewer-demo.html`

Curation tasks performed:

* initial structure of the study design in ISA format

* linkage of Proteome and Translatome data (supplementary material) to ISA assay tables (via Derived Data File)

* processing the Proteome and Translatome data (supplementary material) with the Python pandas library to generate the following files:

  - proteome_intensities_long_table_ggplot2.txt
  - proteome_diffanal_ratio_pvalue_long_table_ggplot2.txt
  - translatome_intensities_long_table_ggplot2.txt
  - translatome_diffanal_ratio_pvalue_long_table_ggplot2

  The files are `long tables`, each corresponding to a `melt` of the original Excel files, and can be readily loaded into the R ggplot2 library for graphical representation. The statistically relevant elements have been annotated with the STATO ontology, and the tables comply with a Frictionless.io Data Package. The Jupyter notebook for the transformation is available.

* conversion of raw data to mzML format: detailed in https://github.com/ISA-tools/PXD017710

install docker:

```bash
brew update
brew install docker
```

pull the Docker image for ProteoWizard:

```bash
docker pull chambm/pwiz-skyline-i-agree-to-the-vendor-licenses
```

:warning: be sure to sign up and log in to https://hub.docker.com/ in order to be able to reach https://hub.docker.com/r/chambm/pwiz-skyline-i-agree-to-the-vendor-licenses

run the pwiz tool from the container over the raw data:

```bash
docker run -it --rm -e WINEDEBUG=-all \
  -v /Users/Downloads/PXD017710/raw/:/data \
  chambm/pwiz-skyline-i-agree-to-the-vendor-licenses \
  wine msconvert /data/*.raw --mzML
```

* ontology markup for:
  * declaration of independent variables as ISA Study Factors {biological agent, dose, time point, replicate} -> OBI
  * Taxonomic information (host cells and virus) -> NCBITaxonomy
  * Cell line: CaCo-2 cells -> Cell Line Ontology
  * Disease: Colon Cancer -> Human Phenotype Ontology
  * MS specific aspects (TMT reagent, instrument ...) -> PSI-MS
  * Statistical Tests -> STATO
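The `melt` step mentioned in the curation tasks above can be illustrated with a small, self-contained sketch. It is not the notebook from the repository: the protein identifiers, the sample column names and the `treatment_timepoint_replicate` naming scheme are all hypothetical and stand in for the real supplementary tables.

```python
import pandas as pd

# Hypothetical wide table: one row per protein, one column per sample.
# Column names follow an assumed "<treatment>_<time point>_<replicate>" scheme.
wide_table = pd.DataFrame(
    {
        "protein": ["PROT_A", "PROT_B"],
        "control_2h_rep1": [10.0, 20.0],
        "virus_2h_rep1": [12.0, 18.0],
    }
)

# Reshape to a long table: one row per (protein, sample) measurement.
long_table = wide_table.melt(
    id_vars="protein", var_name="sample", value_name="intensity"
)

# Split the sample label into separate columns, one per study factor.
long_table[["treatment", "time_point", "replicate"]] = (
    long_table["sample"].str.split("_", expand=True)
)

print(long_table)
```

A table in this shape has one observation per row, which is the layout ggplot2 expects when mapping factors such as treatment or time point to plot aesthetics.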

Unresolved curatorial issues:

1. ambiguities related to the Tandem Mass Tag labelling protocol:
   - the publication mentions TMT11 (see Figure 2 in https://www.researchsquare.com/article/rs-17218/v1)
   - the information available from PRIDE mentions TMT6 (https://www.ebi.ac.uk/pride/archive/projects/PXD017710)

   This may require another round of annotation on the TMT agents and fractions in the ISA a_assay representation.

2. SARS-Cov2 isolate: no clear NCBI Taxonomy anchoring and unclear origin, so the markup is made to the parent class (as of 06.04.2020)

Release and packaging as a BDBAG:

The tgz file associated with this upload was produced using https://github.com/fair-research/bdbag. It contains several manifest files listing the metadata and data files, with md5 and sha256 checksums for each.
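As a rough illustration of what the checksum manifests allow, the sketch below re-computes the digest of each file listed in a BagIt-style manifest (`manifest-sha256.txt` or `manifest-md5.txt`, one `<checksum> <relative path>` entry per line) and reports any mismatch. The function name `verify_manifest` and the bag directory in the usage comment are hypothetical, and the archive is assumed to have been extracted first; bdbag provides its own validation, which should be preferred in practice.

```python
import hashlib
from pathlib import Path


def verify_manifest(bag_dir, algorithm="sha256"):
    """Return the relative paths whose checksum differs from the manifest."""
    bag = Path(bag_dir)
    manifest = bag / f"manifest-{algorithm}.txt"
    mismatches = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Each entry is "<checksum> <path relative to the bag root>".
        expected, rel_path = line.strip().split(maxsplit=1)
        digest = hashlib.new(algorithm)
        with open(bag / rel_path, "rb") as handle:
            # Read in 1 MiB chunks so large raw files do not fill memory.
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
        if digest.hexdigest() != expected.lower():
            mismatches.append(rel_path)
    return mismatches


# Usage (hypothetical directory name for the extracted tgz):
# print(verify_manifest("PXD017710_bag", algorithm="md5"))
```

An empty list means every file listed in the manifest matches its recorded checksum.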

More information

DOI: 10.5281/zenodo.3742219
Language: en

Subjects

FAIR data, Proteomics, mass spectrometry, SARS-Cov2, Covid-19, Caco2 cell line, treated versus control intervention design, ISA format, STATO ontology, bdbag, FAIRsharing

Dates

Publication date: 2020
Issued: April 06, 2020

Rights

https://creativecommons.org/licenses/by/3.0/legalcode Creative Commons Attribution 3.0 Unported
info:eu-repo/semantics/openAccess Open Access

Funding Information

Award number	Award URI	Funder identifier	Funder identifier type	Funder name
802750	info:eu-repo/grantAgreement/EC/H2020/802750/	10.13039/100010661	Crossref Funder ID	European Commission

Format

electronic resource

Related items

Description	Item type	Relationship	URI
		References	https://doi.org/10.21203/rs.3.rs-17218/v1
		Cites	https://www.ebi.ac.uk/pride/archive/projects/PXD017710
		IsVersionOf	https://doi.org/10.5281/zenodo.3742218
		IsPartOf	https://zenodo.org/communities/covid-19
		IsPartOf	https://zenodo.org/communities/zenodo
