This is a limited proof of concept to search for research data, not a production system.

Title: PRJNA638224 - BCR repertoire sequencing from COVID-19 patients

Type: Dataset

Jacob D Galson (2020): PRJNA638224 - BCR repertoire sequencing from COVID-19 patients. Zenodo. Dataset. https://zenodo.org/record/3899008

Author: Jacob D Galson (Alchemab Therapeutics Ltd)

Summary

Description

These are the processed BCR repertoire sequence data that accompany the manuscript “Deep sequencing of B cell receptor repertoires from COVID-19 patients reveals strong convergent immune signatures”. The manuscript preprint is available at https://doi.org/10.1101/2020.05.20.106294. The raw sequence data are available on SRA under BioProject PRJNA638224.

 

Sequence processing

The Immcantation framework (docker container v3.0.0) was used for sequence processing. Briefly, paired-end reads were joined based on a minimum overlap of 20 nt and a maximum error of 0.2, and reads with a mean Phred score below 20 were removed. Primer regions, including UMIs and sample barcodes, were then identified within each read and trimmed. Together, the sample barcode, UMI, and constant region primer were used to assign a molecular grouping to each read. Within each grouping, usearch was used to subdivide the grouping with a cutoff of 80% nucleotide identity, to account for randomly overlapping UMIs. Each of the resulting groupings is assumed to represent reads arising from a single RNA molecule. Reads within each grouping were then aligned, and a consensus sequence was determined. For each processed sequence, IgBlast was used to determine the V, D and J gene segments and the locations of the CDRs and FWRs. Isotype was determined by comparison to germline constant region sequences. Sequences annotated as unproductive by IgBlast were removed.
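
The quality filter, molecular grouping, and consensus steps described above can be illustrated with a minimal Python sketch. This is not the Immcantation code or its API: the function names, the read dictionary fields (barcode, umi, primer, sequence, quality), and the Phred+33 quality encoding are all assumptions made for illustration. The sketch also leaves out the usearch subclustering at 80% identity and the within-group alignment, and assumes the reads in a group are already aligned and of equal length.

```python
from collections import Counter, defaultdict


def mean_phred(quality, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def group_reads(reads, min_mean_phred=20):
    """Drop reads whose mean Phred score is below the cutoff, then group the
    remaining reads by (sample barcode, UMI, constant region primer)."""
    groups = defaultdict(list)
    for read in reads:
        if mean_phred(read["quality"]) < min_mean_phred:
            continue
        key = (read["barcode"], read["umi"], read["primer"])
        groups[key].append(read["sequence"])
    return dict(groups)


def consensus(sequences):
    """Per-position majority vote over equal-length, pre-aligned sequences."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*sequences))


# Synthetic reads for illustration only; they are not taken from the dataset.
reads = [
    {"barcode": "S1", "umi": "AAAC", "primer": "IGHG", "sequence": "ACGT", "quality": "IIII"},
    {"barcode": "S1", "umi": "AAAC", "primer": "IGHG", "sequence": "ACGA", "quality": "IIII"},
    {"barcode": "S1", "umi": "AAAC", "primer": "IGHG", "sequence": "ACGT", "quality": "IIII"},
    {"barcode": "S1", "umi": "AAAC", "primer": "IGHG", "sequence": "TTTT", "quality": "####"},
]

groups = group_reads(reads)
consensus_by_group = {key: consensus(seqs) for key, seqs in groups.items()}
```

In this toy input the fourth read has a mean Phred score of 2 and is discarded, so the single remaining group holds three reads, and the majority vote resolves the one disagreement at the last position.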

 

Sequence data column description

sample_id             Unique identifier for each sequencing library
sequence_id           Unique identifier for a sequence within a sample_id
sequence_alignment    IMGT gapped nucleotide sequence
germline_alignment    IMGT gapped germline sequence
v_call                IGHV gene segment(s) and allele
d_call                IGHD gene segment(s) and allele
j_call                IGHJ gene segment(s) and allele
c_call                Isotype subclass
junction              Junction nucleotide sequence
junction_aa           Junction amino acid sequence
duplicate_count       UMI count for the given unique sequence
consensus_count       Raw read count for the given unique sequence
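
As a rough illustration of how these columns might be consumed, the following Python sketch parses a tab-separated table with the column names listed above and sums UMI counts per isotype. The file layout is an assumption (the record does not state the delimiter), as is the convention that multiple gene calls in v_call are comma-separated with the allele after an asterisk. The example rows are synthetic and the function names are hypothetical.

```python
import csv
import io
from collections import Counter

# Synthetic two-row table for illustration only; not taken from the dataset.
EXAMPLE_TSV = (
    "sample_id\tsequence_id\tv_call\tj_call\tc_call\tjunction_aa\t"
    "duplicate_count\tconsensus_count\n"
    "lib1\tseq1\tIGHV3-23*01\tIGHJ4*02\tIGHG1\tCARDYW\t3\t12\n"
    "lib1\tseq2\tIGHV1-69*01,IGHV1-69D*01\tIGHJ6*02\tIGHM\tCARGGW\t1\t4\n"
)


def read_sequences(handle):
    """Yield one dict per sequence row, with the count columns converted to int."""
    for row in csv.DictReader(handle, delimiter="\t"):
        row["duplicate_count"] = int(row["duplicate_count"])
        row["consensus_count"] = int(row["consensus_count"])
        # Assumed convention: comma-separated calls, allele after '*'.
        row["v_genes"] = sorted({call.split("*")[0] for call in row["v_call"].split(",")})
        yield row


def umi_count_by_isotype(rows):
    """Sum duplicate_count (the UMI count) for each c_call value."""
    totals = Counter()
    for row in rows:
        totals[row["c_call"]] += row["duplicate_count"]
    return totals


rows = list(read_sequences(io.StringIO(EXAMPLE_TSV)))
totals = umi_count_by_isotype(rows)
```

Summing duplicate_count rather than consensus_count counts distinct RNA molecules instead of raw reads, which is usually the quantity of interest when comparing isotype usage.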

 

Sequence metadata column description

sample_id               Unique identifier for each sequencing library
bioproject_accession    NCBI BioProject accession number
biosample_accession     NCBI BioSample accession number
sra_accession           NCBI SRA accession number
sex                     Sex of patient
age                     Age of patient at time of sampling
ethnicity               Ethnicity of patient
health_state            One of worsening, stable, or improving
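
Because both tables share the sample_id column, sequence rows can be linked to patient metadata through it. The Python sketch below shows one way this could be done, assuming each sample_id appears exactly once in the metadata table. The function names and the example rows are hypothetical and synthetic.

```python
from collections import Counter

# The three health_state values listed in the column description above.
HEALTH_STATES = {"worsening", "stable", "improving"}


def index_metadata(metadata_rows):
    """Map sample_id to its metadata row, rejecting unexpected health_state values."""
    index = {}
    for row in metadata_rows:
        if row["health_state"] not in HEALTH_STATES:
            raise ValueError(f"unexpected health_state: {row['health_state']!r}")
        index[row["sample_id"]] = row
    return index


def sequences_per_health_state(sequence_rows, metadata_by_sample):
    """Count sequence rows per health_state by joining on sample_id."""
    counts = Counter()
    for row in sequence_rows:
        counts[metadata_by_sample[row["sample_id"]]["health_state"]] += 1
    return counts


# Synthetic rows for illustration only; not taken from the dataset.
metadata = [
    {"sample_id": "lib1", "health_state": "stable"},
    {"sample_id": "lib2", "health_state": "improving"},
]
sequences = [
    {"sample_id": "lib1", "sequence_id": "seq1"},
    {"sample_id": "lib1", "sequence_id": "seq2"},
    {"sample_id": "lib2", "sequence_id": "seq3"},
]

counts = sequences_per_health_state(sequences, index_metadata(metadata))
```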

More information

  • DOI: 10.5281/zenodo.3899008
  • Language: en

Subjects

  • COVID-19, B cell repertoire, Antibody, SARS-CoV-2

Dates

  • Publication date: 2020
  • Issued: June 09, 2020


Format

electronic resource

Related items

Relationship    URI
IsCitedBy       https://doi.org/10.1101/2020.05.20.106294
IsVersionOf     https://doi.org/10.5281/zenodo.3886394
IsPartOf        https://zenodo.org/communities/airr
IsPartOf        https://zenodo.org/communities/covid-19
IsPartOf        https://zenodo.org/communities/zenodo