Title: roblanf/sarscov2phylo: 22-7-20
Type: Software
roblanf (2020): roblanf/sarscov2phylo: 22-7-20. Zenodo. Software. https://zenodo.org/record/3958884
Summary
The trees in this release were generated with the following command line:
bash global_tree_gisaid.sh -i gisaid_hcov-19_2020_07_22_07.fasta -o global.fa -t 34
The raw sequence file contains all SARS-CoV-2 genomes available in GISAID as of 9 PM Canberra (Australia) time on 22 July 2020.
The ZIP file contains the code necessary to reproduce the trees, and the README in the ZIP file describes the methods in detail. I also include the trees themselves here so that they can be downloaded without downloading the entire repo.
Filtering statistics
sequences downloaded from GISAID
44915
//
alignment stats of global alignment
Alignment number: 1
Format: aligned FASTA
Number of sequences: 44446
Alignment length: 29903
Total # residues: 1326206221
Smallest: 29146
Largest: 29903
Average length: 29838.6
Average identity: 100%
//
alignment stats of global alignment after masking sites
Alignment number: 1
Format: aligned FASTA
Number of sequences: 44446
Alignment length: 29903
Total # residues: 1318812088
Smallest: 29059
Largest: 29680
Average length: 29672.2
Average identity: 100%
//
alignment stats after filtering out short/ambiguous sequences
Alignment number: 1
Format: aligned FASTA
Number of sequences: 44278
Alignment length: 29903
Total # residues: 1313831108
Smallest: 29059
Largest: 29680
Average length: 29672.3
Average identity: 100%
//
alignment stats of global alignment after trimming sites that are >50% gaps
Alignment number: 1
Format: aligned FASTA
Number of sequences: 44278
Alignment length: 29661
Total # residues: 1310443036
Smallest: 28457
Largest: 29661
Average length: 29595.8
Average identity: 100%
//
After filtering sequences with TreeShrink
Type: Phylogram
#nodes: 79266
#leaves: 44233
#dichotomies: 33504
#leaf labels: 44233
#inner labels: 35031
Notable changes to the scripts in this release
None
Notable aspects of the trees
There are a few long branches, particularly on sequences from India. These could be real or could be due to a lot of sequencing error. If real, they would suggest that there are some highly diverged sequences in India. These sequences should be treated with more caution than the others.
More information
- DOI: 10.5281/zenodo.3958884
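As a cross-check on the filtering statistics reported above, the following is a minimal Python sketch that tallies how many sequences drop out between consecutive stages. The counts are copied from this release; the stage labels, the `stages` list, and the `removed_per_stage` helper are illustrative names chosen here and are not part of the sarscov2phylo scripts.

```python
# Sequence counts copied from the filtering statistics of this release.
# Stage labels and variable names are illustrative, not from the pipeline.
stages = [
    ("downloaded from GISAID", 44915),
    ("global alignment", 44446),
    ("after masking sites", 44446),
    ("after filtering short/ambiguous sequences", 44278),
    ("after trimming sites that are >50% gaps", 44278),
    ("after TreeShrink (tree leaves)", 44233),
]


def removed_per_stage(stages):
    """Return (stage name, sequences removed relative to the previous stage)."""
    return [
        (name, prev_count - count)
        for (_, prev_count), (name, count) in zip(stages, stages[1:])
    ]


drops = removed_per_stage(stages)
total_removed = stages[0][1] - stages[-1][1]
retained_fraction = stages[-1][1] / stages[0][1]

for name, n in drops:
    print(f"{name}: {n} removed")
print(f"total removed: {total_removed}")
print(f"retained: {retained_fraction:.2%}")
```

Masking and gap trimming change sites rather than sequences, so those two stages are expected to remove no sequences. The reductions come from the initial alignment step, the short/ambiguous filter, and TreeShrink.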
Dates
- Publication date: 2020
- Issued: July 24, 2020
Rights
- info:eu-repo/semantics/openAccess Open Access
Format
electronic resource
Related items
Relationship | URI |
---|---|
IsSupplementTo | https://github.com/roblanf/sarscov2phylo/tree/22-7-20 |
IsVersionOf | https://doi.org/10.5281/zenodo.3958883 |
IsPartOf | https://zenodo.org/communities/covid-19 |
IsPartOf | https://zenodo.org/communities/zenodo |