3. Mutation calling
To detect nucleotide mutations, each aligned query sequence is compared with the reference sequence one nucleotide at a time. Mismatches between the query and the reference are recorded and reported differently depending on their nature:
Nucleotide substitutions: a change from one character to another, for example from A in the reference sequence to G in the query sequence. In the sequence views of Nextclade Web they are shown as colored markers, where the color signifies the resulting character in the query sequence.
Nucleotide deletions (“gaps”): a nucleotide that is present in the reference sequence is absent from the query sequence. Deletions are indicated by the “-” character in the aligned sequence. In the sequence views of Nextclade Web they are shown as dark-grey markers. In the output files, deletions are represented as numeric ranges signifying the start and end of the deleted fragment (for example: …)
Nucleotide insertions: additional nucleotides in the query sequence that are not present in the reference sequence. They are stripped from the alignment and reported in a separate output file, which gives the position in the reference after which the insertion occurred and the inserted fragment. For example, 22030:ACT would indicate that the query sequence has the three bases ACT inserted after position 22030, i.e. between positions 22030 and 22031 of the reference sequence (the indices are 1-based).
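The column-by-column comparison described above can be sketched in Python as a single pass over a pairwise alignment. This is a minimal illustration, not Nextclade's implementation: it assumes the reference and query are given as two equal-length alignment rows with "-" for gaps, and the function name call_nuc_mutations and the output strings (A123G for substitutions, start-end for deletion ranges, position:fragment for insertions) are chosen here for illustration only. A gap in the reference marks an inserted query base, which is stripped and recorded against the reference position it follows; a gap in the query marks a deleted reference base.

```python
ACGT = set("ACGT")


def call_nuc_mutations(ref_aln: str, qry_aln: str):
    """Call substitutions, deletions and insertions from a pairwise alignment.

    Illustrative sketch: both inputs are rows of the same alignment, with "-"
    marking gaps. Returns the query with insertions stripped (i.e. in
    reference coordinates) and a dict of mutation lists. Positions are 1-based.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")

    substitutions, deletions, insertions = [], [], []
    stripped = []                 # query characters kept at reference positions
    ref_pos = 0                   # 1-based index of the last reference base seen
    ins_after, ins_bases = 0, []  # pending insertion: anchor position + bases
    del_start = None              # first reference position of a pending deletion

    for r, q in zip(ref_aln.upper(), qry_aln.upper()):
        if r == "-":
            # Gap in the reference: the query base is an insertion.
            if q != "-":
                if not ins_bases:
                    ins_after = ref_pos
                ins_bases.append(q)
            continue

        ref_pos += 1
        if ins_bases:
            # The next reference base closes the pending insertion.
            insertions.append(f"{ins_after}:{''.join(ins_bases)}")
            ins_bases = []

        if q == "-":
            # Gap in the query: the reference base is deleted.
            if del_start is None:
                del_start = ref_pos
            stripped.append("-")
            continue

        if del_start is not None:
            deletions.append(f"{del_start}-{ref_pos - 1}")
            del_start = None
        stripped.append(q)
        # Only definite base changes count as substitutions; N and other
        # ambiguity codes are left to be tallied as missing/ambiguous.
        if r in ACGT and q in ACGT and r != q:
            substitutions.append(f"{r}{ref_pos}{q}")

    # Close an insertion or deletion that runs to the end of the alignment.
    if ins_bases:
        insertions.append(f"{ins_after}:{''.join(ins_bases)}")
    if del_start is not None:
        deletions.append(f"{del_start}-{ref_pos}")

    return "".join(stripped), {
        "substitutions": substitutions,
        "deletions": deletions,
        "insertions": insertions,
    }


stripped, muts = call_nuc_mutations("ACGTAC--GTACGT", "ACCTACTTG--CGT")
print(stripped)  # ACCTACG--CGT
print(muts)      # {'substitutions': ['G3C'], 'deletions': ['8-9'], 'insertions': ['6:TT']}
```

Here the two inserted T bases follow reference position 6, so they are reported as 6:TT and removed from the stripped query, which keeps the remaining positions aligned with the reference.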
Nextclade also gathers and reports other useful statistics, such as the number of contiguous ranges of N (missing) and non-ACGTN (ambiguous) nucleotides, as well as the total counts of substituted, deleted, missing and ambiguous nucleotides. You can find this information in the results table of Nextclade Web and in the output files of Nextclade CLI.
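The statistics listed above can be derived from a query that is already in reference coordinates (insertions stripped, "-" for deletions). The following is a hedged sketch under that assumption; the helper names find_ranges and nuc_stats and the dictionary keys are hypothetical, and ranges are reported here as 1-based, inclusive (start, end) pairs.

```python
ACGT = set("ACGT")
ACGTN = set("ACGTN")


def find_ranges(seq, predicate):
    """Return 1-based, inclusive (start, end) ranges of consecutive
    positions whose character satisfies predicate."""
    ranges, start = [], None
    for i, c in enumerate(seq, start=1):
        if predicate(c):
            if start is None:
                start = i
        elif start is not None:
            ranges.append((start, i - 1))
            start = None
    if start is not None:
        ranges.append((start, len(seq)))
    return ranges


def nuc_stats(ref, qry):
    """Summary statistics for a query in reference coordinates
    (same length as ref, insertions stripped, "-" for deletions)."""
    ref, qry = ref.upper(), qry.upper()
    missing = find_ranges(qry, lambda c: c == "N")
    # Ambiguous: anything other than A, C, G, T, N, excluding deletion gaps.
    ambiguous = find_ranges(qry, lambda c: c not in ACGTN and c != "-")

    def span(ranges):
        return sum(end - start + 1 for start, end in ranges)

    return {
        "missing_ranges": missing,
        "ambiguous_ranges": ambiguous,
        "total_missing": span(missing),
        "total_ambiguous": span(ambiguous),
        "total_deleted": qry.count("-"),
        "total_substituted": sum(
            1 for r, q in zip(ref, qry) if r in ACGT and q in ACGT and r != q
        ),
    }


stats = nuc_stats("ACGTACGTACGT", "TCNNACGR--GT")
print(stats["missing_ranges"], stats["ambiguous_ranges"])  # [(3, 4)] [(8, 8)]
```

The number of contiguous missing or ambiguous ranges is simply the length of the corresponding list, while the totals count individual nucleotides.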
Similarly, amino acid mutations and statistics are gathered from the aligned peptides obtained after translation. This step runs only if a genome annotation is provided.
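At the peptide level the same idea applies, one amino acid at a time. The sketch below assumes the query peptide has already been translated and aligned to the reference peptide with insertions stripped, and that "X" marks an unknown amino acid (for example, from a codon containing N), which is skipped rather than reported. The function name call_aa_mutations, the gene name geneA and the gene:R123Q / gene:R123- output strings are hypothetical choices for illustration.

```python
def call_aa_mutations(gene, ref_pep, qry_pep):
    """Compare a reference peptide with an aligned query peptide.

    Illustrative sketch: both peptides must have equal length (query
    insertions stripped), with "-" marking deleted amino acids. Positions
    are 1-based codon indices within the gene.
    """
    if len(ref_pep) != len(qry_pep):
        raise ValueError("aligned peptides must have equal length")

    substitutions, deletions = [], []
    for pos, (r, q) in enumerate(zip(ref_pep, qry_pep), start=1):
        if q == r or q == "X":
            # Unchanged, or unknown amino acid: not reported as a mutation.
            continue
        if q == "-":
            deletions.append(f"{gene}:{r}{pos}-")
        else:
            substitutions.append(f"{gene}:{r}{pos}{q}")
    return substitutions, deletions


subs, dels = call_aa_mutations("geneA", "MKNLHY", "MKY-HX")
print(subs, dels)  # ['geneA:N3Y'] ['geneA:L4-']
```

Prefixing each mutation with the gene name keeps positions unambiguous, because amino acid positions are counted within each gene rather than along the whole genome.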
Nucleotide mutations can be viewed in the “Sequence view” column of the results table in Nextclade Web. Switching the “Sequence view” to a particular gene shows the mutations in the corresponding peptide.
The mutation calling step produces a set of mutations and various practical metrics for each sequence. They are included in the analysis results: the JSON, CSV and TSV output files of Nextclade CLI, and the files available in the “Download” dialog of Nextclade Web.
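To illustrate how per-sequence mutation lists and metrics can be flattened into both a nested and a tabular format, here is a minimal sketch using only the Python standard library. The column names (seq_name, substitutions, and so on) and the comma-joining of list fields are assumptions for this example, not a description of Nextclade's actual output schema.

```python
import csv
import io
import json


def serialize_results(results):
    """Return (json_text, tsv_text) for a list of per-sequence result dicts.

    All dicts are assumed to share the same keys. List-valued fields
    (mutations) are joined with commas in the TSV; numbers are written as-is.
    """
    json_text = json.dumps(results, indent=2)

    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=list(results[0]), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in results:
        writer.writerow(
            {k: ",".join(v) if isinstance(v, list) else v for k, v in row.items()}
        )
    return json_text, buf.getvalue()


results = [
    {
        "seq_name": "sample_1",  # hypothetical column names
        "substitutions": ["G3C", "A5T"],
        "deletions": ["8-9"],
        "insertions": [],
        "total_substitutions": 2,
    }
]
json_text, tsv_text = serialize_results(results)
print(tsv_text, end="")
```

JSON keeps the mutation lists as arrays, while the tabular form needs a separator inside each cell. If the same writer were used with delimiter="," to produce CSV, the csv module would automatically quote any field containing a comma.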