To detect changes in viral proteins, amino acid sequences (peptides) must be computed from the nucleotide sequence regions that correspond to genes. This process is called translation. The resulting protein sequences are then aligned to make them comparable, just as nucleotide sequences are.
Nextclade performs translation separately for every gene. Genes are specified in a genome annotation file, also called a gene map. In simple mode, Nextclade Web uses the default gene map for each virus. In advanced mode, Nextclade Web lets you supply a custom gene map. Nextclade CLI and Nextalign CLI let you specify the gene map file or omit it (in which case the translation step does not run). The list of genes to translate is also configurable in Nextclade CLI; if it is not specified, all genes found in the gene map are translated.
For each coding sequence in the gene map, Nextclade extracts the corresponding region from the nucleotide alignment and generates a peptide by translating every triplet of nucleotides (codon) into the corresponding amino acid. It then aligns the resulting peptides against the corresponding reference peptides (translated from the reference sequence), using the same alignment algorithm as for nucleotide sequences.
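The extraction and codon-by-codon translation described above can be illustrated with a minimal Python sketch. This is not Nextclade's implementation: the function names `extract_gene` and `translate` are hypothetical, the 0-based, end-exclusive coordinates are an assumption, and so is the handling of special cases (a fully gapped codon becomes `-`, any other unrecognized codon becomes `X`, and trailing nucleotides that do not fill a codon are dropped). The sketch uses the standard genetic code and omits the peptide alignment step.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def extract_gene(aligned_nucs: str, start: int, end: int) -> str:
    """Slice a gene out of an aligned nucleotide sequence.

    Assumption: coordinates are 0-based and end-exclusive.
    """
    return aligned_nucs[start:end]


def translate(gene_nucs: str) -> str:
    """Translate a nucleotide sequence into a peptide, codon by codon."""
    peptide = []
    usable_length = len(gene_nucs) - len(gene_nucs) % 3  # drop incomplete codon
    for i in range(0, usable_length, 3):
        codon = gene_nucs[i:i + 3].upper()
        if codon == "---":
            peptide.append("-")  # assumption: gapped codon maps to a gap
        else:
            # assumption: ambiguous or partially gapped codons map to "X"
            peptide.append(CODON_TABLE.get(codon, "X"))
    return "".join(peptide)


gene = extract_gene("CCATGGCCTAAGG", 2, 11)
print(gene, translate(gene))
```

In this sketch the stop codon is rendered as `*`; how stop codons, frame shifts, and ambiguous nucleotides are actually treated is not covered by the description above.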
This step runs only if a gene map is provided.
The translation step produces aligned peptide sequences, which are written as FASTA files, one per gene.
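To make the "one FASTA file per gene" layout concrete, here is a small Python sketch that writes aligned peptides grouped by gene. The function name `write_peptide_fastas`, the input structure, and the `<gene>.fasta` file naming are all hypothetical choices for illustration; they are not the file names Nextclade itself produces.

```python
from pathlib import Path


def write_peptide_fastas(peptides_by_gene, out_dir):
    """Write one FASTA file per gene.

    peptides_by_gene maps a gene name to a list of
    (sequence_name, aligned_peptide) pairs.
    Returns a dict mapping each gene name to the path written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for gene, records in peptides_by_gene.items():
        path = out_dir / f"{gene}.fasta"  # hypothetical naming scheme
        with path.open("w") as handle:
            for seq_name, peptide in records:
                handle.write(f">{seq_name}\n{peptide}\n")
        paths[gene] = path
    return paths
```

Each output file then holds the peptides of every input sequence for a single gene, so the rows of a file can be compared position by position.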