Terminology
Terminology in bioinformatics is often ambiguous: some terms are poorly defined, and others have different meanings depending on the context and research area.
To make this documentation and the project's source code easier to understand, this section summarizes the terminology used by Nextclade, including possible synonyms. This terminology is neither perfect nor complete, and some definitions are deliberately simplified to narrow the scope to the topics relevant to the project.
For clarity, please use this vocabulary, when possible, when communicating with the Nextclade team.
We will be grateful for contributions to this section.
Reference sequence
Synonyms: Root sequence
The sequence against which Query sequences are aligned and analysed.
The Reference sequence is expected to be mostly complete (few or no unsequenced or missing regions) and unambiguous (few or no ambiguous nucleotides), and to correspond to the root node of the phylogenetic tree.
The quality of the Reference sequence is important for the quality of the analysis.
Root sequence
Same as Reference sequence.
The name originates from the Root node of the Reference tree (concept).
Query sequence
Synonyms: Query nucleotide sequence
One of the input nucleotide sequences provided by the user. These are the sequences to be analysed.
Reference nucleotide
A nucleotide (character) in the Reference sequence.
Query nucleotide
Synonyms: Derived nucleotide
A nucleotide (character) in the Query sequence.
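Nucleotide differences are found by comparing each Reference nucleotide with the Query nucleotide in the same alignment column. The following is a minimal sketch that assumes both sequences are already aligned to equal length. Several details are illustrative assumptions rather than Nextclade's implementation: the function name `find_nuc_substitutions`, the `<ref><position><query>` label format (e.g. `G3T`), and the choice to skip `N` and gaps.

```python
def find_nuc_substitutions(ref: str, qry: str) -> list[str]:
    """List substitutions as '<ref nucleotide><1-based ref position><query nucleotide>'."""
    if len(ref) != len(qry):
        raise ValueError("sequences must be aligned to the same length")
    substitutions = []
    ref_pos = 0  # 1-based position in the reference, not counting gaps in the reference
    for ref_nuc, qry_nuc in zip(ref, qry):
        if ref_nuc == "-":
            # Insertion in the query: it has no reference coordinate and is not a substitution.
            continue
        ref_pos += 1
        if qry_nuc in "N-":
            # Missing data or deletion: not reported as a substitution in this sketch.
            continue
        if ref_nuc != qry_nuc:
            substitutions.append(f"{ref_nuc}{ref_pos}{qry_nuc}")
    return substitutions


print(find_nuc_substitutions("ACGTACGT", "ACTTANGT"))  # ['G3T']
```

Positions are counted in reference coordinates, so an insertion in the query does not shift the numbering of later substitutions.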
Gene
A nucleotide sequence fragment encoding a Peptide.
Codon (concept)
Synonyms: Triplet
A group of 3 consecutive nucleotides encoding 1 aminoacid.
Codon (position)
The numeric index of a Codon (concept) within a Gene.
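Because every codon spans exactly 3 nucleotides, the codon index of a nucleotide follows from integer division by 3. This sketch assumes 0-based positions relative to the start of the gene. Nextclade's own conventions may differ; for example, mutation labels are usually 1-based. The helper names `codon_of` and `codon_nuc_positions` are hypothetical.

```python
def codon_of(pos_in_gene: int) -> tuple[int, int]:
    """Map a 0-based nucleotide position within a gene to (codon index, offset within codon)."""
    if pos_in_gene < 0:
        raise ValueError("position must be non-negative")
    return divmod(pos_in_gene, 3)


def codon_nuc_positions(codon: int) -> range:
    """0-based nucleotide positions, relative to the gene start, covered by a codon."""
    return range(3 * codon, 3 * codon + 3)


print(codon_of(7))                    # (2, 1)
print(list(codon_nuc_positions(2)))   # [6, 7, 8]
```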
Peptide
Synonyms: Aminoacid sequence
The translated nucleotide sequence of a Gene, i.e. a sequence of aminoacids.
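To illustrate how a Gene becomes a Peptide, the sketch below translates codon by codon using the standard genetic code (NCBI translation table 1). It simplifies several things, and each is an assumption rather than Nextclade's behaviour: codons containing anything other than `A`, `C`, `G`, `T` become `X`, a trailing partial codon is dropped, and frame shifts are not handled.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINOACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINOACIDS)
}


def translate(gene: str) -> str:
    """Translate a nucleotide sequence into aminoacids; '*' marks a stop codon."""
    peptide = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(gene) - len(gene) % 3, 3):
        codon = gene[i : i + 3].upper()
        # Codons with gaps or ambiguous nucleotides are translated to 'X' (unknown).
        peptide.append(CODON_TABLE.get(codon, "X"))
    return "".join(peptide)


print(translate("ATGGCCTAA"))  # MA*
print(translate("ATGNNNGGT"))  # MXG
```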
Query peptide
A Peptide corresponding to one of the Genes in the Query sequence.
Reference peptide
A Peptide corresponding to one of the Genes in the Reference sequence.
Query aminoacid
Synonyms: Derived aminoacid
An aminoacid in the Query peptide.
Reference aminoacid
An aminoacid in the Reference peptide.
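Aminoacid substitutions are found by comparing each Reference aminoacid with the Query aminoacid at the same codon. This is a minimal sketch under the following assumptions: the two peptides are already aligned, and the Reference peptide contains no gaps, so that the alignment column equals the codon number. The function name `find_aa_substitutions` and the `<gene>:<ref><codon><query>` label (e.g. `S:V3I`) are illustrative choices.

```python
def find_aa_substitutions(gene: str, ref_peptide: str, qry_peptide: str) -> list[str]:
    """List substitutions as '<gene>:<ref aminoacid><1-based codon><query aminoacid>'."""
    if len(ref_peptide) != len(qry_peptide):
        raise ValueError("peptides must be aligned to the same length")
    substitutions = []
    # Assumes no gaps in the reference peptide, so the column number is the codon number.
    for codon, (ref_aa, qry_aa) in enumerate(zip(ref_peptide, qry_peptide), start=1):
        if qry_aa in "X-":
            # Unknown aminoacids and deletions are not reported as substitutions here.
            continue
        if ref_aa != qry_aa:
            substitutions.append(f"{gene}:{ref_aa}{codon}{qry_aa}")
    return substitutions


print(find_aa_substitutions("S", "MFVFLV", "MFIXLV"))  # ['S:V3I']
```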
Reference tree (concept)
A phylogenetic tree: a tree diagram showing evolutionary relationships between sequences. Every node corresponds to a particular sequence. This tree is used as the source of clade annotations and as the target for Phylogenetic placement.
Reference tree (file)
The file that encodes the Reference tree (concept). Most often, this refers to a tree file in Auspice JSON v2 format.
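An Auspice JSON v2 file stores the tree under a top-level `tree` key. Each node has a `name`, optional `node_attrs`, and optional `children`. The sketch below walks the tree iteratively, which avoids recursion limits on deep trees, and counts nodes per clade. It assumes the clade is stored as `node_attrs.clade_membership.value`, as is common in Nextstrain builds; the attribute name in a given dataset may differ. The example JSON is hand-written, not a real dataset.

```python
import json
from collections import Counter


def iter_nodes(root: dict):
    """Yield every node of an Auspice JSON v2 tree, without recursion."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(node.get("children", []))


def count_clades(auspice_json: str) -> Counter:
    """Count nodes per clade, read from the (assumed) 'clade_membership' node attribute."""
    root = json.loads(auspice_json)["tree"]
    clades = Counter()
    for node in iter_nodes(root):
        clade = node.get("node_attrs", {}).get("clade_membership", {}).get("value")
        if clade is not None:
            clades[clade] += 1
    return clades


example = json.dumps({
    "version": "v2",
    "meta": {},
    "tree": {
        "name": "root",
        "node_attrs": {"clade_membership": {"value": "A"}},
        "children": [
            {"name": "n1", "node_attrs": {"clade_membership": {"value": "A"}}},
            {"name": "n2", "node_attrs": {"clade_membership": {"value": "B"}}},
        ],
    },
})
print(count_clades(example))  # Counter({'A': 2, 'B': 1})
```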
Reference node
(not the same as Root node)
A node of the original Reference tree.
Before Phylogenetic placement, all nodes of the tree are Reference nodes (there are no New nodes yet).
Root node
The root node of the Reference tree, i.e. the common ancestor of all other nodes.
The Root node corresponds to the Reference sequence.
New node
A node added to the Reference tree that corresponds to a particular Query sequence placed onto the tree during Phylogenetic placement.
Gene map
(used interchangeably with Genome annotation)
A set of entries describing the Genes of a particular virus, including the name and nucleotide range of each gene.
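A Gene map entry can be modelled as a small record holding a gene's name and nucleotide range. This sketch uses 0-based, end-exclusive coordinates so that `end - start` is the gene length. Gene maps are commonly stored in GFF3 format, which uses 1-based, inclusive coordinates; a GFF3 range `start..end` would correspond to `Gene(name, start - 1, end)` here. The class name, the gene names, and the coordinates are hypothetical, and reverse-strand genes are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int  # 0-based, inclusive position in the reference sequence
    end: int    # 0-based, exclusive position in the reference sequence
    strand: str = "+"

    def extract(self, sequence: str) -> str:
        """Return this gene's nucleotides (forward-strand genes only in this sketch)."""
        if self.strand != "+":
            raise NotImplementedError("reverse-strand genes are not handled here")
        return sequence[self.start : self.end]


# Hypothetical gene map with made-up coordinates, for illustration only.
gene_map = {gene.name: gene for gene in [Gene("gA", 2, 8), Gene("gB", 8, 14)]}
genome = "NNATGAAAATGTAANN"
print(gene_map["gA"].extract(genome))  # ATGAAA
print(gene_map["gB"].extract(genome))  # ATGTAA
```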
Alignment (process)
(used interchangeably with Sequence alignment, Nucleotide alignment, Peptide alignment and Aminoacid alignment, depending on the surrounding context)
The process of arranging Query sequence against Reference sequence (or Query peptide against Reference peptide) to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences.
During alignment, fragments of the query sequence are compared with fragments of the reference sequence, similarities are identified, and the fragments are repositioned (by inserting gaps) to increase similarity. The resulting aligned sequences allow comparisons at the nucleotide (or aminoacid) level and further analysis, for example deducing mutations and other features of practical interest.
(This definition is adapted, with modifications, from Wikipedia: Sequence alignment.)
See Algorithm: sequence alignment for more details.
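To make the idea of repositioning fragments by inserting gaps concrete, here is a textbook Needleman-Wunsch global alignment with a linear gap penalty. It is only a conceptual sketch. Nextclade's production aligner is more elaborate and is not reproduced here, and the scores `match=1`, `mismatch=-1`, `gap=-2` are arbitrary assumptions.

```python
def needleman_wunsch(ref: str, qry: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[str, str]:
    """Globally align two sequences; return the gapped (reference, query) pair."""
    n, m = len(ref), len(qry)

    def pair_score(i: int, j: int) -> int:
        return match if ref[i - 1] == qry[j - 1] else mismatch

    # score[i][j] is the best score for aligning ref[:i] with qry[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair_score(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    aligned_ref, aligned_qry = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            aligned_ref.append(ref[i - 1])
            aligned_qry.append(qry[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_ref.append(ref[i - 1])
            aligned_qry.append("-")  # deletion in the query
            i -= 1
        else:
            aligned_ref.append("-")  # insertion in the query
            aligned_qry.append(qry[j - 1])
            j -= 1
    return "".join(reversed(aligned_ref)), "".join(reversed(aligned_qry))


print(needleman_wunsch("ACGTACGT", "ACGACGT"))  # ('ACGTACGT', 'ACG-ACGT')
```

The quadratic table is fine for short examples. Whole viral genomes call for the kind of optimizations a production aligner uses.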
Alignment (result)
The Query sequence (or Query peptide) after the Alignment (process).
Alignment range
The numeric range of nucleotide positions marking the beginning and end of the aligned sequence.
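One way to compute such a range is to take the first and last non-gap characters of the aligned query. This sketch assumes 0-based, end-exclusive alignment coordinates, and it treats only `-` as a gap. Nextclade may apply additional rules, for example for leading or trailing `N`. The function name `alignment_range` is hypothetical.

```python
from typing import Optional


def alignment_range(aligned_qry: str) -> Optional[tuple[int, int]]:
    """Return (begin, end) of the aligned query, 0-based and end-exclusive.

    Leading and trailing gaps ('-') are treated as outside the alignment range.
    """
    if not aligned_qry.strip("-"):
        return None  # the query consists of gaps only
    begin = len(aligned_qry) - len(aligned_qry.lstrip("-"))
    end = len(aligned_qry.rstrip("-"))
    return begin, end


print(alignment_range("---ACGT-A--"))  # (3, 9)
```

Internal gaps, such as the one inside `ACGT-A`, stay within the range; only the flanking gaps are excluded.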
Clade
A virus variant, typically one of several co-circulating variants. In Nextstrain, clades are defined by their combination of signature mutations.
See also: Wikipedia: Clade
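The idea of a clade defined by signature mutations can be illustrated by checking which clade signatures are fully present in a query's mutations. This is a deliberate simplification and not how Nextclade assigns clades: the Reference tree is described above as the source of clade annotations. The clade names, mutations, and the rule "pick the clade with the largest fully matched signature" are all hypothetical.

```python
from typing import Optional

# Hypothetical clade definitions: made-up names and mutations, for illustration only.
CLADE_SIGNATURES: dict[str, frozenset[str]] = {
    "clade_1": frozenset({"C100T"}),
    "clade_1.1": frozenset({"C100T", "G250A"}),
    "clade_2": frozenset({"A300G"}),
}


def assign_clade(query_mutations: set[str]) -> Optional[str]:
    """Pick the clade whose signature is fully present and has the most mutations."""
    candidates = [
        clade for clade, signature in CLADE_SIGNATURES.items()
        if signature <= query_mutations
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda clade: len(CLADE_SIGNATURES[clade]))


print(assign_clade({"C100T", "G250A", "T999C"}))  # clade_1.1
print(assign_clade({"C100T"}))                    # clade_1
print(assign_clade({"G5A"}))                      # None
```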
Phylogenetic placement
The process of adding New nodes to the Reference tree.
See Algorithm: phylogenetic placement for more details.
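In its simplest form, placement means finding the Reference node closest to the query; the New node is then attached there. This brute-force sketch assumes that an aligned sequence is available for every node and uses a Hamming distance that ignores `N`. Both are illustrative assumptions, not Nextclade's placement algorithm, which the page linked above describes. The node names and sequences are made up.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count differing positions between two aligned sequences, ignoring 'N'."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y and "N" not in (x, y))


def nearest_node(query: str, node_sequences: dict[str, str]) -> str:
    """Name of the node whose sequence is closest to the query (first one wins on ties)."""
    return min(node_sequences, key=lambda name: hamming_distance(query, node_sequences[name]))


# Hypothetical tree nodes and their aligned sequences.
nodes = {"root": "ACGTACGT", "n1": "ACGTACTT", "n2": "TCGTACTT"}
parent = nearest_node("ACGAACTN", nodes)
print(parent)  # n1: the New node for this query would be attached here
```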
Analysis
The process of performing various steps within the Nextclade algorithm.
Frame
Reading frame […]
Frame shift
[…]