6. Clade assignment

To simplify discussion of co-circulating virus variants, viral diversity is often broken down into clades or lineages, which are defined by specific combinations of signature mutations. Clades are groups of related sequences that share a common ancestor. For SARS-CoV-2, Nextclade can assign both the broad clades defined by the Nextstrain team and the more fine-grained lineages defined by the Pango consortium.

Instead of directly using mutational signatures to assign clades, Nextclade assigns your sequences to clades by placing them on a phylogenetic tree annotated with clade definitions. More specifically, each sequence is assigned the clade of the nearest reference node found during the Phylogenetic placement step.
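The idea of reading the clade off the nearest reference node can be illustrated with a toy sketch. It is not Nextclade's implementation: it assumes that each reference node carries a set of mutations relative to the reference sequence, and that "nearest" means the smallest number of mutations present in one set but not the other. The node names, clade labels and mutations below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefNode:
    """A node of a (hypothetical) reference tree annotated with a clade."""
    name: str
    clade: str
    mutations: frozenset  # mutations relative to the reference sequence


def distance(query_mutations, node):
    # Assumed distance: number of mutations found in exactly one of the two sets.
    return len(set(query_mutations) ^ node.mutations)


def assign_clade(query_mutations, ref_nodes):
    # Pick the closest node and read off its clade annotation.
    # On ties, min() keeps the first node in the list.
    nearest = min(ref_nodes, key=lambda node: distance(query_mutations, node))
    return nearest.clade


# Hypothetical three-node tree; labels and mutations are made up.
REF_NODES = [
    RefNode("root", "A", frozenset()),
    RefNode("node_1", "B", frozenset({"C100T", "G200A"})),
    RefNode("node_2", "C", frozenset({"C100T", "G200A", "T300C"})),
]

# The query shares two mutations with node_1 and has one private mutation.
print(assign_clade({"C100T", "G200A", "A999G"}, REF_NODES))
```

Because the result is always the clade of some node in the list, a query can only ever receive a clade that the tree contains.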

⚠️ Nextclade only considers those clades which are present in the input reference tree. Only one of these clades, and no others, can be assigned to the analyzed sequences. It is important to make sure that every clade that you expect to find in the results is well represented in the tree.


If unsure, use one of the trees from the default Nextclade datasets or any other well-known, up-to-date, sufficiently large and diverse tree.

💡 For regional, focused studies, it is recommended to use a tree which includes clades that are specific to your region.

SARS-CoV-2 specifics

For SARS-CoV-2, Nextstrain maintains one of the three major clade systems: Nextstrain clades. Besides assigning Nextstrain clades, Nextclade also assigns each sequence to a Pango lineage, another widely used clade system.

Nextstrain clades

The Nextstrain clade system is outlined in this blog post.

The clades are hierarchically structured as follows:

Figure: Hierarchy of clades of SARS-CoV-2 as defined by Nextstrain [source]

You can find the exact, up-to-date clade definitions in github.com/nextstrain/ncov.

Pango lineages

Nextclade also assigns each sequence to a Pango lineage in the same way as clades are assigned: by reading off the lineage of the nearest neighbor in the reference tree.

You can read more about the method and validation results in this report. In short, for recent sequences (within the last 12 months), Nextclade’s Pango lineage assignments are about as accurate as pangoLEARN’s. To keep the reference tree small, Nextclade does not include all early Pango lineages, so its Pango lineage assignments should be treated with caution for samples older than 12 months.

Results

Clades are reported in the “Clade” column of the results table in Nextclade Web, as well as in the JSON, CSV and TSV analysis results files generated by Nextclade CLI or downloaded through the “Download” dialog of Nextclade Web.

For SARS-CoV-2, Pango lineages are also displayed in the results. In TSV and CSV files, the column is named Nextclade_pango.
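As a minimal sketch of reading these assignments back from a TSV results file, the Python snippet below uses only the standard library. The Nextclade_pango column name comes from the text above; the seqName and clade column names are assumptions about the file layout, and the sample rows are made up for illustration.

```python
import csv
import io


def read_assignments(tsv_file):
    """Map each sequence name to its (clade, Pango lineage) pair."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    assignments = {}
    for row in reader:
        # "seqName" and "clade" are assumed column names. "Nextclade_pango"
        # may be missing for other viruses, in which case None is stored.
        lineage = row.get("Nextclade_pango") or None
        assignments[row["seqName"]] = (row["clade"], lineage)
    return assignments


# Made-up rows standing in for a real results file.
SAMPLE_TSV = (
    "seqName\tclade\tNextclade_pango\n"
    "sample_1\t21K\tBA.1\n"
    "sample_2\t21L\tBA.2\n"
)

assignments = read_assignments(io.StringIO(SAMPLE_TSV))
for name, (clade, lineage) in assignments.items():
    print(name, clade, lineage)
```

For a real file, the same function can be given a file object opened with open(path, newline=""), where path points to the TSV file written by Nextclade.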