3.21.2
Fix: phylogenetic placement of sequences with large internal deletions
Sequences with large internal deletions were placed near the root of the reference tree instead of near their true closest relatives. The nearest node distance metric treated node mutations at deleted positions as if the query had the reference allele, making the root appear closest because it has no mutations. Deleted positions are now treated as uninformative (like N) in the distance calculation, so they no longer bias placement toward or away from any node. See #1765.
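Below is a minimal sketch of the idea. It assumes a simplified model in which each tree node stores its mutations as a position-to-nucleotide map relative to the reference. The query supplies its own substitutions plus the set of positions that are N or deleted. The function name and data layout are hypothetical and are not Nextclade's internal code.

```python
def node_distance(node_muts, qry_subs, qry_uninformative):
    """Count positions where the query and a tree node disagree (illustrative sketch).

    node_muts: dict position -> nucleotide for mutations on the node (hypothetical layout)
    qry_subs: dict position -> nucleotide for substitutions in the query
    qry_uninformative: set of positions that are N or deleted in the query
    """
    distance = 0
    for pos in set(node_muts) | set(qry_subs):
        if pos in qry_uninformative:
            # Deleted or missing in the query: no evidence for or against this node.
            continue
        # A missing entry means "same as reference".
        if node_muts.get(pos) != qry_subs.get(pos):
            distance += 1
    return distance
```

Take a node with mutations at 100, 200 and 210, and a query that matches it at 100 but has positions 150 to 250 deleted. The node now scores 0 and the root scores 1, so the deletion no longer pulls the query toward the root.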
3.21.1
Fix: allow numeric keys in Auspice coloring scale
Auspice JSON v2 allows both string and numeric values as the first element of coloring scale entries. Nextclade previously only accepted strings, causing deserialization failures when loading Auspice JSONs with continuous colorings (e.g. nextstrain.org/rsv/a/genome/6y). Numeric scale keys are now accepted. See nextstrain/rsv#129, #1764 by @victorlin.
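The sketch below illustrates the relaxed validation. It assumes each scale entry is a two-element [key, color] array, as in Auspice JSON v2. parse_scale is a hypothetical helper, not Nextclade's code.

```python
def parse_scale(scale):
    """Validate coloring scale entries of the form [key, color] (illustrative sketch)."""
    parsed = []
    for entry in scale:
        if not (isinstance(entry, list) and len(entry) == 2):
            raise ValueError(f"expected [key, color] pair, got {entry!r}")
        key, color = entry
        # Accept both strings and numbers as keys; bool is a subclass of int, so exclude it.
        if isinstance(key, bool) or not isinstance(key, (str, int, float)):
            raise ValueError(f"scale key must be a string or a number, got {key!r}")
        if not isinstance(color, str):
            raise ValueError(f"color must be a string, got {color!r}")
        parsed.append((key, color))
    return parsed
```

A continuous coloring such as [[0, "#000"], [1.5, "#f00"]] now passes validation instead of being rejected for its numeric keys.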
3.21.0
Docker: arm64 multi-platform images
Docker images are now built for both amd64 and arm64 architectures. Pulling any Nextclade Docker image on an ARM64 host (Apple Silicon, AWS Graviton, Raspberry Pi) now gets a native image instead of requiring emulation. See #1761, #1762.
Docker: multiple base image versions
Docker images are now published for multiple base image versions: 6 Alpine releases and 3 Debian releases, covering approximately 3 and 5 years respectively. Users can pin to a specific base version using tags like :alpine3.18 or :debian11. Unversioned tags (:alpine, :debian) point to the latest base version. See the full list of available tags and platforms on Docker Hub.
Docker: breaking changes
Default Debian base image changed from Debian 11 (Bullseye) to Debian 13 (Trixie). Users depending on Debian 11 can pin to :debian11.
Scratch image: binary moved from /nextclade to /usr/bin/nextclade. Using nextclade without a path still works. Users referencing the absolute path /nextclade should update to /usr/bin/nextclade.
3.20.0
Nextclade Web: show genome coverage outline behind “too many markers” message
When viewing dense regions with many mutations, the sequence view shows a “too many markers” message. The genome coverage outline is now visible behind this message, providing context about sequence coverage even when individual markers cannot be displayed. See #1759, #1760 for details.
Nextclade schemas: typed dataset attributes field
Dataset attributes (attributes field in pathogen.json and dataset index) have been refactored from free-form key-value pairs to typed structs with explicit fields. The new structure provides better validation and documentation for dataset metadata fields like name, reference name, reference accession, and clade. The previous free-form attributes map is no longer supported.
Nextclade schemas: cleanup
Several deprecated and unused fields have been removed from the dataset schema:
Removed enabled field from datasets (was unused)
Removed official field from datasets (community datasets are now detected by path prefix)
Removed deprecated and experimental fields from VirusProperties: compatibility, shortcuts, files, defaultCds, cdsOrderPreference
These changes simplify the schema and remove legacy fields that were no longer used. Dataset authors should remove these fields from their pathogen.json files.
Nextclade schemas: documentation improvements
Added doc comments and examples to JSON schema types, improving auto-completion and inline documentation in editors that support JSON Schema. Types with new documentation include:
Dataset index types (DatasetIndexJson, DatasetCollection, Dataset, DatasetVersion)
Output types (NextcladeOutputs, ResultsJson, NextcladeErrorOutputs)
QC config and result types
Tree and Auspice extension types
Gene and annotation types
Mutation and alignment types
Pathogen config and phenotype types
Alignment parameter types
Nextclade documentation
Fixed ref_nodes field types and added missing qry[].name field
Added missing fields to pathogen config documentation
Marked clade_node_attrs.displayName as required per schema
Corrected searchAlgo enum value from ancestor-latest to ancestor-nearest
Documented aaMutLabelMap (previously a TODO placeholder)
Fixed QC examples to use cdsName instead of geneName
3.19.0
Update Auspice tree visualization to 2.67.0
Auspice tree visualization package has been updated from 2.59.1 to 2.67.0. See Auspice changelog here.
Nextclade Web: fix tree page stuck at loading screen
The tree visualization page could get stuck showing a loading spinner indefinitely after analysis completed. This has been fixed. See #1721 for details.
Nextclade Web: fix duplicate sequence warnings
The results table could show false “duplicate sequence” warnings during and after analysis due to duplicate-name tracking data being appended repeatedly on each incremental result update. Additionally, stale tracking data from a previous analysis run was not cleared when starting a new run. See #1734, #1744 for details.
Nextclade Web: fix auto-scroll hiding dataset search results
In the dataset selector, typing a search term caused the list to auto-scroll to the previously highlighted dataset, pushing filtered search results out of view. Auto-scroll now only triggers when the user explicitly selects a different dataset and no search term is active. See #1746, #1747 for details.
Nextclade Web: handle undefined dataset gracefully
When a stale or invalid dataset was stored in the browser’s local storage, navigating to the results, tree, or export pages could crash with an “Internal Error: Dataset not found” message. These pages now show a recovery UI with a link to return to the start page. The tree navigation link is also disabled until analysis has run. See #1745, #1748 for details.
Nextclade Web: show actual WASM panic messages
When the Rust code running inside WebAssembly encounters an internal error (panic), Nextclade Web now displays the actual error message with file location instead of the generic “null pointer passed to rust” message. The “copy error” button formats the error in a code block for direct pasting into GitHub issues. See #1749, #1750 for details.
Nextclade Web: fix missing row index on error and pending rows
Error and pending rows in the results table were missing the row index column, causing them to appear visually misaligned with successful result rows. See #1751 for details.
Nextclade Web: document Chrome Private Network Access restrictions
Starting with v141, Chromium-based browsers (Chrome, Edge, Brave, Opera) block requests from HTTPS origins (like https://clades.nextstrain.org) to localhost URLs due to Local Network Access security restrictions. This affects loading local files via URL parameters (e.g., ?dataset-server=http://localhost:3001). The documentation now explains the restriction and provides workarounds. See #1753 for details.
Fix integer overflow in alignment band area calculation
Fixed an integer overflow in the alignment band area calculation that could occur on 32-bit platforms (including WebAssembly in browsers) when aligning very long or highly divergent sequences. The band area now uses 64-bit integers, and additional overflow checks have been added to band dimension calculations. See #1749 for details.
Improve error message when alignment band area is exceeded
When the alignment band area limit is exceeded, Nextclade now displays a more informative error message. Large numbers are formatted for readability (e.g., “3.7B” instead of “3704350009”), and the message differentiates between likely causes based on the query-to-reference length ratio: concatenated sequences or assembly scaffolds when the query is much longer than the reference, or structural rearrangements or wrong reference when lengths are similar. See #1752 for details.
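This sketch shows the two ideas behind the new message: compact number formatting and a cause hint based on the length ratio. The function names and the ratio threshold of 2.0 are hypothetical. The changelog does not state the cut-off Nextclade actually uses.

```python
def format_count(n: int) -> str:
    """Format a large count compactly, e.g. 3704350009 -> '3.7B'."""
    for threshold, suffix in ((10**9, "B"), (10**6, "M"), (10**3, "K")):
        if n >= threshold:
            return f"{n / threshold:.1f}{suffix}"
    return str(n)


def likely_cause(qry_len: int, ref_len: int, ratio_threshold: float = 2.0):
    """Suggest a cause from the query-to-reference length ratio (hypothetical threshold)."""
    ratio = qry_len / ref_len
    if ratio >= ratio_threshold:
        return "query much longer than reference: concatenated sequences or assembly scaffold?"
    if ratio >= 1 / ratio_threshold:
        return "similar lengths: structural rearrangements or wrong reference?"
    return None  # the changelog describes no specific hint for much shorter queries
```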
Nextclade CLI: automatic retry for network requests
Nextclade CLI now automatically retries transient network failures when downloading datasets. This improves reliability on flaky connections. Retryable conditions include network errors, 5xx server errors, 429 rate limiting, and 408 timeouts. The retry budget is scoped per host with a maximum of 3 retries per request. See #1723 for details.
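The sketch below shows the retry classification and the per-request cap described above. It is written under stated assumptions: do_request is a hypothetical callable that returns an HTTP status code or raises ConnectionError, and the exponential backoff delays are illustrative. The per-host retry budget is not modeled here.

```python
import time

MAX_RETRIES_PER_REQUEST = 3  # from the changelog


def is_retryable(status=None, network_error=False):
    """Network errors, 5xx, 429 and 408 are treated as transient."""
    if network_error:
        return True
    return status is not None and (500 <= status <= 599 or status in (408, 429))


def fetch_with_retry(do_request, sleep=time.sleep, base_delay=0.5):
    """Call do_request() until it returns a non-retryable status or retries run out."""
    attempt = 0
    while True:
        try:
            status, network_error = do_request(), False
        except ConnectionError:
            status, network_error = None, True
        if not is_retryable(status, network_error):
            return status
        if attempt == MAX_RETRIES_PER_REQUEST:
            if network_error:
                raise ConnectionError("network error persisted after retries")
            return status
        sleep(base_delay * 2 ** attempt)  # exponential backoff (assumed, not from the changelog)
        attempt += 1
```

A 404 is returned immediately. A persistent 500 leads to four requests in total: the original request plus three retries.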
Nextclade CLI: Mozilla CA certificates fallback for minimal containers
Nextclade CLI now bundles Mozilla’s root CA certificates as a fallback for environments where system certificates are not installed (e.g., minimal Docker containers like debian:stable or ubuntu:22.04 without the ca-certificates package). Previously, Nextclade CLI would fail with a “No CA certificates were loaded from the system” error in such environments. System certificates and user-provided certificates via --extra-ca-certs are still loaded when available and combined with the bundled Mozilla roots. See #726, #1723 for details.
Nextclade CLI: zstd compression for HTTP transfers
Nextclade CLI now accepts zstd content encoding for HTTP responses when downloading datasets. Zstd was already supported for file decompression; this extends it to the HTTP transfer layer, where it joins gzip, brotli, and deflate. See #1723 for details.
Major dependency upgrades
Nextclade has been upgraded to Rust 2024 edition, along with updates to the Rust and WebAssembly toolchains, Node.js, React, Next.js, and many other dependencies. See #1716, #1719 for details.
3.18.1
Fix: allow --output-annotation-gff and --output-annotation-tbl as sole output arguments
Nextclade CLI now correctly allows --output-annotation-gff and --output-annotation-tbl to be used as standalone output arguments without requiring other output files like --output-tsv or --output-all. Previously, using only these annotation output arguments would incorrectly trigger an error requiring at least one other output file argument to be specified. See #1707 for details. Thanks @ammaraziz for the bug report.
3.18.0
Dataset version selector
Nextclade Web now includes a dataset version (tag) selector in single-dataset mode. This allows you to select a specific version of a dataset instead of always using the latest version. This is useful when you want to reproduce results using an older dataset version or when testing unreleased dataset updates.
Dataset collection badges
Dataset collection names are now displayed as colored badges in the dataset info section in Nextclade Web. These badges show which collection each dataset belongs to (e.g., “nextstrain” or “community” collections) and include optional icons for better visual distinction.
Customizable “Relative to” dropdown
Dataset authors can now customize the display names and descriptions for built-in entries in the “Relative to” dropdown (reference, parent, and clade founder nodes) in Nextclade Web. This allows datasets to provide more context-specific terminology for phylogenetic comparison targets. See #1683 for details.
Fix dataset loading and persistence issues
Numerous fixes have been made to improve dataset loading and persistence in Nextclade Web:
Fixed dataset persistence after page reload
Fixed loading single datasets from localhost servers using the ?dataset-url parameter
Fixed handling of the ?dataset-server URL parameter during initialization
Fixed handling of the ?dataset-tag URL parameter: non-latest tags no longer cause “dataset not found” errors
Fixed automatic updating of datasets when using the latest tag: Nextclade now picks up new versions automatically unless you explicitly select an older version
Improved detection of non-default dataset servers to ensure correct dataset persistence behavior
Fix “updated at” date-time formatting
Fixed “updated at” date-time formatting in Nextclade Web to display seconds consistently across all dataset tags.
Fix dataset clearing when switching dataset servers
Dataset selection is now properly cleared when switching to a non-default dataset server.
JSON schema additions
Added JSON schema definitions for Auspice Nextclade extensions (.meta.extensions.nextclade), enabling better validation and tooling support for dataset authors.
3.17.0
Add trailing semicolon to Newick format output
Newick format output now includes a trailing semicolon for better compatibility with parsers.
Remove extra quotation marks around filepaths
Nextclade would erroneously add double quotation marks when printing paths. This has been fixed.
Add more JSON schema definitions
We have added JSON schema definitions for the Dataset and DatasetCollection types, both of which are parts of DatasetIndexJson. You can find the latest definitions and documentation here.
Add $schema field to JSON outputs
We have added an optional $schema field to JSON input and output files. This is a convention used to indicate which JSON schema definition a file conforms to. This can be useful for validation and for tooling that supports JSON schema (e.g. autocomplete and documentation in VSCode when editing a pathogen.json file).
3.16.0
Introduce JSON Schema definitions
We now provide JSON Schema definitions for some of the JSON-based Nextclade file formats. You can find the latest definitions and documentation here. You can also generate schemas for your current version of Nextclade CLI using the newly added nextclade schema write command.
These schemas enable validation and parser code generation (among other things) to help in the development and maintenance of projects using JSON-based Nextclade formats. We would like to emphasize that currently all JSON-based Nextclade formats are unstable and can change even in minor and patch releases. The addition of schemas doesn’t change these stability guarantees, but could help reduce and fix breakage after breaking changes.
Implement amino acid mutation labels
We added functionality to detect labeled amino acid mutations if dataset authors choose to define labels for them. This is similar to the existing labeled nucleotide mutations functionality, but for amino acid sequences. This could be useful to mark particular mutations of interest. Once datasets start adopting this functionality (by adding configuration to the mutLabels.aaMutLabelMap in pathogen.json of a dataset), you can find the results in the privateAaMutations.labeledSubstitutions column/field of Nextclade output files.
3.15.3
fix: ensure GFF3 fields are correctly percent-encoded and decoded
The GFF3 specification requires certain characters in values and in attributes to be percent-encoded. Nextclade previously did not comply with this requirement, which could lead to incorrectly formatted annotation-related outputs (such as the annotation field in JSON output or values in the output GFF3 annotation). This has been fixed.
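This is a minimal sketch of the escaping rules for GFF3 column 9 (attributes), as described in the GFF3 specification. The following must be percent-encoded: tab, newline, carriage return, control characters and "%", plus the reserved attribute delimiters ";", "=", "&" and ",". The helper names are hypothetical. This is not Nextclade's implementation.

```python
import urllib.parse

# Characters with special meaning in GFF3 column 9 attribute tags and values.
_RESERVED_ATTR = set(";=&,")


def gff3_escape_attr(value: str) -> str:
    """Percent-encode a GFF3 attribute value (illustrative sketch)."""
    out = []
    for ch in value:
        code = ord(ch)
        # Escape '%', the reserved delimiters, and control characters 0x00-0x1F and 0x7F.
        if ch == "%" or ch in _RESERVED_ATTR or code < 0x20 or code == 0x7F:
            out.append(f"%{code:02X}")
        else:
            out.append(ch)
    return "".join(out)


def gff3_unescape(value: str) -> str:
    """Decode %XX sequences back to characters."""
    return urllib.parse.unquote(value)
```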
3.15.2
fix: calculate phenotypes even if there’s no tree
Nextclade now calculates and displays phenotype data even for datasets which have no reference tree. The clade ignoring feature does not apply to such datasets (because no clade assignment is possible without a tree).
fix: collapse -0.0 to 0.0 in phenotype values
Nextclade will no longer display or output negative zeroes in phenotype data.
fix: ensure CSV column and category selection is additive
Nextclade now correctly combines all entries when mixing individual columns and column categories in the CSV and TSV output column selection configuration (e.g. in the --output-columns-selection CLI argument). Previously, one would incorrectly overwrite the other.
fix: ensure canonical order of CSV columns
The order of columns in CSV and TSV output is now always the same, regardless of the presence or absence of certain columns.
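The two fixes above can be sketched together. Individual columns and whole categories are merged by set union, and the result is always emitted in one canonical order. The category names and their column lists below are hypothetical and chosen for illustration. They are not Nextclade's actual column catalog.

```python
# Hypothetical catalog: category name -> columns, listed in canonical order.
CATEGORIES = {
    "general": ["seqName", "clade", "qc.overallStatus"],
    "mutations": ["substitutions", "deletions", "insertions"],
}
CANONICAL_ORDER = [col for cols in CATEGORIES.values() for col in cols]


def select_columns(selection):
    """Union of individual columns and whole categories, in canonical order."""
    chosen = set()
    for item in selection:
        if item in CATEGORIES:
            chosen.update(CATEGORIES[item])  # add to the selection, never replace it
        elif item in CANONICAL_ORDER:
            chosen.add(item)
        else:
            raise ValueError(f"unknown column or category: {item}")
    return [col for col in CANONICAL_ORDER if col in chosen]
```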
3.15.1
Nextclade Web: fix empty results table
The results table could sometimes be empty when starting Nextclade analysis from URL parameters. This has been fixed.
3.15.0
Nextclade Web: fix country colorings in output tree json #1646
Colors for “country”, “region” and “division” are now correctly displayed in the tree view.
CDS coverage #1514
Nextclade now calculates amino acid coverage of each CDS. You can find this information:
in output files: in the column or field cdsCoverage
in Nextclade Web: in the tooltip of the “Cov.” column on the “Results” page
3.14.5
Nextclade Web: disable “Relative to” dropdown when dataset has no reference tree
When using a dataset without a reference tree for analysis, the “Relative to” dropdown has no meaning: the target for comparisons is always the reference sequence node, because there is no information about any other nodes. We now disable this dropdown for datasets without trees.
3.14.4
Fix seed coverage calculation for short references
During alignment, seed coverage is now computed as the total seed length divided by the shorter of the query or reference. This ensures correct behavior when references are shorter than queries, without requiring artificially low min_seed_cover values.
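This is a minimal sketch of the revised formula. It assumes the seed matches do not overlap, so their lengths can simply be summed. The function name is hypothetical.

```python
def seed_coverage(seed_lengths, qry_len, ref_len):
    """Total seed length divided by the length of the shorter sequence."""
    shorter = min(qry_len, ref_len)
    if shorter == 0:
        return 0.0
    return sum(seed_lengths) / shorter
```

Consider a 1,000-nt single-gene reference, a 10,000-nt query and 900 nt of seeds. Coverage is now 0.9. Dividing by the query length would give 0.09, which would fail any reasonable min_seed_cover threshold.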
Nextclade Web: fix crash when using ?multi-dataset URL parameter
When using the ?multi-dataset URL parameter, Nextclade Web could crash under certain conditions. This has been fixed.
Nextclade Web: workaround double run
When using URL parameters, Nextclade could sometimes spawn multiple copies of the analysis run. This could result in duplicate sequences being erroneously reported in the results table, among other unwanted effects. We added a workaround to mitigate this problem. Please report bugs by submitting a GitHub issue.
3.14.3
Fix dataset suggestions and sorting for short references
Update minimizer-based scoring to better handle cases where the reference sequence is much shorter than the query sequence. The previous approach assumed full-genome references and could underestimate scores for partial references such as single genes. The revised method adjusts the normalization to avoid penalizing such cases, improving robustness without requiring changes to the index format. This resolves issues observed in datasets like yellow fever.
This improves dataset suggestions in Nextclade Web and dataset detection in the nextclade sort CLI command. It also changes the scale of values in the score column of the TSV output of the nextclade sort command.
3.14.2
Nextclade Web: crash with custom Auspice JSON dataset
When an Auspice JSON dataset is provided as whole-dataset input through ?dataset-json-url, Nextclade Web could crash under certain conditions. This has been fixed.
3.14.1
Nextclade Web: crash with custom datasets
When a custom dataset is provided through ?dataset-url, Nextclade Web could crash under certain conditions. This has been fixed.
3.14.0
Nextclade Web: multi-dataset mode
Nextclade Web now allows running analyses for multiple datasets at once.
You can provide sequences belonging to multiple organisms, or to the same organism but based on different reference sequences. On the “Multiple datasets” tab, Nextclade will try to deduce the datasets that best match your sequences. You can then proceed to run the analysis for each dataset. If multiple datasets have been detected, you will see a “Dataset” dropdown on the “Results”, “Tree” and “Export” pages, which allows you to switch between results for different datasets.
In multi-dataset mode, the “Export” page now also contains an “Export all to Excel” button, which downloads an .xlsx file containing all analysis results in tabular format, one dataset per sheet. This is the same data as in the CSV/TSV files, but aggregated into a single file.
When starting Nextclade analysis using URL parameters, you can add ?multi-dataset to run in multi-dataset mode.
Nextclade now uses a new global search algorithm to find the suggested datasets for your sequences. It tries to minimize the number of datasets while optimizing their relevance.
This is a convenience feature: the analysis runs for each dataset are still independent, just like in single-dataset mode, except that you no longer need to run a separate analysis for each dataset manually.
This could be useful if you analyze one or more FASTA files containing a mixture of sequences obtained from different organisms, strains or genome segments.
Nextclade Web: add “Focus on selected” toggle on “Tree” page
Adds a sidebar toggle on the “Tree” page that emphasizes visible nodes by expanding them to occupy more vertical space, improving focus on filtered or zoomed subsets. Designed to enhance visibility in large phylogenetic trees. This is an Auspice feature which was introduced in Auspice 2.59.0 and is now also available in Nextclade.
Nextclade CLI: global dataset search mode for sort command
You can now add --global to the sort command to enable the global search algorithm, which finds a minimal set of suggested datasets for your sequences. Note that this mode disables streaming of results, because the optimization step requires knowing the datasets for all sequences in advance. This may lead to increased memory consumption for large inputs.
This is an experimental feature. Use with caution.
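The changelog does not describe the global search algorithm in detail. One standard way to "minimize the number of datasets while optimizing relevance" is greedy set cover. The sketch below uses that technique purely as an illustration and may differ from what Nextclade does. The input format is assumed: for each sequence, a dict of candidate dataset names with match scores. All names are hypothetical.

```python
def suggest_datasets(candidates):
    """Greedy set cover over per-sequence dataset hits (illustrative sketch).

    candidates: dict sequence name -> dict dataset name -> match score
    Returns dict sequence name -> suggested dataset.
    """
    uncovered = {seq for seq, hits in candidates.items() if hits}
    chosen = []
    while uncovered:
        # For each dataset: the uncovered sequences it matches and their summed score.
        coverage = {}
        for seq in uncovered:
            for ds, score in candidates[seq].items():
                seqs, total = coverage.get(ds, (set(), 0.0))
                seqs.add(seq)
                coverage[ds] = (seqs, total + score)
        # Prefer the dataset covering the most sequences, then the higher total score.
        best = max(coverage, key=lambda ds: (len(coverage[ds][0]), coverage[ds][1], ds))
        chosen.append(best)
        uncovered -= coverage[best][0]
    # Within the chosen set, give each sequence its best-scoring dataset.
    return {
        seq: max((ds for ds in chosen if ds in hits), key=lambda ds: hits[ds])
        for seq, hits in candidates.items()
        if hits
    }
```

The "needs all sequences in advance" caveat follows directly from this structure. No dataset can be chosen until every sequence's hits are known, which is why streaming is disabled.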
Nextclade CLI: fix panics
Some of the expected errors (e.g. invalid input files) in Nextclade CLI would previously cause panics (crashes). Now these errors are handled more gracefully and the visual output of these errors to the console is now cleaner and more concise.
Nextclade CLI: fix console color mode handling
Nextclade CLI previously output colored messages (with ANSI sequences) even if the output is not a TTY (e.g. redirected to a file). This has now been fixed.
For additional configuration, new CLI arguments have been added, along with proper handling of the environment variables typically used to control console coloring.
The following priority rules apply:
Nextclade detects output target (TTY or not) and outputs appropriately for the target by default
If any of the environment variables COLOR (auto|always|never), NO_COLOR (set), or CLICOLOR_FORCE=1 are found, they override the default
If the arguments --color=auto|always|never or --no-color (shortcut for --color=never) are found, they override the defaults and environment variables. If multiple --color or --no-color arguments are present, only the one that comes last is taken into account.
Known issue: --help coloring is not affected by the --color and --no-color arguments (#1629).
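The priority rules above can be sketched as a small resolver. The function name is hypothetical. The changelog does not say how the environment variables rank against each other, so the order used here is an assumption: COLOR, then NO_COLOR, then CLICOLOR_FORCE.

```python
COLOR_MODES = ("auto", "always", "never")


def resolve_color_mode(args, env, is_tty):
    """Return 'always' or 'never' following the priority rules (illustrative sketch)."""
    mode = "auto"  # default: decide from the output target
    # Environment variables override the default (their relative order is assumed).
    if env.get("COLOR") in COLOR_MODES:
        mode = env["COLOR"]
    if "NO_COLOR" in env:
        mode = "never"
    if env.get("CLICOLOR_FORCE") == "1":
        mode = "always"
    # CLI arguments override environment variables; the last --color/--no-color wins.
    for arg in args:
        if arg == "--no-color":
            mode = "never"
        elif arg.startswith("--color="):
            value = arg.split("=", 1)[1]
            if value not in COLOR_MODES:
                raise ValueError(f"invalid --color value: {value}")
            mode = value
    if mode == "auto":
        return "always" if is_tty else "never"
    return mode
```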
3.13.3
Fix crash when exporting annotations for sequences with missing genes
Nextclade Web and CLI would crash when attempting to output GFF and TBL files where entire genes are unsequenced or otherwise missing. This has been fixed.
3.13.2
Speed up Nextclade web, fix crash when using files >45MB on Chromium v136 browsers
In recently released version 136, Chromium-based browsers (e.g. Chrome, Edge) reduced the maximum allowed fixed array size, causing Nextclade web to crash when files bigger than 46,505,915 bytes are used.
It turns out that avoiding the need for a large array gets rid of most of the delay between clicking “Run” and the start of the analysis. For files of ~60MB the time saved is on the order of 5 seconds. A small but noticeable performance win! See issue #1605 and PR #1606 for more details.
3.13.1
Fix crash on empty query annotations
For certain samples which end up with an empty output genome annotation, Nextclade Web could crash. This is now resolved. See #1601, #1602. Thanks @theosanderson for reporting.
3.13.0
Output genome annotations
Nextclade now outputs genome annotations for each unaligned input sequence.
These annotations are derived from reference annotation (coming from a dataset or from --input-annotation), but with the ranges of genetic features (genes, CDSes) adjusted for unaligned input sequences. The word “unaligned” here refers to the input sequences being analyzed and before they are aligned, i.e. as they come in the input fasta file(s).
These annotations can serve as a starting point for submissions to genetic databases. They also allow extracting nucleotide sequences of genes and CDSes from unaligned sequences, if needed. Note that extraction from aligned sequences (as output by Nextclade) can still be done using the reference annotation.
Nextclade supports 2 formats for output annotations:
Genbank’s 5-column tab-delimited feature table (TBL) format (spec)
Generic Feature Format Version 3 (GFF3) (spec)
Both formats contain the same information, but GFF3 contains slightly more metadata due to this format being more flexible. Use one or the other, depending on your needs.
In Nextclade Web, these new output annotations can be downloaded on the “Export” page, in the nextclade.tbl and nextclade.gff sections.
In Nextclade CLI, if you are using --output-all, the annotations are emitted into the output directory as files nextclade.tbl and nextclade.gff. You can also add --output-annotation-tbl and/or --output-annotation-gff to override the paths, or use only these parameters and omit --output-all to emit only the specified individual files (similar to all other --output-* parameters).
Please note that the annotations can only be output if there’s a reference annotation on the input (from a dataset or from --input-annotation).
This feature is still in an experimental phase. Please report bugs by submitting a GitHub issue.
3.12.0
Forbid reference sequences with gaps
Starting from this version, Nextclade won’t accept reference sequences (reference.fasta) which contain gap characters (-).
This is true for reference sequences in Nextclade datasets (provided as --input-dataset, --dataset-name in CLI, or ?input-dataset, ?dataset-name in web), Auspice JSON datasets (through --input-dataset-json or ?input-dataset-json) as well as for individually provided reference sequences, in absence of a dataset or when overriding its files (--input-ref, ?input-ref).
We could not find a meaning for gaps in reference sequences in the context of Nextclade (pairwise alignment and comparison of sequences to a reference). When a reference with gaps was used, it could cause errors and possibly even silently produce incorrect results.
Starting from this version, Nextclade stops with an explicit error if it detects gaps in the reference sequence. To resolve this, please use a reference sequence without gaps, if possible, or notify the dataset authors about the problem.
If you think that Nextclade needs to support reference sequences with gaps, please submit a new issue and explain your use-case and motivation on GitHub: https://github.com/nextstrain/nextclade/issues
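This is a minimal sketch of the kind of check described above: reject a reference that contains the gap character "-". The function name and the 1-based position in the message are illustrative choices, not Nextclade's actual error text.

```python
def validate_reference(name: str, seq: str) -> str:
    """Raise if the reference sequence contains a gap character (illustrative sketch)."""
    gap_pos = seq.find("-")
    if gap_pos != -1:
        # Report a 1-based position, as is conventional for sequence coordinates.
        raise ValueError(
            f"reference sequence '{name}' contains a gap character '-' at position {gap_pos + 1}"
        )
    return seq
```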
3.11.0
Alignment presets
Nextclade CLI now supports --alignment-preset argument, which switches between pre-defined sets of alignment parameters. Currently available values are:
default: Suitable for aligning very similar sequences (this is the default)
high-diversity: Suitable for more diverse viruses
short-sequences: Suitable for short and partial sequences
This is an experimental feature. Presets are subject to change.
Fix crash with empty reference sequence
Nextclade crashed when an empty reference sequence file was provided. Now Nextclade checks for this condition and reports a useful error message instead.
3.10.2
Correctly handle comments in GFF3 files
Nextclade sometimes reported an error in GFF3 files containing comments. This has been fixed now.
[cli] Fix verbosity CLI arguments
The Nextclade CLI arguments -v and -q had no effect after a recent update. This has been fixed now.
3.10.1
[web] Fetch custom inputs from URLs using correct “Accept” HTTP header
Fixes Nextclade Web adding the header Accept: application/json, text/plain, */* to GET HTTP requests when fetching input files from user-provided URLs. This caused problems with sources that choose between file formats based on the Accept header: the response would come back in JSON format, which is not what we want. Nextclade Web now sends Accept: text/plain, */*, preferentially fetching all inputs as plain text, as intended.
Nextclade 3.10.0
[web] Add links to open reference trees in nextstrain.org
You can now click the “Open tree” link in the dataset info box to open the reference tree of this dataset on nextstrain.org. This allows browsing the current trees for each dataset without running a Nextclade analysis. If a dataset does not provide a reference tree, the link will be disabled.
[web] Correctly disable “Load example” links
The “Load example” links are now correctly disabled, not hidden, for the datasets which do not provide example sequence data.
Nextclade 3.9.1
Fix: clade mismatch between placed node and parent node
This version addresses an issue where the clade (or clade-like attribute, such as lineage) of the placed query node sometimes did not match the clade of its parent.
The query node placement is adjusted during greedy tree building, and sometimes the branch needs to be split and a new auxiliary internal node inserted to accommodate the new node. Previously, Nextclade copied the clade of this internal node from the attachment target node. However, this is not always correct and can lead to a mismatch between the clade of the query node and that of the new internal node.
In this version we added a voting mechanism, which calculates a mode of the clades involved: of the parent, target and query nodes. The same procedure is repeated for each clade-like attribute. After that, in some cases, branch labels also need to adjust their positions.
This should not change the clade assignment for query samples, but only the clades of the inserted auxiliary internal nodes, to make sure that the tree is consistent.
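The voting step can be sketched as taking the mode of three values. The changelog does not say what happens when all three clades differ. This sketch falls back to the target node's clade, matching the previous copy-from-target behavior, but that fallback is an assumption. Function names are hypothetical.

```python
from collections import Counter


def vote_clade(parent, target, query):
    """Mode of the parent, target and query clades (illustrative sketch).

    With three voters, a clade wins if at least two nodes share it. If all three
    differ, fall back to the target node's clade (assumed tie-break).
    """
    clade, count = Counter([parent, target, query]).most_common(1)[0]
    return clade if count >= 2 else target


def vote_attrs(parent, target, query):
    """Repeat the vote independently for every clade-like attribute (e.g. clade, lineage)."""
    keys = parent.keys() | target.keys() | query.keys()
    return {k: vote_clade(parent.get(k), target.get(k), query.get(k)) for k in keys}
```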
Nextclade 3.9.0
Nextclade CLI: Obtain CA certificates from platform trust store; add NEXTCLADE_EXTRA_CA_CERTS / --extra-ca-certs
Nextclade CLI users have previously reported issues with CA certificates when fetching datasets from an organization’s network (e.g. in a university or in a company).
Starting with this version, Nextclade CLI respects OS-level trust store configurations, including private CAs and self-signed certificates. This ensures backward compatibility and functionality across different platforms, including those lacking a native trust store or with an outdated one.
We introduced a NEXTCLADE_EXTRA_CA_CERTS environment variable and --extra-ca-certs option which allow adding additional CA certificates to the trust store specifically for Nextclade, for when modifying the system’s trust store isn’t desirable/possible. See #1536 for more details.
Update Auspice tree visualization to 2.58.0
Auspice tree visualization package has been updated from 2.56.0 to 2.58.0. See Auspice changelog here.
Nextclade 3.8.2
Fix detection of number of threads Nextclade Web
Sometimes Nextclade Web would detect an incorrect number of available CPU threads and create too many threads for processing. This could cause additional overhead and slow down the runs. We observed this behavior on non-Chromium-based browsers, such as Firefox and Safari. This has been fixed: the number of threads is now clamped to 3 by default. You can modify this in the “Settings” dialog.
Nextclade 3.8.1
Fix crash when using column config in Nextclade Web
Since 3.8.0, Nextclade could crash when particular combinations of CSV/TSV columns were selected in the “Column config” tab on the “Export” page in Nextclade Web, or with the --output-columns-selection argument in Nextclade CLI. This has been resolved.
Remove extra spaces in ref node selector
Remove extra spaces in the text of entries in the “Relative to” dropdown selector in Nextclade Web.
Nextclade 3.8.0
Relative mutations
Nextclade now calls mutations relative to multiple targets. In addition to the previously available mutations relative to the reference and mutations relative to the parent tree node (private mutations), Nextclade now calls mutations relative to clade founder tree nodes, and relative to custom nodes of interest if defined in the dataset (e.g. vaccine strains).
Nextclade Web now has an additional dropdown selector for the target of mutation calling. Output files have new columns/fields for mutations relative to clade founders (founderMuts) as well as for mutations relative to custom nodes (relativeMutations).
See documentation for more details.
Update Auspice tree visualization to 2.56.0
Auspice tree visualization package has been updated from 2.55.0 to 2.56.0. See Auspice changelog here.
Nextclade 3.7.4
Nextclade Web
Upgrade Auspice to 2.55.0, add polyfills
This definitively resolves the crash due to missing JavaScript polyfills, which occurred in Nextclade Web 3.7.2.
Nextclade 3.7.3
Nextclade Web
Fix crash on tree page in Nextclade Web
Temporarily downgrade Auspice from 2.55.0 to 2.54.3 to prevent the tree page in Nextclade Web from crashing. The definitive fix will follow.
Nextclade 3.7.2
General
[fix] Avoid duplicate node names in the output Auspice JSON tree
When multiple query samples were to be placed onto the same node on the reference tree, sometimes multiple auxiliary nodes could be created having the same name. Node names are expected to be unique for Auspice visualization to work correctly, so when visualizing the tree, Auspice has been renaming these nodes and emitting warnings into the browser’s dev console.
In this version we pick unique names for the auxiliary nodes during placement, so that there are no more warnings. Users may observe changes in some of the node names when inspecting the output Auspice JSON file. However, this is unlikely to affect most users’ work.
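Picking unique names can be sketched as follows: keep a set of names already used in the tree and append an increasing numeric suffix until an unused name is found. The suffix pattern (`_1`, `_2`, ...) and the function name are assumptions for illustration; the names Nextclade actually produces may differ.

```python
def unique_name(base: str, used: set[str]) -> str:
    """Return `base` if unused, otherwise `base_<n>` with the smallest free n >= 1."""
    name = base
    n = 0
    while name in used:
        n += 1
        name = f"{base}_{n}"
    used.add(name)  # reserve the name so later calls cannot reuse it
    return name


used = {"NODE_0001"}
print(unique_name("NODE_0001", used))  # NODE_0001_1
print(unique_name("NODE_0001", used))  # NODE_0001_2
print(unique_name("NODE_0002", used))  # NODE_0002
```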
Nextclade Web
[fix] Ensure dataset “updated at” date is displayed in Nextclade Web
Since 3.7.0, Nextclade Web has not been showing the “updated at” date for any dataset. This has been fixed.
[fix] Ensure frame shift and insertion markers in sequence views can also be toggled
Most markers in the sequence views can be toggled on or off on the “Settings” page in Nextclade Web; however, frame shifts and insertions could not be. We added the missing toggles.
[fix] Correctly style details/summary component
The text in the details/summary (“collapse”, “spoiler”) component (e.g. the list of SC2 lineages) was overflowing, producing garbled text in dataset readmes and changelogs. This has been fixed.
[dep] Update Auspice tree visualization to 2.55.0
Auspice tree visualization package has been updated from 2.53.0 to 2.55.0. See Auspice changelog here.
Internal
[infra] Fix feature-policy and permission-policy HTTP headers
The deprecated feature-policy header was removed entirely, and the interest-cohort entry was removed from the permission-policy header. Latest versions of web browsers should no longer emit warnings into the console.
[test] Test Nextclade CLI on more Linux distros
In addition to the previously tested distributions, we now test Nextclade CLI on the following newer Linux distributions:
Amazon Linux 2.0.2024
Debian 12
Fedora 41
Oracle Linux 8.9
Ubuntu 24.04
Nextclade 3.7.1
Warn if reference sequence does not match root sequence of the tree
When both a standalone reference sequence and an Auspice tree containing .root_sequence.nuc are present, Nextclade will check that these are the same sequence. If not, a warning is emitted to stderr for Nextclade CLI and to the browser’s dev console for Nextclade Web. This is mostly useful for dataset authors, for debugging.
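A minimal version of this check might look like the sketch below. It assumes the root sequence is stored under the top-level root_sequence.nuc key of the Auspice v2 JSON; the function name and the warning text are illustrative, not Nextclade's exact message.

```python
import json
import sys


def check_ref_matches_root(ref_seq: str, auspice_json: dict) -> bool:
    """Warn on stderr if the tree's root sequence differs from the reference sequence."""
    root_seq = auspice_json.get("root_sequence", {}).get("nuc")
    if root_seq is None:
        return True  # no root sequence in the tree: nothing to compare against
    if root_seq != ref_seq:
        print("Warning: reference sequence does not match the root sequence of the tree",
              file=sys.stderr)
        return False
    return True


tree = json.loads('{"root_sequence": {"nuc": "ACGT"}, "tree": {"name": "root"}}')
print(check_ref_matches_root("ACGT", tree))  # True
print(check_ref_matches_root("ACGA", tree))  # False, plus a warning on stderr
```

The check only warns and does not stop the run, matching the behavior described above.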
Fix error when selecting a CDS in genome annotation visualization in Nextclade Web
Nextclade sometimes displayed an error in the peptide view when switching CDS by clicking on annotation visualization. This has been fixed now.
Nextclade 3.7.0
Use Auspice JSON as a full dataset (experimental)
Nextclade can now optionally use Auspice datasets (in Auspice v2 JSON format) not only as reference trees, but also as self-contained full Nextclade datasets. Nextclade will take pathogen info, genome annotation, reference sequence, and, of course, reference tree from the Auspice JSON. No other files are needed. This allows using almost any Auspice dataset (e.g. from nextstrain.org) as a Nextclade dataset.
In Nextclade CLI, the --input-dataset argument now also accepts a path to an Auspice JSON file (in addition to accepting the usual paths to a dataset directory and a zip archive).
Nextclade Web now has a new URL parameter, dataset-json-url, which accepts a URL to an Auspice JSON file or even to a dataset URL on nextstrain.org.
This feature is currently in experimental stage. For details and discussion see PR #1455.
Make reference tree branch attributes optional
Nextclade now accepts Auspice JSONs without .branch_attrs on tree nodes.
Allow index and seqName in column selection
Previously, Nextclade treated output CSV/TSV columns index and seqName as mandatory and they were always present in the output files. In this release they are made configurable. One can:
in CLI: add or omit the index and seqName values when using the --output-columns-selection argument
in Web: tick or untick the checkboxes for index and seqName in the “Column config” tab of the “Export” page
Add dataset capabilities
The table printed by the nextclade dataset list command now displays an additional column, “capabilities”, which lists dataset capabilities, i.e. whether the dataset contains information allowing clade assignment, QC, etc. The same information is available in JSON format (unstable) if you pass the --json flag.
Nextclade 3.6.0
Make reference tree node attribute clade_membership optional
Previously Nextclade required clade information to be always present in the input reference tree in the form of the .node_attrs.clade_membership field on each tree node. However, for certain datasets we might not have or need clade information. Making such datasets required workarounds, such as adding an empty string to the clade_membership field.
In this version we make the clade_membership field optional. This allows making datasets without clades. This is useful when working with organisms for which clades don’t make sense or for which the nomenclature is not sufficiently established. It is also useful for dataset authors, who can now bootstrap simple datasets without clades first and then gradually add clades and other features later.
With this change, if clade_membership is not present in the dataset’s reference tree nodes, then
Clade assignment will not run
Any clade-related functionality will not work
Output JSON/NDJSON result entries will not contain clade field
Clade column in output CSV/TSV will be empty
Clade column in Nextclade Web will be empty
This change does not affect any other parts of the application. Notably, clade-like attributes (from .meta.extensions.nextclade.clade_node_attrs), if present, are still assigned and written to the output as before.
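In an Auspice v2 JSON, a node's clade is stored as node_attrs.clade_membership.value. The sketch below shows how a reader can treat that field as optional, returning None instead of failing when it is absent. The function names and the traversal are illustrative, not Nextclade's code.

```python
from typing import Optional


def node_clade(node: dict) -> Optional[str]:
    """Return the node's clade, or None if clade_membership is absent."""
    attr = node.get("node_attrs", {}).get("clade_membership")
    return attr.get("value") if isinstance(attr, dict) else None


def collect_clades(node: dict, out: dict) -> dict:
    """Walk the tree depth-first and map node name -> clade (None when not defined)."""
    out[node["name"]] = node_clade(node)
    for child in node.get("children", []):
        collect_clades(child, out)
    return out


tree = {
    "name": "root",
    "node_attrs": {"clade_membership": {"value": "A"}},
    "children": [{"name": "leaf1", "node_attrs": {}}],
}
print(collect_clades(tree, {}))  # {'root': 'A', 'leaf1': None}
```

If no node in the tree has a clade, a caller can skip clade assignment altogether, which mirrors the behavior listed above.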
Nextclade 3.5.0
Algorithm
Detect loss of amino acid motifs correctly
Nextclade sometimes failed to detect a motif loss if that motif was the only one in its category. This is now fixed, and users may observe changes in detected lost motifs. This affects datasets using the aaMotifs property in their pathogen.json file, notably the flu datasets.
Nextclade Web
Ensure currently selected dataset is reloaded when it changes remotely
When the dataset-url URL parameter was provided, Nextclade Web would not update the dataset’s pathogen.json file if the remote dataset changed without a change of its version. This is now fixed. It only affected users providing custom datasets using the dataset-url URL parameter.
General
Upgrade Auspice
The Auspice tree rendering package has been updated from version 2.52.1 to version 2.53.0. See the list of changes here
Nextclade 3.4.0
Nextclade Web
Remove redundant scrollbars in dataset names
In dataset selector, sometimes there were extra scrollbars displayed to the right of the dataset names. This has been fixed.
Select suggested dataset automatically when suggestion is triggered manually
When suggestion is triggered manually, using the “Suggest” button on the main page, Nextclade will automatically select the best dataset as the current dataset. Previously this could only be done by clearing the current dataset first and then clicking “Suggest”. When the suggestion algorithm is triggered automatically, the behavior is unchanged: the dataset will not be selected.
Nextclade CLI
Don’t read dataset’s tree.json and genome_annotation.gff3 unless they are declared
Nextclade CLI will no longer read tree.json and genome_annotation.gff3 from the dataset, unless they are declared in the pathogen.json. These are optional files and we cannot assume their presence or filenames.
Warn user if input dataset contains extra files
Nextclade CLI will warn users when input datasets contain extra files which are not declared in the dataset’s pathogen.json, or if there are extra declarations of files in the pathogen.json, but the files are not actually present in the dataset. This is mostly useful to dataset authors for debugging issues in their datasets.
Add Bioconda Linux ARM build
We added one more build variant to Bioconda distribution channel - for Linux operating system on 64-bit ARM hardware architecture. It uses nextclade-aarch64-unknown-linux-gnu executable underneath. This can be useful if you prefer to manage Nextclade CLI installation on your Linux ARM machine or in a Docker ARM container with Conda package manager. However, because Nextclade CLI is a self-contained single-file executable, we still recommend direct downloads from GitHub Releases rather than Conda or other installation methods.
Nextclade 3.3.1
Fix crash when using --verbosity option
Nextclade was crashing with internal error when --verbosity option was present. This has been fixed.
Restrict Safari browser support to >= 16.5
Nextclade reports WebWorker-related errors when analysis is started in older versions of the Safari browser. The minimum version of Safari on which we were able to successfully test Nextclade is 16.5. We still recommend using Chrome or Firefox for the best experience.
Nextclade 3.3.0
General
Allow FASTA files with leading newlines
Previously Nextclade would report an error “Expected character ‘>’ at record start” when the input FASTA file contained leading newlines or when it was empty. This has been fixed.
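A minimal sketch of a FASTA reader that tolerates blank lines (leading, trailing or in between) and empty input, and only reports the error when real non-header content appears before the first record. It is written for illustration and is not Nextclade's parser.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (name, sequence) pairs, skipping blank lines."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines anywhere are ignored
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is None:
            # Sequence data before any header is still an error
            raise ValueError("Expected character '>' at record start")
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


print(parse_fasta("\n\n>seq1\nACGT\nAC\n>seq2\nTTTT\n\n"))  # [('seq1', 'ACGTAC'), ('seq2', 'TTTT')]
print(parse_fasta(""))  # []
```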
Nextclade Web
Upgrade Auspice tree renderer from version 2.51.0 to 2.52.1
See changelog here
Nextclade 3.2.1
Nextclade CLI
Fix “Dataset not found” error when using nextclade dataset get with --tag argument.
This fixes a bug in the dataset filtering logic causing a “Dataset not found” error even when the correct name and tag were requested using nextclade dataset get with the --tag argument.
Nextclade 3.2.0
General
Minimizer search algorithm configuration has been improved
The minimizer search algorithm is used in dataset auto-suggestion in Nextclade Web as well as in the sort command of Nextclade CLI.
The default value for the minimum match score (--min-score) has been reduced from 0.3 to 0.1. The default value for the minimum number of hits (--min-hits) required for a detection has been reduced from 10 to 5. This should allow better handling of more diverse viruses.
If there is a sufficiently large gap between dataset scores, the algorithm will now only consider the group of datasets before the gap. The gap size can be configured using --max-score-gap argument in Nextclade CLI. The default value is 0.2.
Additionally, in Nextclade CLI sort command the algorithm now chooses only the best matching dataset. In order to select all matching datasets, the --all-matches flag has been added.
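The selection rule described above can be sketched as follows, assuming each dataset already has a score: drop datasets below the minimum score, sort the rest in descending order, and cut the list at the first gap between consecutive scores that is larger than the maximum gap. Whether the gap is measured between consecutive scores, and whether the comparison is strict, are assumptions here; the function and dataset names are hypothetical. The defaults mirror --min-score 0.1 and --max-score-gap 0.2, and the --min-hits filter is omitted for brevity.

```python
def select_datasets(scores: dict[str, float], min_score: float = 0.1,
                    max_score_gap: float = 0.2, all_matches: bool = False) -> list[str]:
    """Return dataset names kept by a score threshold and a score-gap cut."""
    ranked = sorted(((s, n) for n, s in scores.items() if s >= min_score), reverse=True)
    kept = []
    for i, (score, name) in enumerate(ranked):
        # Stop at the first large gap: only the group before the gap is considered
        if i > 0 and ranked[i - 1][0] - score > max_score_gap:
            break
        kept.append(name)
    # Without all_matches, only the best matching dataset is returned
    return kept if all_matches else kept[:1]


scores = {"dataset_a": 0.92, "dataset_b": 0.85, "dataset_c": 0.40, "dataset_d": 0.05}
print(select_datasets(scores))                    # ['dataset_a']
print(select_datasets(scores, all_matches=True))  # ['dataset_a', 'dataset_b']
```

Here dataset_d falls below the minimum score, and dataset_c is cut off by the 0.45 gap that separates it from dataset_b.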
Nextclade CLI
Sequence index in the output TSV file of the sort command
The TSV output of the sort command (requested with --output-results-tsv) now contains an additional column: index. The cells in this column contain the index of the corresponding input sequence in the FASTA file. These indices can be used in downstream processing to reliably map input sequences to the output results. Sequence names alone can be unreliable because they are arbitrary strings which are not guaranteed to be unique.
Nextclade 3.1.0
CLI
PCR primers are back in Nextclade CLI
Due to popular demand, we are bringing back --input-pcr-primers argument for Nextclade CLI, which accepts a path to primers.csv file. The feature works just like it did prior to release of Nextclade v3, except primers.csv is never read from a dataset - you always need to provide it separately. At the same time, we removed support for primers field from pathogen.json, because it was too difficult to make a correct JSON object and it would conflict with the primers provided with --input-pcr-primers.
Web
Fix results table stripes
Results table stripes are always alternating now, regardless of sorting and filtering applied. This is only a visual change and does not affect any functionality.
Nextclade 3.0.1
Bug fixes
Fixed a bug introduced in v3.0.0 which caused the default path for translations to be incorrect. This affected only users who used --output-all without passing a custom path template via --output-translations. The new default path is nextclade.cds_translation.{cds}.fasta, where {cds} gets replaced with the name of the CDS, e.g. nextclade.cds_translation.S.fasta for SARS-CoV-2’s spike protein.
Fixed a bug where the nextclade dataset get command failed to download a dataset if the dataset had more than one version released.
Documentation
Added a section to the v3 migration guide about the renamed default path for translations, a breaking change. The new default output path for translations is nextclade.cds_translation.{cds}.fasta. Before v3, the default path was nextclade_gene_{gene}.translation.fasta. You can emulate the old (default) behavior by passing --output-translations="nextclade_gene_{cds}.translation.fasta" to nextclade3.
Fix links
Fixed links on the navigation bar: “Docs” and “CLI”
Nextclade 3.0.0
We are happy to present a major release of Nextclade, containing new features and bug fixes.
⚠️ This release contains breaking changes which may require your attention.
Useful links:
Nextclade Web v2 - if you need the old version, e.g. if you have custom v2 datasets
Nextclade CLI releases - all versions
Nextclade user documentation - for detailed instructions on how to use Nextclade Web and Nextclade CLI
Nextclade dataset curation guide - if you have a custom Nextclade dataset or want to create one
Nextclade source code repository - for contributors to Nextclade software (code, bug reports, feature requests etc.)
Nextclade data repository - for contributors to Nextclade datasets (add new datasets, update existing ones, report bugs, etc.)
Nextclade software issues - to report bugs and ask questions about Nextclade software
Nextclade data issues - to report bugs and ask questions about Nextclade datasets
Nextstrain discussion forum - for general discussion and questions about Nextstrain
BREAKING CHANGES
This section briefly lists breaking changes in Nextclade v3 compared to Nextclade v2. Please see Nextclade v3 migration guide (alternative link) for a detailed description of each breaking change and of possible migration paths.
Nextalign CLI is removed, because Nextclade CLI can now do everything that Nextalign v2 did
Potentially different alignment and translation output due to changes in the seed alignment algorithm. Some of the alignment parameters are removed. Default values of the new parameters might need to be adjusted.
Potentially different tree output due to a new tree builder algorithm.
Dataset file format and dataset names have changed.
Some CLI arguments for individual input files are removed.
Some output files are removed
Genome annotation CLI argument is renamed
URL parameters in Nextclade Web have changed
CDS instead of genes
The sections below list all changes - breaking and non-breaking. The breaking changes are denoted with word [BREAKING].
If you encounter problems during migration, or breaking changes not mentioned in this document, please report them to the developers by opening a new GitHub issue.
General changes
[BREAKING] Alignment
The seed matching algorithm was rewritten to be more robust and handle sequences with higher diversity. For example, RSV-A can now be aligned against RSV-B.
Parameters minSeeds, seedLength, seedSpacing, minMatchRate, mismatchesAllowed, maxIndel no longer have any effect and are removed.
New parameters kmerLength, kmerDistance, minMatchLength, allowedMismatches, windowSize are added.
Default values should work for sequences with a diversity of up to X%. For sequences with higher diversity, the parameters may need to be adjusted.
For short sequences, the threshold length to use full-matrix alignment is now determined based on kmerLength instead of the removed seedLength. The coefficient is adjusted to roughly match the old final value.
Genome annotation
Replace genes with CDSes
Nextclade now treats genes only as containers for CDSes (“CDS” is coding sequence). CDSes are the main unit of translation and a basis for AA mutations now. A gene can contain multiple CDSes, but they are handled independently.
Handle fragmentation of genetic features
A CDS can consist of multiple fragments. These fragments are extracted from the full nucleotide genome independently and joined together (in the order provided in the genome annotation) to form the nucleotide sequence of the CDS. The CDS is then translated and the resulting polypeptides are analyzed (mutations are detected etc.). This implementation allows handling slippage (e.g. ORF1ab in coronaviruses) and splicing (e.g. tat and rev in HIV-1).
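The fragment handling can be illustrated with a toy example: slice each fragment out of the genome, concatenate the slices in annotation order, and translate the result. The 0-based half-open coordinates, the forward-strand-only handling, the toy genome and the tiny codon table are assumptions made to keep the example short; real annotations come from GFF3 (1-based, inclusive coordinates) and a full genetic code is needed.

```python
# Toy subset of the standard genetic code, enough for this example only
CODONS = {"ATG": "M", "AAA": "K", "CCC": "P", "TTT": "F", "TAA": "*"}


def extract_cds(genome: str, fragments: list[tuple[int, int]]) -> str:
    """Join fragments (0-based, half-open, forward strand) in annotation order."""
    return "".join(genome[start:end] for start, end in fragments)


def translate(nuc: str) -> str:
    """Translate complete codons; codons missing from the toy table become 'X'."""
    usable = len(nuc) - len(nuc) % 3
    return "".join(CODONS.get(nuc[i:i + 3], "X") for i in range(0, usable, 3))


genome = "ATGAAACCTTTTAA"
# Two fragments overlapping by one nucleotide, mimicking a -1 ribosomal slippage
cds = extract_cds(genome, [(0, 7), (6, 14)])
print(cds)             # ATGAAACCCTTTTAA
print(translate(cds))  # MKPF*
```

Because the nucleotide at the slippage site is read twice, the joined CDS is one nucleotide longer than the genomic span it covers, and the reading frame shifts exactly as the ribosome does.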
Handle circular genomes
If the genome annotation describes a CDS fragment as circular (wrapping around the origin), Nextclade splits it into multiple linear (non-wrapping) fragments. The translation and analysis are then performed as if it was a linear genome.
Nextclade follows the GFF3 specification. Please refer to it for how to describe circular features.
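Splitting a wrapping fragment can be sketched as below. The sketch assumes 0-based half-open coordinates in which an origin-crossing fragment is written with an end coordinate larger than the genome length (similar to how the GFF3 specification represents origin-crossing features), and that a fragment wraps around at most once. The function name is hypothetical.

```python
def split_circular(start: int, end: int, genome_len: int) -> list[tuple[int, int]]:
    """Split a 0-based half-open range that may wrap past the origin into linear ranges."""
    if end - start > genome_len:
        raise ValueError("fragment longer than the genome is not supported in this sketch")
    if end <= genome_len:
        return [(start, end)]  # no wrap-around, nothing to split
    # The part up to the end of the genome, then the remainder starting at position 0
    return [(start, genome_len), (0, end - genome_len)]


print(split_circular(90, 110, 100))  # [(90, 100), (0, 10)]
print(split_circular(10, 40, 100))   # [(10, 40)]
```

The resulting linear pieces can then be extracted and joined in order like any other CDS fragments.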
Parse regions, genes and CDSes from GFF3 file
The GFF3 file parser has been augmented to support all the types of genetic features necessary for Nextclade to operate. There are still feature types which Nextclade ignores. We can consider supporting more types as scientific need arises.
Phylogenetic tree placement
Nextclade v3 now has the ability to phylogenetically resolve relationships between input sequences, where v2 would only attach each query sequence independently to the reference tree. Nextclade v3 thus may produce trees that are different from the trees produced in Nextclade v2.
Please read the Phylogenetic placement section in the documentation for more details.
Don’t count mutations to ambiguous nucleotides as reversions
We no longer treat mutations to ambiguous nucleotides as reversions. Previously, if the attachment node had a mutation relative to the reference and the query sequence had an ambiguous nucleotide at that position, we counted this as a reversion. This change only affects the “private mutation” QC score and the classification of private mutations into “reversion substitution” and “unlabeled substitution”.
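The new rule can be illustrated with a simplified classifier for one position. It assumes we know the reference nucleotide, the attachment node's nucleotide and the query nucleotide, treats N as missing data, and considers only the two categories named above; the function name and return values are hypothetical, and Nextclade's real classification handles more cases.

```python
CANONICAL = set("ACGT")


def classify_private_sub(ref: str, node: str, query: str) -> str:
    """Classify a query nucleotide relative to the attachment node at one position."""
    if query == node or query == "N":
        return "not private"  # no difference, or missing data (N)
    if query not in CANONICAL:
        return "unlabeled"  # ambiguous nucleotides (e.g. R, Y) are never reversions
    if query == ref and node != ref:
        return "reversion"  # the query goes back to the reference allele
    return "unlabeled"


print(classify_private_sub("A", "G", "A"))  # reversion
print(classify_private_sub("A", "G", "R"))  # unlabeled
print(classify_private_sub("A", "G", "T"))  # unlabeled
```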
Changes in Nextclade Web
Dataset autosuggestion
Nextclade Web can now optionally suggest the most appropriate dataset(s) for user-provided input sequences. Drop your sequences and click “Suggest” to try out this feature.
Genome annotation widget
Following changes in genome annotation handling, the genome annotations widget in Nextclade Web now shows CDS fragments instead of genes.
CDS selector widget in Nextclade Web
The gene selector dropdown in Nextclade Web’s results table has been transformed into a more general genetic feature selector. It shows the hierarchy of genetic features if there are nested features. Otherwise, the list is flat, to save screen space. It shows the type of each genetic feature (gene, CDS or protein) as a colorful badge. The menu is searchable, which is useful for mpox and other large viruses with many genes. Only CDSes can be selected currently, but we may extend this in the future to more feature types.
Show ambiguous nucleotides in sequence views
Nucleotide sequence views (in the results table) now also show colored markers for ambiguous nucleotides (non-ACTGN).
[BREAKING] Changed and removed some of the URL parameters
Due to changes in the dataset format and input files, the URL parameters have the following changes:
input-root-seq renamed to input-ref
input-gene-map renamed to input-annotation
input-pathogen-json added
input-qc-config removed
input-pcr-primers removed
input-virus-properties removed
dataset-reference removed
[BREAKING] Removed some redundant output files
The nextclade.errors.csv and nextclade.insertions.csv files are removed and no longer appear in the “Export” dialog, nor are they included into the nextclade.zip archive of all outputs.
Errors and insertions are now included in the nextclade.csv and nextclade.tsv files.
Auspice updated from v2.45.2 to v2.51.0
The Auspice tree viewer component is updated from version 2.45.2 to 2.51.0. See the Auspice releases or changelog.
Changes in Nextclade CLI
[BREAKING] Nextalign CLI is removed
Nextalign CLI is no longer provided as a standalone application along with Nextclade CLI v3, because Nextclade now has all the features that distinguished Nextalign. This means there’s only one set of command line arguments to remember. Nextclade CLI runs the same algorithms, accepts the same inputs and provides the same outputs as v2 Nextalign, plus some more. For most use-cases, the CLI interface and the input and output files should be the same or very similar.
[BREAKING] Some alignment parameters are removed
Due to changes in the seed alignment algorithm, the following parameters are no longer used and the corresponding CLI arguments and JSON fields under alignmentParams in pathogen.json (previously virus_properties.json) were removed:
--seed-length
--seed-spacing
--max-indel
--min-match-rate
--min-seeds
--mismatches-allowed
The following new alignment parameters were added:
--allowed-mismatches
--kmer-distance
--kmer-length
--min-match-length
--min-seed-cover
--max-alignment-attempts
--max-band-area
--window-size
[BREAKING] Some CLI arguments for individual input files are removed
Due to changes in the dataset format the following CLI arguments were removed:
--input-virus-properties
--input-qc-config
--input-pcr-primers
in favor of --input-pathogen-json.
[BREAKING] Some CLI arguments for output files are removed
The arguments --output-errors and --output-insertions have been removed. Their information is now included in --output-csv and --output-tsv.
[BREAKING] Genome annotation CLI argument is renamed
The argument --input-gene-map is renamed to --input-annotation. The short form -m remains unchanged.
[BREAKING] Translation selection CLI argument is renamed
The argument --genes is renamed to --cds-selection. The short form -g remains unchanged.
Newick tree output
Nextclade can now also export the tree in Newick format via the --output-tree-nwk argument.
Optional input files
Most input files and files inside datasets are now optional. This simplifies dataset creation and maintenance and allows for step-by-step, incremental extension of them. You can start only with a reference sequence, which will only allow for alignment and very basic mutation calling in Nextclade, and later you can add more functionality. Optional input files also enable the removal of Nextalign CLI.
If you maintain a custom dataset or want to try creating one - refer to our Dataset curation guide. Community contributed datasets are welcome!
Added flag for disabling the new tree builder
The old phylogenetic tree placement behavior can be restored by adding the --without-greedy-tree-builder flag.
New arguments in dataset list command
The new argument --only-names prints a concise list of dataset names:
nextclade dataset list --only-names
The new argument --search allows searching datasets using a substring match with the dataset name, dataset friendly name, reference name or reference accession:
nextclade dataset list --search=flu
The argument --json outputs a JSON object instead of the table. You can write it to a file or post-process it:
nextclade dataset list --json > "dataset_list.json"
nextclade dataset list --json | jq '.[] | select(.path | startswith("nextstrain/sars-cov-2")) | .attributes'
New subcommand: sort
The sort subcommand takes your sequences in FASTA format and outputs sequences grouped by dataset in the form of a directory tree. Each subdirectory corresponds to a dataset and contains an output FASTA file with only sequences that are detected to be similar to the reference sequence in this dataset.
Example usage:
nextclade sort --output-dir="out/sort/" --output-results-tsv="out/sort.tsv" "input.fasta"
This can be useful for splitting FASTA files containing sequences which belong to different pathogens, strains or segments, for example for separating flu HA and NA segments.
New subcommand: read-annotation
The read-annotation subcommand takes a GFF3 file and displays how features are arranged hierarchically as viewed by Nextclade. This is useful for Nextclade developers and dataset creators to verify (and debug) how Nextclade understands genetic features from a particular GFF3 file.
Example usage:
nextclade read-annotation genome_annotation.gff3
Type nextclade read-annotation --help for description of arguments.
Performance improvements
Nextclade Web is now twice as fast when processing many sequences
Nextclade Web now uses multithreading more effectively. This results in faster processing of large FASTA files on computers with more than one processor. The speedup is around 2x for 1000 SARS-CoV-2 sequences on a multi-core machine.
Internal changes
Ensure type safety across programming language boundaries
The new features caused changes in major internal data structures and made them more complex. We now generate JSON schema and TypeScript typings from Rust code. This allows finding mismatches between parts written in different languages and avoiding bugs related to data types.
Make positions and ranges in different coord spaces type-safe
The change in genome annotation handling had significant consequences for coordinate spaces Nextclade is using internally (e.g. alignment space vs reference space, nuc space vs aa space, global nuc space vs nuc space local to a CDS). In order to make coordinate transforms safer, we introduced new Position and Range types, different for each space. This prevents mixing up coordinates in different spaces.
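The idea of per-space position types can be sketched in Python with distinct wrapper classes, so that passing a position from the wrong space fails loudly. In Nextclade's Rust code the compiler enforces this; in this sketch the check happens at runtime. The class names and the mapping function are hypothetical and only cover alignment space and reference space.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignmentPos:
    """Position in alignment space (0-based)."""
    value: int


@dataclass(frozen=True)
class RefPos:
    """Position in reference space (0-based)."""
    value: int


def alignment_to_ref(pos: AlignmentPos, ref_aligned: str) -> RefPos:
    """Map an alignment position to reference space by counting non-gap reference characters before it."""
    if not isinstance(pos, AlignmentPos):
        raise TypeError(f"expected AlignmentPos, got {type(pos).__name__}")
    return RefPos(sum(1 for c in ref_aligned[:pos.value] if c != "-"))


ref_aligned = "AC--GT"  # gaps in the aligned reference mark an insertion in the query
print(alignment_to_ref(AlignmentPos(4), ref_aligned))  # RefPos(value=2)
try:
    alignment_to_ref(RefPos(4), ref_aligned)
except TypeError as err:
    print(err)  # expected AlignmentPos, got RefPos
```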
Older versions
For changes in older versions, see docs/changes/CHANGELOG.old.md