Output files

This section describes the files produced by Nextclade.

You can download these files from Nextclade Web using the “Download” dialog.

Download output

Nextclade CLI writes these files into paths specified with a family of --output* flags.

All outputs

Nextclade CLI, Nextalign CLI flags: --output-all

All possible outputs can be produced using --output-all flag. The default base file name is either “nextalign” or “nextclade” depending on which tool you use. It can be changed using --output-basename flag. A list of outputs can be restricted using --output-selection flag.

Aligned nucleotide sequences

Nextclade CLI, Nextalign CLI flags: --output-fasta

Aligned sequences are produced as a result of the Sequence alignment step and are being output in FASTA format. If the CLI flag --include-reference is set, the reference sequence is included as the first entry.

Aligned peptides

Nextclade CLI, Nextalign CLI flags: --output-translations

Aligned peptides are produced as a result of the Translation and peptide alignment step and are being output in FASTA format. There are multiple files, one for each gene. If the CLI flag --include-reference is set, the reference sequence peptide is included as the first entry.

This flag accepts a template string which must contain template argument {gene}.

Analysis results

The results of mutation calling, clade assignment, quality control and PCR primer changes can be obtained in either TSV, CSV, or JSON format.

Tabular (CSV/TSV) results

Nextclade CLI flags: --output-csv, --output-tsv

TSV and CSV files are equivalent and only differ in the column delimiter (tabs vs semicolons), for better compatibility with spreadsheet software and data-science packages. Tabular format of TSV/CSV files are somewhat more human-friendly, are convenient for the immediate inspection and for simple automated processing.

Every row in tabular output corresponds to 1 input sequence. The meaning of columns is described below:

Column name Meaning
seqName Name of the sequence (as provided in the input file)
clade Assigned clade
qc.overallScore Overall quality control score
qc.overallStatus Overall quality control status
totalSubstitutions Total number of detected nucleotide substitutions
totalDeletions Total number of detected nucleotide deletions
totalInsertions Total number of detected nucleotide insertions
totalFrameShifts Total number of detected frame shifts
totalAminoacidSubstitutions Total number of detected aminoacid substitutions
totalAminoacidDeletions Total number of detected aminoacid deletions
totalAminoacidInsertions Total number of detected aminoacid insertions
totalMissing Total number of detected missing nucleotides
totalNonACGTNs Total number of detected ambiguous nucleotides
totalPcrPrimerChanges Total number of nucleotide mutations detected in PCR primer regions
substitutions List of detected nucleotide substitutions
deletions List of detected nucleotide deletion ranges
insertions List of detected inserted nucleotide fragments
privateNucMutations.reversionSubstitutions List of detected private mutations that are reversions to reference
privateNucMutations.labeledSubstitutions List of detected private mutations that are to a genotype that has been labeled in virus_properties.json
privateNucMutations.unlabeledSubstitutions List of detected private mutations that are neither reversions nor labeled
privateNucMutations.totalReversionSubstitutions Total number of private mutations that are reversions to reference
privateNucMutations.totalLabeledSubstitutions Total number of private mutations that are to a genotype that has been labeled in virus_properties.json
privateNucMutations.totalUnlabeledSubstitutions Total number of private mutations that are neither reversions nor labeled
privateNucMutations.totalPrivateSubstitutions Total number of private mutations overall
frameShifts List of detected frame shifts
aaSubstitutions List of detected aminoacid substitutions
aaDeletions List of detected aminoacid deletions
aaInsertions List of detected aminoacid insertions
missing List of detected nucleotide insertions
nonACGTNs List of detected ambiguous nucleotides
pcrPrimerChanges List of detected PCR primer changes
alignmentScore Alignment score
alignmentStart Beginning of the sequenced region
alignmentEnd End of the sequenced region
qc.missingData.missingDataThreshold Threshold that was used for "Missing data" QC rule
qc.missingData.score Score for "Missing data" QC rule
qc.missingData.status Status for "Missing data" QC rule
qc.missingData.totalMissing Total number of missing nucleotides used in "Missing data" QC rule
qc.mixedSites.mixedSitesThreshold Threshold used for "Mixed sites" QC rule
qc.mixedSites.score Score for "Mixed sites" QC rule
qc.mixedSites.status Status for "Mixed sites" QC rule
qc.mixedSites.totalMixedSites Total number of ambiguous nucleotides used for "Mixed sites" QC rule
qc.privateMutations.cutoff Cutoff parameter used for "Private mutations" QC rule
qc.privateMutations.excess Excess parameter used for "Private mutations" QC rule
qc.privateMutations.score Score for "Private mutations" QC rule
qc.privateMutations.status Status for "Private mutations" QC rule
qc.privateMutations.total Weighted sum of private mutations used for "Private mutations" QC rule
qc.snpClusters.clusteredSNPs Clustered SNP detected for "SNP clusters" QC rule
qc.snpClusters.score Score for "SNP clusters" QC rule
qc.snpClusters.status Status for "SNP clusters" QC rule
qc.snpClusters.totalSNPs Total number of SNPs for "SNP clusters" QC rule
qc.frameShifts.frameShifts List of detected frame shifts in "Frame shifts" QC rule (excluding ignored)
qc.frameShifts.totalFrameShifts Total number of detected frame shifts in for "Frame shifts" QC rule (excluding ignored)
qc.frameShifts.frameShiftsIgnored List of frame shifts detected, but ignored due to ignore list
qc.frameShifts.totalFrameShiftsIgnored Total number of frame shifts detected, but ignored due to ignore list
qc.frameShifts.score Score for "Frame shifts" QC rule
qc.frameShifts.status Status for "Frame shifts" QC rule
qc.stopCodons.stopCodons List of detected stop codons in "Stop codons" QC rule
qc.stopCodons.totalStopCodons Total number of detected stop codons in "Stop codons" QC rule
qc.stopCodons.score Score for "Stop codons" QC rule
qc.stopCodons.status Status for "Stop codons" QC rule
isReverseComplement Whether query sequences were transformed using reverse complement operation before alignment
errors List of errors during processing

The table can contain additional columns for every clade-like attributes defined in reference tree in meta.extensions.clade_node_attrs and in the node attributes. For example, the default SARS-CoV-2 datasets define Nextclade_pango attribute which signifies a PANGO lineage assigned by Nextclade (see Nextclade as pango lineage classifier: Methods and Validation).

JSON results

Nextclade CLI flag: --output-json, filename nextclade.json

JSON results file is best for in-depth automated processing of results. It contains everything tabular files contain, plus more, in a more machine-friendly format.

⚠️ Beware that JSON results use 0-indexed nucleotide and codon positions, whereas csv and tsv files use 1-indexed positions. The reason is, that JSON corresponds more closely to the internal representation and 0-indexing is the default in most programming languages. For example, substitution {refNuc: "C", pos: 2146, queryNuc: "T"} in JSON results corresponds to substitution C2147T in csv and tsv files.

Ranges are inclusive for the start and exclusive for the end. Hence, missing: {begin: 704, end: 726} in JSON results corresponds to missing: 705-726 in csv/tsv results.

Output phylogenetic tree

Nextclade CLI flags: --output-tree

Output phylogenetic tree. This is the input reference tree, with Query Sequences placed onto it.

Accepted formats: Auspice JSON v2 (description, schema) - this is the same format that is used in Nextstrain. And the same as for the input reference tree.

The tree can be viewed in auspice.us.

Stripped insertions

Nextclade CLI flag: --output-insertions

Nextclade strips insertions relative to the reference from aligned query sequences, so that they no longer appear in the output sequences. It outputs information about these insertions in CSV format.

The file contains the following columns (delimited by commas):

  • seqName - Name of the sequence, as in the input FASTA file

  • insertions - A string containing semicolon-separated insertions. Each insertion is in format <begin>:<seq>, where <begin> is the starting position of the insertion in the aligned sequence, <seq> is the nucleotide sequence fragment that was removed, e.g. "22204:GAGCCAGAA".

  • aaInsertions - String containing semicolon-separated insertions translated to aminoacids. Each insertion is in the format <gene>:<pos>:<seq>, e.g. "S:214:EPE".

List of errors and warnings

Nextclade CLI flag: --output-errors

A table that, for each sequence, contains a list of warnings, errors as well as a list of genes affected by error. The genes listed in this table are omitted from translation, analysis and FASTA outputs.