Reference
This document contains the automatically generated reference documentation for command-line arguments of the latest version of Nextclade CLI.
If you have Nextclade CLI installed, you can type nextclade --help
to read the latest documentation for your installed version of Nextclade.
Command Overview:
nextclade
Viral genome alignment, mutation calling, clade assignment, quality checks and phylogenetic placement.
Nextclade is a part of Nextstrain: https://nextstrain.org
Documentation: https://docs.nextstrain.org/projects/nextclade
Nextclade Web: https://clades.nextstrain.org
Publication: https://doi.org/10.21105/joss.03773
For short help type: nextclade -h
, for extended help type: nextclade --help
. Each subcommand has its own help, for example: nextclade run --help
.
Usage: nextclade [OPTIONS] <COMMAND>
Subcommands:
completions
— Generate shell completionsrun
— Run sequence analysis: alignment, mutation calling, clade assignment, quality checks and phylogenetic placementdataset
— List and download available Nextclade datasets (pathogens)sort
— Sort sequences according to the inferred Nextclade dataset (pathogen)read-annotation
— Read genome annotation and present it in Nextclade’s internal formats. This is mostly only useful for Nextclade maintainers and the most curious users. Note that these internal formats have no stability guarantees and can be changed at any time without noticehelp-markdown
— Print command-line reference documentation in Markdown format
Options:
--verbosity <VERBOSITY>
— Set verbosity level of console outputDefault value:
warn
Possible values:
off
,error
,warn
,info
,debug
,trace
--silent
— Disable all console output. Same as--verbosity=off
-v
,--verbose
— Make console output more verbose. Add multiple occurrences to increase verbosity further-q
,--quiet
— Make console output more quiet. Add multiple occurrences to make output even more quiet
nextclade completions
Generate shell completions.
This will print the completions file contents to the console. Refer to your shell’s documentation on how to install the completions.
Example for Ubuntu Linux:
nextclade completions bash > ~/.local/share/bash-completion/nextclade
Usage: nextclade completions [SHELL]
Arguments:
<SHELL>
— Name of the shell to generate appropriate completionsDefault value:
bash
Possible values:
bash
,elvish
,fish
,fig
,powershell
,zsh
nextclade run
Run sequence analysis: alignment, mutation calling, clade assignment, quality checks and phylogenetic placement
For short help type: nextclade -h
, for extended help type: nextclade --help
. Each subcommand has its own help, for example: nextclade run --help
.
Usage: nextclade run [OPTIONS] [INPUT_FASTAS]...
Arguments:
<INPUT_FASTAS>
— Path to one or multiple FASTA files with input sequencesSupports the following compression formats: “gz”, “bz2”, “xz”, “zst”. If no files provided, the plain fasta input is read from standard input (stdin).
Options:
Example: nextclade run -D dataset/ -O out/ seq1.fasta seq2.fasta
-D
,--input-dataset <INPUT_DATASET>
— Path to a directory or a zip file containing a dataset.See
nextclade dataset --help
on how to obtain datasets.If this flag is not provided, no dataset will be loaded and individual input files have to be provided instead. In this case
--input-ref
is required and--input-annotation,
–input-treeand
–input-pathogen-json` are optional.If both the
--input-dataset
and individual--input-*
flags are provided, each individual flag overrides the corresponding file in the dataset.Experimental feature: this argument also accepts a path to Auspice JSON file. In this case the files to be treated as a Nextclade dataset. This requires Auspice JSON file which contains
.root_sequence.nuc
field.Please refer to Nextclade documentation for more details about Nextclade datasets and their files.
-d
,--dataset-name <DATASET_NAME>
— Name of the dataset to download and use during the runThis is a convenience shortcut to first downloading a dataset and then immediately running with it. Providing this flag is equivalent to running 2 commands:
dataset get
followed byrun
, with the difference that the dataset files from the first command are not saved to disk and cannot be reused later. The default parameters are used for the dataset (e.g. default reference name and latest version tag).See
dataset get --help
anddataset list --help
for more details.Note that when using this flag, the dataset will be downloaded on every run. If a new version of the dataset is released between two runs, they will use different versions of the dataset and may produce different results. For the most reproducible runs, and for more control, use the usual 2-step flow with
dataset get
followed byrun
.This flag is mutually exclusive with
--input_dataset
-r
,--input-ref <INPUT_REF>
— Path to a FASTA file containing reference sequence. This file should contain exactly 1 sequence.Overrides path to
reference.fasta
in the dataset (--input-dataset
).Supports the following compression formats: “gz”, “bz2”, “xz”, “zst”. Use “-” to read uncompressed data from standard input (stdin).
-a
,--input-tree <INPUT_TREE>
— Path to Auspice JSON v2 file containing reference tree.See https://nextstrain.org/docs/bioinformatics/data-formats.
Overrides path to
tree.json
in the dataset (--input-dataset
).Supports the following compression formats: “gz”, “bz2”, “xz”, “zst”. Use “-” to read uncompressed data from standard input (stdin).
-p
,--input-pathogen-json <INPUT_PATHOGEN_JSON>
— Path to a JSON file containing configuration and data specific to a pathogen.Overrides path to
pathogen.json
in the dataset (--input-dataset
).Supports the following compression formats: “gz”, “bz2”, “xz”, “zst”. Use “-” to read uncompressed data from standard input (stdin).
-m
,--input-annotation <INPUT_ANNOTATION>
— Path to a file containing genome annotation in GFF3 format.Genome annotation is used to find coding regions. If not supplied, coding regions will not be translated, amino acid sequences will not be output, amino acid mutations will not be detected and nucleotide sequence alignment will not be informed by codon boundaries.
List of CDSes can be restricted using
--cds-selection
argument. Otherwise, all CDSes found in the genome annotation will be used.Overrides genome annotation provided by the dataset (
--input-dataset
or--dataset-name
).Learn more about Generic Feature Format Version 3 (GFF3): https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md
Supports the following compression formats: “gz”, “bz2”, “xz”, “zst”. Use “-” to read uncompressed data from standard input (stdin).
-g
,--cds-selection <CDS_SELECTION>
— Comma-separated list of names of coding sequences (CDSes) to use.This defines which peptides will be written into outputs, and which CDS will be taken into account during codon-aware alignment and aminoacid mutations detection. Must only contain CDS names present in the genome annotation.
If this flag is not supplied or its value is an empty string, then all CDSes found in the genome annotation will be used.
--input-pcr-primers <INPUT_PCR_PRIMERS>
— Path to a CSV file containing a list of custom PCR primer sites. This information is used to report mutations in these sites.Supports the following compression formats: “gz”, “bz2”, “xz”, “zstd”. Use “-” to read uncompressed data from standard input (stdin).
--server <SERVER>
— Use custom dataset server-O
,--output-all <OUTPUT_ALL>
— Produce all of the output files into this directory, using default basename and predefined suffixes and extensions. This is equivalent to specifying each of the individual--output-*
flags. Convenient when you want to receive all or most of output files into the same directory and don’t care about their filenames.Output files can be optionally included or excluded using
--output-selection
flag. The base filename can be set using--output-basename
flag.If both the
--output-all
and individual--output-*
flags are provided, each individual flag overrides the corresponding default output path.At least one of the output flags is required:
--output-all
,--output-fasta
,--output-ndjson
,--output-json
,--output-csv
,--output-tsv
,--output-tree
,--output-translations
.If the required directory tree does not exist, it will be created.
-n
,--output-basename <OUTPUT_BASENAME>
— Set the base filename to use for output files.By default the base filename is extracted from the input sequences file (provided with
--input-fasta
).Only valid together with
--output-all
flag.-s
,--output-selection <OUTPUT_SELECTION>
— Restricts outputs for--output-all
flag.Should contain a comma-separated list of names of output files to produce.
If ‘all’ is present in the list, then all other entries are ignored and all outputs are produced.
Only valid together with
--output-all
flag.Possible values:
all
,fasta
,json
,ndjson
,csv
,tsv
,tree
,tree-nwk
,translations
-o
,--output-fasta <OUTPUT_FASTA>
— Path to output FASTA file with aligned sequences.Takes precedence over paths configured with
--output-all
,--output-basename
and--output-selection
.If the provided file path ends with one of the supported extensions: “gz”, “bz2”, “xz”, “zst”, then the file will be written compressed. Use “-” to write the uncompressed to standard output (stdout).
If the required directory tree does not exist, it will be created.
-P
,--output-translations <OUTPUT_TRANSLATIONS>
— Template string for path to output fasta files containing translated and aligned peptides. A separate file will be generated for every gene.The string should contain template variable
{cds}
, where the gene name will be substituted. Make sure you properly quote and/or escape the curly braces, so that your shell, programming language or pipeline manager does not attempt to substitute the variables.Takes precedence over paths configured with
--output-all
,--output-basename
and--output-selection
.If the provided file path ends with one of the supported extensions: “gz”, “bz2”, “xz”, “zst”, then the file will be written compressed. Use “-” to write the uncompressed to standard output (stdout).
If the required directory tree does not exist, it will be created.
Example for bash shell:
–output-translations=’output_dir/nextclade.cds_translation.{cds}.fasta’
-N
,--output-ndjson <OUTPUT_NDJSON>
— Path to output Newline-delimited JSON (NDJSON) results file.This file format is most suitable for further machine processing of the results. By contrast to plain json, it can be streamed line-by line, so much bigger outputs are feasible.
Takes precedence over paths configured with
--output-all
,--output-basename
and--output-selection
.If the provided file path ends with one of the supported extensions: “gz”, “bz2”, “xz”, “zst”, then the file will be written compressed. Use “-” to write the uncompressed to standard output (stdout).
If the required directory tree does not exist, it will be created.
-J
,--output-json <OUTPUT_JSON>
— Path to output JSON results file.This file format is most suitable for further machine processing of the results.
Takes precedence over paths configured with
--output-all
,--output-basename
and--output-selection
.If the provided file path ends with one of the supported extensions: “gz”, “bz2”, “xz”, “zst”, then the file will be written compressed. Use “-” to write the uncompressed to standard output (stdout).
If the required directory tree does not exist, it will be created.
-c
,--output-csv <OUTPUT_CSV>
— Path to output CSV results file (delimiter: semicolon)This file format is most suitable for human inspection as well as for limited further machine processing of the results.
CSV and TSV output files are equivalent and only differ in the column delimiters.
Takes precedence over paths configured with
--output-all
,--output-basename
and--output-selection
.If the provided file path ends with one of the supported extensions: “gz”, “bz2”, “xz”, “zst”, then the file will be written compressed. Use “-” to write the uncompressed to standard output (stdout).
If the required directory tree does not exist, it will be created.
-t
,--output-tsv <OUTPUT_TSV>
— Path to output TSV results file (delimiter: tab)This file format is most suitable for human inspection as well as for limited further machine processing of the results.
CSV and TSV output files are equivalent and only differ in the column delimiters.
Takes precedence over paths configured with
--output-all
,--output-basename
and--output-selection
.If the provided file path ends with one of the supported extensions: “gz”, “bz2”, “xz”, “zst”, then the file will be written compressed. Use “-” to write the uncompressed to standard output (stdout).
If the required directory tree does not exist, it will be created.
-C
,--output-columns-selection <OUTPUT_COLUMNS_SELECTION>
— Restricts columns written into tabular output files (CSV and TSV).Should contain a comma-separated list of individual column names and/or column category names to include into both CSV and TSV outputs.
If this flag is omitted, or if category ‘all’ is present in the list, then all other entries are ignored and all columns are written.
Only valid together with one or multiple of flags:
--output-csv
,--output-tsv
,--output-all
.--output-graph <OUTPUT_GRAPH>
— Path to output phylogenetic graph with input sequences placed onto it, in Nextclade graph JSON format.Currently this format is not stable and not documented. It can change at any time without a warning. Use it at own risk.
Due to format limitations, it is only feasible to construct the tree for at most a few hundred to a few thousand sequences. If the tree is not needed, omitting this flag reduces processing time and memory consumption.
Takes precedence over paths configured with
--output-all
,--output-basename
and--output-selection
.If the provided file path ends with one of the supported extensions: “gz”, “bz2”, “xz”, “zst”, then the file will be written compressed. Use “-” to write the uncompressed to standard output (stdout).
If the required directory tree does not exist, it will be created.
-T
,--output-tree <OUTPUT_TREE>
— Path to output phylogenetic tree with input sequences placed onto it, in Auspice JSON V2 format.For file format description see: https://nextstrain.org/docs/bioinformatics/data-formats
Due to format limitations, it is only feasible to construct the tree for at most a few hundred to a few thousand sequences. If the tree is not needed, omitting this flag reduces processing time and memory consumption.
Takes precedence over paths configured with
--output-all
,--output-basename
and--output-selection
.If the provided file path ends with one of the supported extensions: “gz”, “bz2”, “xz”, “zst”, then the file will be written compressed. Use “-” to write the uncompressed to standard output (stdout).
If the required directory tree does not exist, it will be created.
--output-tree-nwk <OUTPUT_TREE_NWK>
— Path to output phylogenetic tree with input sequences placed onto it, in Newick format (New Hampshire tree format)For file format description see: https://en.wikipedia.org/wiki/Newick_format
Takes precedence over paths configured with
--output-all
,--output-basename
and--output-selection
.If the provided file path ends with one of the supported extensions: “gz”, “bz2”, “xz”, “zst”, then the file will be written compressed. Use “-” to write the uncompressed to standard output (stdout).
If the required directory tree does not exist, it will be created.
--include-reference <INCLUDE_REFERENCE>
— Whether to include aligned reference nucleotide sequence into output nucleotide sequence FASTA file and reference peptides into output peptide FASTA filesPossible values:
true
,false
--include-nearest-node-info <INCLUDE_NEAREST_NODE_INFO>
— Whether to include the list of nearest nodes to the outputsPossible values:
true
,false
--in-order <IN_ORDER>
— Emit output sequences in-order.With this flag the program will wait for results from the previous sequences to be written to the output files before writing the results of the next sequences, preserving the same order as in the input file. Due to variable sequence processing times, this might introduce unnecessary waiting times, but ensures that the resulting sequences are written in the same order as they occur in the inputs (except for sequences which have errors). By default, without this flag, processing might happen out of order, which is faster, due to the elimination of waiting, but might also lead to results written out of order - the order of results is not specified and depends on thread scheduling and processing times of individual sequences.
This option is only relevant when
--jobs
is greater than 1 or is omitted.Note: the sequences which trigger errors during processing will be omitted from outputs, regardless of this flag.
Possible values:
true
,false
--replace-unknown <REPLACE_UNKNOWN>
— Replace unknown nucleotide characters with ‘N’By default, the sequences containing unknown nucleotide characters are skipped with a warning - they are not analyzed and not included into results. If this flag is provided, then before the alignment, all unknown characters are replaced with ‘N’. This replacement allows to analyze these sequences.
The following characters are considered known: ‘-’, ‘A’, ‘B’, ‘C’, ‘D’, ‘G’, ‘H’, ‘K’, ‘M’, ‘N’, ‘R’, ‘S’, ‘T’, ‘V’, ‘W’, ‘Y’
Possible values:
true
,false
--without-greedy-tree-builder <WITHOUT_GREEDY_TREE_BUILDER>
— Disable greedy tree builder algorithmPossible values:
true
,false
--masked-muts-weight <MASKED_MUTS_WEIGHT>
--min-length <MIN_LENGTH>
— Minimum length of nucleotide sequence to consider for alignment.If a sequence is shorter than that, alignment will not be attempted and a warning will be emitted. When adjusting this parameter, note that alignment of short sequences can be unreliable.
--penalty-gap-extend <PENALTY_GAP_EXTEND>
— Penalty for extending a gap in alignment. If zero, all gaps regardless of length incur the same penalty--penalty-gap-open <PENALTY_GAP_OPEN>
— Penalty for opening of a gap in alignment. A higher penalty results in fewer gaps and more mismatches. Should be less than--penalty-gap-open-in-frame
to avoid gaps in genes--penalty-gap-open-in-frame <PENALTY_GAP_OPEN_IN_FRAME>
— As--penalty-gap-open
, but for opening gaps at the beginning of a codon. Should be greater than--penalty-gap-open
and less than--penalty-gap-open-out-of-frame
, to avoid gaps in genes, but favor gaps that align with codons--penalty-gap-open-out-of-frame <PENALTY_GAP_OPEN_OUT_OF_FRAME>
— As--penalty-gap-open
, but for opening gaps in the body of a codon. Should be greater than--penalty-gap-open-in-frame
to favor gaps that align with codons--penalty-mismatch <PENALTY_MISMATCH>
— Penalty for aligned nucleotides or amino acids that differ in state during alignment. Note that this is redundantly parameterized with--score-match
--score-match <SCORE_MATCH>
— Score for matching states in nucleotide or amino acid alignments--max-band-area <MAX_BAND_AREA>
— Maximum area of the band in the alignment matrix. Alignments with large bands are slow to compute and require substantial memory. Alignment of sequences requiring bands with area larger than this value, will not be attempted and a warning will be emitted--retry-reverse-complement <RETRY_REVERSE_COMPLEMENT>
— Retry seed matching step with a reverse complement if the first attempt failedPossible values:
true
,false
--no-translate-past-stop <NO_TRANSLATE_PAST_STOP>
— If this flag is present, the amino acid sequences will be truncated at the first stop codon, if mutations or sequencing errors cause premature stop codons to be present. No amino acid mutations in the truncated region will be recordedPossible values:
true
,false
--excess-bandwidth <EXCESS_BANDWIDTH>
— Excess bandwidth for internal stripes--terminal-bandwidth <TERMINAL_BANDWIDTH>
— Excess bandwidth for terminal stripes--gap-alignment-side <GAP_ALIGNMENT_SIDE>
— Whether to align gaps on the left or right side if equally parsimonious. Default: leftPossible values:
left
,right
--kmer-length <KMER_LENGTH>
— Length of exactly matching k-mers used in the seed alignment of the query to the reference--kmer-distance <KMER_DISTANCE>
— Interval of successive k-mers on the query sequence. Should be small compared to the query length--allowed-mismatches <ALLOWED_MISMATCHES>
— Exactly matching k-mers are extended to the left and right until more thanallowed_mismatches
are observed in a sliding window (window_size
)--window-size <WINDOW_SIZE>
— Size of the window within which mismatches are accumulated during seed extension--min-match-length <MIN_MATCH_LENGTH>
— Minimum length of extended k-mers--min-seed-cover <MIN_SEED_COVER>
— Fraction of the query sequence that has to be covered by extended seeds to proceed with the banded alignment--max-alignment-attempts <MAX_ALIGNMENT_ATTEMPTS>
— Number of times Nextclade will retry alignment with more relaxed results if alignment band boundaries are hit-j
,--jobs <JOBS>
— Number of processing jobs. If not specified, all available CPU threads will be used
nextclade dataset
List and download available Nextclade datasets (pathogens)
For short help type: nextclade -h
, for extended help type: nextclade --help
. Each subcommand has its own help, for example: nextclade dataset --help
.
Usage: nextclade dataset <COMMAND>
Subcommands:
list
— List available Nextclade datasetsget
— Download available Nextclade datasets
nextclade dataset list
List available Nextclade datasets
For short help type: nextclade -h
, for extended help type: nextclade --help
. Each subcommand has its own help, for example: nextclade run --help
.
Usage: nextclade dataset list [OPTIONS]
Options:
-n
,--name <NAME>
— Restrict list to datasets with this exact name.Can be used to test if a dataset exists.
Mutually exclusive with –search
-s
,--search <SEARCH>
— Search datasets by name or by reference.Will only display datasets containing this substring in their name (path), or either of attributes: “name”, “reference name”, “reference accession”.
Mutually exclusive with –name
-t
,--tag <TAG>
— Restrict list to datasets with this exact version tag--include-incompatible
— Include dataset versions that are incompatible with this version of Nextclade CLI--include-deprecated
— Include deprecated datasets.Authors can mark a dataset as deprecated to express that the dataset will no longer be updated and/or supported. Reach out to dataset authors for concrete details.
--no-experimental
— Exclude experimental datasets.Authors can mark a dataset as experimental when development of the dataset is still in progress, or if the dataset is incomplete or of lower quality than usual. Use at own risk. Reach out to dataset authors if interested in further development and stabilizing of a particular dataset, and consider contributing.
--no-community
— Exclude community datasets and only show official datasets.Community datasets are the datasets provided by the members of the broader Nextclade community. These datasets may vary in quality and completeness. Depending on authors’ goals, these datasets may be created for specific purposes, rather than for general use. Nextclade team is unable to verify correctness of these datasets and does not provide support for them. For all questions regarding a concrete community dataset, please read its documentation and reach out to its authors.
--json
— Print output in JSON format.This is useful for automated processing. However, at this time, we cannot guarantee stability of the format. Use at own risk.
--only-names
— Print only names of the datasets, without any other details--server <SERVER>
— Use custom dataset server.You can host your own dataset server, with one or more datasets, grouped into dataset collections, and use this server to provide datasets to users of Nextclade CLI and Nextclade Web. Refer to Nextclade dataset documentation for more details.
-x
,--proxy <PROXY>
— Pass all traffic over proxy server. HTTP, HTTPS, and SOCKS5 proxies are supported--proxy-user <PROXY_USER>
— Username for basic authentication on proxy server, if applicable. Only valid when--proxy
is also supplied.--proxy-user
and--proxy-pass
must be either both specified or both omitted--proxy-pass <PROXY_PASS>
— Password for basic authentication on proxy server, if applicable. Only valid when--proxy
is also supplied.--proxy-user
and--proxy-pass
must be either both specified or both omitted--extra-ca-certs <EXTRA_CA_CERTS>
— Path to extra CA certificates as a PEM bundle.You can also provide the path to CA certificates in the environment variable
NEXTCLADE_EXTRA_CA_CERTS
. The argument takes precedence over the environment variable if both are provided.Default CA certificates are those obtained from the platform/OS-level trust store plus those from a baked-in copy of Mozilla’s common CA trust store. You can override the certs obtained from the platform trust store by setting
SSL_CERT_FILE
orSSL_CERT_DIR
. Filenames in the latter must be hashed in the style of OpenSSL’sc_rehash
utility.
nextclade dataset get
Download available Nextclade datasets
For short help type: nextclade -h
, for extended help type: nextclade --help
. Each subcommand has its own help, for example: nextclade run --help
.
Usage: nextclade dataset get [OPTIONS] --name <NAME> <--output-dir <OUTPUT_DIR>|--output-zip <OUTPUT_ZIP>>
Options:
-n
,--name <NAME>
— Name of the dataset to download. Typenextclade dataset list
to view available datasets-t
,--tag <TAG>
— Version tag of the dataset to download.If this flag is not provided the latest version is downloaded.
--server <SERVER>
— Use custom dataset server.You can host your own dataset server, with one or more datasets, grouped into dataset collections, and use this server to provide datasets to users of Nextclade CLI and Nextclade Web. Refer to Nextclade dataset documentation for more details.
-o
,--output-dir <OUTPUT_DIR>
— Path to directory to write dataset files to.This flag is mutually exclusive with
--output-zip
, and provides the equivalent output, but in the form of a directory with files, instead of a compressed zip archive.If the required directory tree does not exist, it will be created.
-z
,--output-zip <OUTPUT_ZIP>
— Path to resulting dataset zip file.This flag is mutually exclusive with
--output-dir
, and provides the equivalent output, but in the form of compressed zip archive instead of a directory with files.If the required directory tree does not exist, it will be created.
-x
,--proxy <PROXY>
— Pass all traffic over proxy server. HTTP, HTTPS, and SOCKS5 proxies are supported--proxy-user <PROXY_USER>
— Username for basic authentication on proxy server, if applicable. Only valid when--proxy
is also supplied.--proxy-user
and--proxy-pass
must be either both specified or both omitted--proxy-pass <PROXY_PASS>
— Password for basic authentication on proxy server, if applicable. Only valid when--proxy
is also supplied.--proxy-user
and--proxy-pass
must be either both specified or both omitted--extra-ca-certs <EXTRA_CA_CERTS>
— Path to extra CA certificates as a PEM bundle.You can also provide the path to CA certificates in the environment variable
NEXTCLADE_EXTRA_CA_CERTS
. The argument takes precedence over the environment variable if both are provided.Default CA certificates are those obtained from the platform/OS-level trust store plus those from a baked-in copy of Mozilla’s common CA trust store. You can override the certs obtained from the platform trust store by setting
SSL_CERT_FILE
orSSL_CERT_DIR
. Filenames in the latter must be hashed in the style of OpenSSL’sc_rehash
utility.
nextclade sort
Sort sequences according to the inferred Nextclade dataset (pathogen)
For short help type: nextclade -h
, for extended help type: nextclade --help
. Each subcommand has its own help, for example: nextclade sort --help
.
Usage: nextclade sort [OPTIONS] [INPUT_FASTAS]...
Arguments:
<INPUT_FASTAS>
— Path to one or multiple FASTA files with input sequencesSupports the following compression formats: “gz”, “bz2”, “xz”, “zst”. If no files provided, the plain fasta input is read from standard input (stdin).
Options:
-m
,--input-minimizer-index-json <INPUT_MINIMIZER_INDEX_JSON>
— Path to input minimizer index JSON file.By default, the latest reference minimizer index is fetched from the dataset server (default or customized with
--server
argument). If this argument is provided, the algorithm skips fetching the default index and uses the index provided in the JSON file.Supports the following compression formats: “gz”, “bz2”, “xz”, “zst”. Use “-” to read uncompressed data from standard input (stdin).
-O
,--output-dir <OUTPUT_DIR>
— Path to output directorySequences will be written in subdirectories: one subdirectory per dataset. Sequences inferred to be belonging to a particular dataset will be placed in the corresponding subdirectory. The subdirectory tree can be nested, depending on how dataset names are organized - dataset names can contain slashes, and they will be treated as path segment delimiters.
If the required directory tree does not exist, it will be created.
Mutually exclusive with
--output-path
.-o
,--output-path <OUTPUT_PATH>
— Template string for the file path to output sorted sequences. A separate file will be generated per dataset.The string should contain template variable
{name}
, where the dataset name will be substituted. Note that if the{name}
variable contains slashes, they will be interpreted as path segments and subdirectories will be created.Make sure you properly quote and/or escape the curly braces, so that your shell, programming language or pipeline manager does not attempt to substitute the variables.
Mutually exclusive with
--output-dir
.If the provided file path ends with one of the supported extensions: “gz”, “bz2”, “xz”, “zst”, then the file will be written compressed. If the required directory tree does not exist, it will be created.
Example for bash shell:
–output=’outputs/{name}/sorted.fasta.gz’
-r
,--output-results-tsv <OUTPUT_RESULTS_TSV>
— Path to output results TSV fileIf the provided file path ends with one of the supported extensions: “gz”, “bz2”, “xz”, “zst”, then the file will be written compressed. Use “-” to write uncompressed to standard output (stdout). If the required directory tree does not exist, it will be created.
--min-score <MIN_SCORE>
— Minimum value of the score being considered for a detectionDefault value:
0.1
--min-hits <MIN_HITS>
— Minimum number of the index hits required for a detectionDefault value:
5
--max-score-gap <MAX_SCORE_GAP>
— Maximum score difference between two adjacent dataset matches, after which the less fitting datasets are not considered.This argument will truncate the list of datasets considered for a detection, such that if there is a large enough difference in score (“gap”) in the list, all datasets that are worse than the dataset before the gap are removed from consideration. This allows, in situation when there’s 2 or more groups of similar datasets, to filter-out the groups that are worse than the best group.
Default value:
0.2
--all-matches
— Whether to consider all datasetsBy default, only the top matching dataset is considered. When this flag is provided, all datasets reaching the matching criteria are considered.
Default value:
false
-j
,--jobs <JOBS>
— Number of processing jobs. If not specified, all available CPU threads will be used--server <SERVER>
— Use custom dataset server.You can host your own dataset server, with one or more datasets, grouped into dataset collections, and use this server to provide datasets to users of Nextclade CLI and Nextclade Web. Refer to Nextclade dataset documentation for more details.
-x
,--proxy <PROXY>
— Pass all traffic over proxy server. HTTP, HTTPS, and SOCKS5 proxies are supported--proxy-user <PROXY_USER>
— Username for basic authentication on proxy server, if applicable. Only valid when--proxy
is also supplied.--proxy-user
and--proxy-pass
must be either both specified or both omitted--proxy-pass <PROXY_PASS>
— Password for basic authentication on proxy server, if applicable. Only valid when--proxy
is also supplied.--proxy-user
and--proxy-pass
must be either both specified or both omitted--extra-ca-certs <EXTRA_CA_CERTS>
— Path to extra CA certificates as a PEM bundle.You can also provide the path to CA certificates in the environment variable
NEXTCLADE_EXTRA_CA_CERTS
. The argument takes precedence over the environment variable if both are provided.Default CA certificates are those obtained from the platform/OS-level trust store plus those from a baked-in copy of Mozilla’s common CA trust store. You can override the certs obtained from the platform trust store by setting
SSL_CERT_FILE
orSSL_CERT_DIR
. Filenames in the latter must be hashed in the style of OpenSSL’sc_rehash
utility.
nextclade read-annotation
Read genome annotation and present it in Nextclade’s internal formats. This is mostly only useful for Nextclade maintainers and the most curious users. Note that these internal formats have no stability guarantees and can be changed at any time without notice.
For short help type: nextclade -h
, for extended help type: nextclade --help
. Each subcommand has its own help, for example: nextclade sort --help
.
Usage: nextclade read-annotation [OPTIONS] [INPUT_ANNOTATION]
Arguments:
<INPUT_ANNOTATION>
— Genome annotation file in GFF3 format.Learn more about Generic Feature Format Version 3 (GFF3): https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md
Options:
-o
,--output <OUTPUT>
— Path to output JSON or YAML file.The format is chosen based on file extension: “.json” or “.yaml”.
--feature-tree
— Present features in “feature tree” format. This format is a precursor of genome annotation format - it contains all genetic features, even the ones that Nextclade does not use, but also less information about each feature--json
— Print console output in JSON format, rather than human-readable table
nextclade help-markdown
Print command-line reference documentation in Markdown format
Usage: nextclade help-markdown
This document was generated automatically by
clap-markdown
.