Glossary

Augur

A command-line application used for phylogenetic analysis. See the Augur documentation.

Auspice

A web application used for phylogenetic visualization and analysis. See the Auspice documentation.

pathogen repository

A version-controlled folder containing all files necessary to run a pathogen’s workflows.

core repository

A pathogen repository maintained by the Nextstrain team.

workflow

A reproducible process composed of one or more builds that produce datasets. Implementation varies per workflow, but workflows are generally run by workflow managers such as Snakemake.

A Nextstrain pathogen repository typically consists of these different workflows:

phylogenetic workflow
ingest workflow
Nextclade workflow

Our core workflows can be divided into two types:

Single-build workflow (e.g. Zika workflow): one build producing one dataset.
Multi-build workflow (e.g. SARS-CoV-2 workflow): multiple builds producing multiple datasets.

Note

The individual builds in a multi-build workflow are also “workflows” as defined by workflow managers like Snakemake.

phylogenetic workflow

also Nextstrain workflow

A workflow consisting of build(s) that execute bioinformatic analyses with Augur to generate phylogenetic dataset(s) for visualization with Auspice.

The phylogenetic workflow is often considered the primary workflow in a pathogen repository (e.g. “the Zika workflow” typically means “the phylogenetic workflow in the Zika pathogen repository”).

ingest workflow

A workflow consisting of build(s) that curate public metadata and sequences to generate ingest dataset(s) that are typically used as input files for phylogenetic workflows and Nextclade workflows.

Nextclade workflow

A workflow consisting of build(s) that generate reference tree(s) to be packaged with other dataset files to create Nextclade dataset(s).

core workflow

A default workflow maintained by the Nextstrain team that can usually be run without additional configurations or customizations.

build

also Nextstrain build, phylogenetic build, ingest build, Nextclade build

(noun) A sequence of commands, parameters, and input files that work together to reproducibly generate a dataset.

build (verb)

A general term for running a workflow (e.g. nextstrain build).

build step

A modular instruction of a build which can be run standalone (e.g. augur filter), often with clear input and output files.
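As an illustration of a build step with explicit input and output files, the sketch below assembles an augur filter command in Python. The helper name `augur_filter_command` and all file paths are hypothetical; the options used (`--metadata`, `--sequences`, `--output-metadata`, `--output-sequences`, `--min-date`) are assumed to be available in the installed Augur version.

```python
import shlex


def augur_filter_command(metadata, sequences, output_metadata,
                         output_sequences, min_date=None):
    """Build the argument list for a standalone `augur filter` step."""
    cmd = [
        "augur", "filter",
        "--metadata", metadata,                  # input file
        "--sequences", sequences,                # input file
        "--output-metadata", output_metadata,    # output file
        "--output-sequences", output_sequences,  # output file
    ]
    if min_date is not None:
        cmd += ["--min-date", str(min_date)]
    return cmd


# Hypothetical paths, for illustration only.
cmd = augur_filter_command(
    "data/metadata.tsv",
    "data/sequences.fasta",
    "results/filtered_metadata.tsv",
    "results/filtered.fasta",
    min_date="2020-01-01",
)
print(shlex.join(cmd))
```

The function only builds the command. Passing the list to `subprocess.run(cmd, check=True)` would execute it, provided Augur is installed and the input files exist.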

dataset

A collection of output files produced by a build. A Nextstrain pathogen repository typically produces multiple types of datasets:

phylogenetic dataset
ingest dataset
Nextclade dataset

phylogenetic dataset

also Auspice JSONs

A dataset consisting of JSONs produced by a build of a phylogenetic workflow. The term also refers to the shared file prefix of the JSONs. For example, flu/seasonal/h3n2/ha/2y identifies a dataset which corresponds to the files:

flu_seasonal_h3n2_ha_2y.json: primary JSON file
flu_seasonal_h3n2_ha_2y_root-sequence.json: sidecar file
flu_seasonal_h3n2_ha_2y_tip-frequencies.json: sidecar file
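The mapping from a dataset name to its file names can be sketched in Python. This is a minimal illustration, not Nextstrain code: the function name `dataset_files` and its `sidecars` parameter are hypothetical, and the sketch assumes only the convention visible in the example above (slashes become underscores, and each sidecar name is appended after an underscore).

```python
def dataset_files(dataset, sidecars=("root-sequence", "tip-frequencies")):
    """Return the JSON file names for a dataset name.

    The first entry is the primary JSON; the rest are sidecar files.
    """
    prefix = dataset.strip("/").replace("/", "_")
    primary = f"{prefix}.json"
    return [primary] + [f"{prefix}_{name}.json" for name in sidecars]


for filename in dataset_files("flu/seasonal/h3n2/ha/2y"):
    print(filename)
```

Not every dataset has every sidecar file, so the `sidecars` argument is left adjustable.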

Some phylogenetic workflows, like Zika, produce a single dataset that is synonymous with the workflow. Others, like seasonal flu, produce many datasets. The phylogenetic dataset is often considered the primary dataset in a pathogen repository (e.g. “the Zika dataset” typically means “the phylogenetic dataset from the Zika pathogen repository”).

ingest dataset

A dataset consisting of curated files produced by a build of an ingest workflow. Typically consists of the files:

metadata.tsv
sequences.fasta

If the ingest workflow includes Nextclade build steps, then the dataset will typically include Nextclade output files as well.
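A quick way to confirm that a directory holds the typical ingest dataset files is sketched below. The function name `missing_ingest_files` and the `results_dir` argument are hypothetical, and the location where an ingest workflow writes its outputs is left to the caller.

```python
from pathlib import Path

# The two files named above as the typical contents of an ingest dataset.
TYPICAL_INGEST_FILES = ("metadata.tsv", "sequences.fasta")


def missing_ingest_files(results_dir):
    """Return the typical ingest files that are absent from results_dir."""
    directory = Path(results_dir)
    return [
        name for name in TYPICAL_INGEST_FILES
        if not (directory / name).is_file()
    ]
```

An empty list means both files are present. Any Nextclade output files are not checked, since their names depend on the workflow.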

Nextclade dataset

A dataset consisting of files required for a Nextclade analysis, usually produced by a build of a Nextclade workflow. See documentation for more details.

narrative

A method of data-driven storytelling with interactive views of phylogenetic datasets displayed alongside multiple pages (or slides) of text and images. Saved as a Markdown file with extended syntax to support additional displays.

Viewable on nextstrain.org or with Auspice via the nextstrain view or auspice view commands.

JSONs

Special .json files produced by Augur and visualized by Auspice. These files make up a phylogenetic dataset. See data formats.
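To give a sense of what these files contain, the sketch below lists the tip names in a primary JSON. It assumes the file has a top-level `tree` object whose nodes carry a `name` and an optional `children` list. The function name `tip_names` and the toy data are made up for illustration; see the data formats documentation for the authoritative schema.

```python
import json


def tip_names(root):
    """Return the names of terminal nodes (tips) under root, left to right."""
    tips = []
    stack = [root]
    while stack:
        node = stack.pop()
        children = node.get("children") or []
        if children:
            # Reverse so the leftmost child is visited first.
            stack.extend(reversed(children))
        else:
            tips.append(node["name"])
    return tips


# A tiny hand-written stand-in for a primary JSON file (not real data).
example = json.loads("""
{
  "version": "v2",
  "meta": {"title": "toy example"},
  "tree": {
    "name": "root",
    "children": [
      {"name": "A"},
      {"name": "internal", "children": [{"name": "B"}, {"name": "C"}]}
    ]
  }
}
""")

print(tip_names(example["tree"]))
```

The traversal is iterative rather than recursive so that deep trees do not run into Python's recursion limit.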

Nextstrain CLI

The Nextstrain command-line interface (Nextstrain CLI) provides a consistent way to run and visualize pathogen builds and access Nextstrain components like Augur and Auspice across runtimes such as Docker, Conda, and AWS Batch.

See the Nextstrain CLI documentation.

runtime

also Nextstrain runtime

The Nextstrain CLI can be installed and used with different configuration options, or runtimes, depending on the operating system:

Docker runtime
Conda runtime
Ambient runtime (formerly “native”)
AWS Batch runtime (only for nextstrain build)