Glossary

Augur

A command-line application used for phylogenetic analysis. See the Augur documentation.

Auspice

A web application used for phylogenetic visualization and analysis. See the Auspice documentation.

pathogen repository

A version-controlled folder containing all files necessary to run a pathogen’s workflows.

core repository

A pathogen repository maintained by the Nextstrain team.

workflow

A reproducible process composed of one or more builds that produce datasets. Implementation varies by workflow, but workflows are generally run by workflow managers such as Snakemake.

A Nextstrain pathogen repository typically contains the following workflows:

  1. phylogenetic workflow

  2. ingest workflow

  3. Nextclade workflow

Our core workflows can be divided into two types:

  1. Single-build workflow (e.g. Zika workflow): one build producing one dataset.

  2. Multi-build workflow (e.g. SARS-CoV-2 workflow): multiple builds producing multiple datasets.

Note

The individual builds in a multi-build workflow are also “workflows” as workflow managers like Snakemake define the term.
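The relationship between workflows, builds, and datasets described above can be sketched as a small data model. This is only an illustration: the class names, the `kind` property, and the dataset names are hypothetical and are not part of any Nextstrain tool.

```python
from dataclasses import dataclass, field


@dataclass
class Build:
    """One build, which produces one dataset."""
    name: str
    dataset: str


@dataclass
class Workflow:
    """A workflow made up of one or more builds."""
    name: str
    builds: list = field(default_factory=list)

    @property
    def kind(self):
        # One build means a single-build workflow; more than one means multi-build.
        return "single-build" if len(self.builds) == 1 else "multi-build"

    @property
    def datasets(self):
        # Each build contributes exactly one dataset.
        return [build.dataset for build in self.builds]


# Hypothetical example names, chosen only for illustration.
single = Workflow("example-single", [Build("default", "example")])
multi = Workflow(
    "example-multi",
    [Build("region-a", "example/region-a"), Build("region-b", "example/region-b")],
)

print(single.kind, single.datasets)
print(multi.kind, multi.datasets)
```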

phylogenetic workflow

also Nextstrain workflow

A workflow consisting of build(s) that execute bioinformatic analyses with Augur to generate phylogenetic dataset(s) for visualization with Auspice.

The phylogenetic workflow is often considered the primary workflow in a pathogen repository (e.g. “the Zika workflow” typically means “the phylogenetic workflow in the Zika pathogen repository”).

ingest workflow

A workflow consisting of build(s) that curate public metadata and sequences to generate ingest dataset(s) that are typically used as input files for phylogenetic workflows and Nextclade workflows.

Nextclade workflow

A workflow consisting of build(s) that generate reference tree(s) to be packaged with other dataset files to create Nextclade dataset(s).

core workflow

A default workflow maintained by the Nextstrain team that can usually be run without additional configurations or customizations.

build

also Nextstrain build, phylogenetic build, ingest build, Nextclade build

(noun) A sequence of commands, parameters, and input files that work together to reproducibly generate a dataset.

build (verb)

A general term for running a workflow (e.g. nextstrain build).

build step

A modular instruction within a build that can be run standalone (e.g. augur filter), often with clear input and output files.

dataset

A collection of output files produced by a build. A Nextstrain pathogen repository typically produces multiple types of datasets:

  1. phylogenetic dataset

  2. ingest dataset

  3. Nextclade dataset

phylogenetic dataset

also Auspice JSONs

A dataset consisting of JSONs produced by a build of a phylogenetic workflow. The term also refers to the shared file prefix of the JSONs. For example, flu/seasonal/h3n2/ha/2y identifies a dataset that corresponds to the files:

  • flu_seasonal_h3n2_ha_2y_meta.json

  • flu_seasonal_h3n2_ha_2y_tree.json

  • flu_seasonal_h3n2_ha_2y_tip-frequencies.json

Some phylogenetic workflows, like Zika, produce a single dataset that shares the workflow’s name. Others, like seasonal flu, produce many datasets. The phylogenetic dataset is often considered the primary dataset in a pathogen repository (e.g. “the Zika dataset” typically means “the phylogenetic dataset from the Zika pathogen repository”).
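The mapping from a dataset name to its file names in the flu/seasonal/h3n2/ha/2y example can be sketched as follows. The function name `dataset_files` is hypothetical, and the default suffixes are simply the three seen in that example; a given dataset may have a different set of files.

```python
def dataset_files(dataset_name, suffixes=("meta", "tree", "tip-frequencies")):
    """Return the JSON file names sharing the prefix derived from a dataset name.

    Assumption: the file prefix is the dataset name with "/" replaced by "_",
    as in the flu/seasonal/h3n2/ha/2y example.
    """
    prefix = dataset_name.strip("/").replace("/", "_")
    return [f"{prefix}_{suffix}.json" for suffix in suffixes]


for file_name in dataset_files("flu/seasonal/h3n2/ha/2y"):
    print(file_name)
```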

ingest dataset

A dataset consisting of curated files produced by a build of an ingest workflow. Typically consists of the files:

  • metadata.tsv

  • sequences.fasta

If the ingest workflow includes Nextclade build steps, then the dataset will typically include Nextclade output files as well.
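Because metadata.tsv and sequences.fasta are meant to be used together, a quick consistency check between them can be useful. The sketch below is only an illustration: it assumes a metadata column named `strain` whose values match the first word of each FASTA header, which may not hold for a given ingest workflow, and it uses small in-memory stand-ins instead of real files.

```python
import csv
import io


def metadata_ids(tsv_handle, id_column="strain"):
    """Collect record identifiers from a tab-separated metadata table.

    Assumption: `id_column` ("strain" here) is a hypothetical identifier column.
    """
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    return {row[id_column] for row in reader}


def fasta_ids(fasta_handle):
    """Collect the first word of every FASTA header line."""
    return {line[1:].split()[0] for line in fasta_handle if line.startswith(">")}


# In-memory stand-ins for metadata.tsv and sequences.fasta (made-up records).
metadata = io.StringIO("strain\tdate\nsampleA\t2020-01-01\nsampleB\t2020-02-01\n")
sequences = io.StringIO(">sampleA\nACGT\n>sampleC\nACGT\n")

meta = metadata_ids(metadata)
seqs = fasta_ids(sequences)

# Records present in one file but not the other.
missing_sequences = sorted(meta - seqs)
missing_metadata = sorted(seqs - meta)

print("metadata rows without a sequence:", missing_sequences)
print("sequences without a metadata row:", missing_metadata)
```

With real files, the same functions can be given handles from `open("metadata.tsv", newline="")` and `open("sequences.fasta")`.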

Nextclade dataset

A dataset consisting of files required for a Nextclade analysis, usually produced by a build of a Nextclade workflow. See the documentation for more details.

narrative

A method of data-driven storytelling with interactive views of phylogenetic datasets displayed alongside multiple pages (or slides) of text and images. Saved as a Markdown file with extended syntax to support additional displays.

Viewable on nextstrain.org or with Auspice via the nextstrain view or auspice view commands.

See also Communicating Results Using Narratives and Writing a narrative.

JSONs

Special .json files produced by Augur and visualized by Auspice. These files make up a phylogenetic dataset. See data formats.

Nextstrain CLI

The Nextstrain command-line interface (Nextstrain CLI) provides a consistent way to run and visualize pathogen builds and access Nextstrain components like Augur and Auspice across runtimes such as Docker, Conda, and AWS Batch.

See the Nextstrain CLI documentation.

runtime

also Nextstrain runtime

One of several configuration options for installing and using the Nextstrain CLI. Which runtimes are available depends on the operating system:

  1. Docker runtime

  2. Conda runtime

  3. Ambient runtime (formerly “native”)

  4. AWS Batch runtime (only for nextstrain build)