Data formats

Nextstrain uses a few different kinds of JSON files at various stages in a typical build.

The primary JSON files used by Nextstrain are those consumed by Auspice to display a dataset. Without these dataset files, Auspice has nothing to display. These files are typically the final output of a build and produced by the Augur command augur export. They come in two versions:

v2

Newer format, with a filename of your choosing like ${name}.json. This is often referred to as the “main” file.

v1

Original format, with filenames like ${name}_tree.json and ${name}_meta.json, often referred to as the “tree” and “meta” files.

Secondary JSON files used by Nextstrain come in two flavors: sidecar files and node data files.

Sidecar files are produced by Augur for direct consumption by Auspice, alongside the primary JSON files described above. They come in two types with filenames enforced by convention:

root-sequence

Filenames like ${name}_root-sequence.json, produced by augur export v2’s --include-root-sequence option.

tip-frequencies

Filenames like ${name}_tip-frequencies.json, produced by augur frequencies with the --output-format auspice and --output options.
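The naming conventions above can be summarized in a short Python sketch. The helper names (primary_filenames, sidecar_filenames) are hypothetical and not part of Augur or Auspice; they only restate the filename patterns described in this section. Sidecar files are optional, so the second helper lists where they would be expected, not files that must exist.

```python
def primary_filenames(name: str, version: str = "v2") -> list[str]:
    """Return the conventional primary dataset filename(s) for `name`.

    Hypothetical helper: it only encodes the patterns described above.
    """
    if version == "v2":
        # v2 allows any filename; ${name}.json is the usual choice.
        return [f"{name}.json"]
    if version == "v1":
        # v1 splits the dataset into "tree" and "meta" files.
        return [f"{name}_tree.json", f"{name}_meta.json"]
    raise ValueError(f"unknown dataset version: {version!r}")


def sidecar_filenames(name: str) -> dict[str, str]:
    """Return the conventional (optional) sidecar filenames for `name`."""
    return {
        "root-sequence": f"{name}_root-sequence.json",
        "tip-frequencies": f"{name}_tip-frequencies.json",
    }


if __name__ == "__main__":
    print(primary_filenames("zika"))
    print(primary_filenames("zika", version="v1"))
    print(sidecar_filenames("zika"))
```

Because the sidecar names are derived from the dataset name, renaming the main file without also renaming its sidecars breaks the association between them.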

Node data files are typically produced by various Augur commands such as augur traits or augur ancestral and are then fed into augur export to be merged together into a final output for Auspice. Node data files can have any filename you want but some common names are:

  • nt_muts.json

  • aa_muts.json

  • traits.json

  • branch_lengths.json

  • ${name}_aa-mutation-frequencies.json

  • ${name}_entropy.json

  • ${name}_frequencies.json

  • ${name}_sequences.json

  • ${name}_titers.json

Node data files have a generic structure to allow them to contain all kinds of data about your tree.
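As a rough illustration of that generic structure and of what "merged together" means, the sketch below assumes each node data file holds a top-level "nodes" object that maps a node name to a dictionary of attributes. That layout is an assumption for this example, and merge_node_data is a hypothetical function, not Augur's implementation; augur export handles more than this (for instance, other top-level keys and conflict rules).

```python
import json


def merge_node_data(node_data_list: list[dict]) -> dict:
    """Merge per-node attributes from several node data documents.

    Assumes each document looks like {"nodes": {node_name: {attr: value}}}.
    When two documents define the same attribute for the same node, the
    later document wins (a simplification chosen for this sketch).
    """
    merged: dict[str, dict] = {}
    for node_data in node_data_list:
        for node_name, attrs in node_data.get("nodes", {}).items():
            merged.setdefault(node_name, {}).update(attrs)
    return {"nodes": merged}


if __name__ == "__main__":
    # Made-up contents standing in for files such as traits.json
    # and branch_lengths.json.
    traits = {"nodes": {"NODE_1": {"country": "Brazil"}}}
    branch_lengths = {
        "nodes": {
            "NODE_1": {"branch_length": 0.002},
            "NODE_2": {"branch_length": 0.001},
        }
    }
    print(json.dumps(merge_node_data([traits, branch_lengths]), indent=2))
```

Keying everything by node name is what lets independent commands annotate the same tree without knowing about each other.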

In advanced builds, custom node data files are often produced by build-specific scripts in addition to the ones produced by Augur commands. For example, our ncov build produces a custom epiweeks.json node data file using this workflow step and this script.
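A build-specific script of this kind can be quite small. The sketch below is not the ncov script; it is a minimal stand-in that derives a week label from each sample's collection date and emits it in the assumed {"nodes": {...}} layout. It uses ISO 8601 week numbering from Python's standard library, which can differ from epidemiological week definitions, and the attribute name iso_week and function name build_week_node_data are hypothetical.

```python
import json
from datetime import date


def build_week_node_data(collection_dates: dict[str, str]) -> dict:
    """Build a node data document with an ISO week label per node.

    `collection_dates` maps node (strain) names to YYYY-MM-DD strings.
    ISO weeks are a stand-in here; they are not the same as epiweeks.
    """
    nodes = {}
    for node_name, date_string in collection_dates.items():
        iso = date.fromisoformat(date_string).isocalendar()
        iso_year, iso_week = iso[0], iso[1]
        # Label format: four-digit ISO year followed by two-digit week.
        nodes[node_name] = {"iso_week": f"{iso_year}{iso_week:02d}"}
    return {"nodes": nodes}


if __name__ == "__main__":
    # Made-up strain names and dates, for illustration only.
    example = {"strain_A": "2020-01-06", "strain_B": "2020-03-15"}
    print(json.dumps(build_week_node_data(example), indent=2))
```

Writing the result to a file with json.dump would give a custom node data file that can be passed to augur export alongside the Augur-produced ones.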

Similarly, other bioinformatics software can produce compatible dataset JSONs (primary or sidecars) for use by Auspice; they aren’t required to be generated by Augur, although that is the most common way. Augur’s validation command, augur validate, can check that dataset JSONs conform to the required schema.

Once you have Nextstrain JSON files, you can visualize and share them in a variety of ways. See our guide to sharing your results to find a way that meets your needs for privacy and collaboration.