Data formats
Nextstrain uses a few different kinds of JSON files at various stages in a typical build.
The primary JSON files used by Nextstrain are those consumed by Auspice to display a dataset. Without these dataset files, Auspice has nothing to display. These files are typically the final output of a build and are produced by the Augur command augur export. They come in two versions:
- v2
  Newer format, with a filename of your choosing like ${name}.json. This is often referred to as the “main” file.
- v1
  Original format, with filenames like ${name}_tree.json and ${name}_meta.json, often referred to as the “tree” and “meta” files.
Secondary JSON files used by Nextstrain come in two flavors: sidecar files and node data files.
Sidecar files are produced by Augur for direct consumption by Auspice, alongside the primary JSON files described above. They come in three types with filenames enforced by convention:
- root-sequence
  Filenames like ${name}_root-sequence.json, produced by augur export v2’s --include-root-sequence option.
- tip-frequencies
  Filenames like ${name}_tip-frequencies.json, produced by augur frequencies with the --output-format auspice --output … options.
- measurements
  Filenames like ${name}_measurements.json, produced by one of the augur measurements subcommands, export or concat.
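The filename conventions above (for both primary and sidecar files) can be sketched as a small lookup. This is only an illustrative heuristic, not part of Augur or Auspice: the function name and the kind labels are made up here, and because a v2 main file can have any name you choose, a main file that happens to end in one of these suffixes would be misclassified.

```python
from typing import Tuple

# Suffixes taken from the filename conventions described in this document.
# Every suffix is checked before falling back to the "v2 main" case.
SUFFIX_KINDS = [
    ("_root-sequence.json", "root-sequence sidecar"),
    ("_tip-frequencies.json", "tip-frequencies sidecar"),
    ("_measurements.json", "measurements sidecar"),
    ("_tree.json", "v1 tree"),
    ("_meta.json", "v1 meta"),
]


def classify_dataset_file(filename: str) -> Tuple[str, str]:
    """Guess (dataset name, kind) from the filename alone (hypothetical helper)."""
    for suffix, kind in SUFFIX_KINDS:
        if filename.endswith(suffix):
            return filename[: -len(suffix)], kind
    if filename.endswith(".json"):
        # No conventional suffix: assume a v2 "main" dataset file.
        return filename[: -len(".json")], "v2 main"
    raise ValueError(f"not a JSON filename: {filename}")
```

For example, `classify_dataset_file("zika_tree.json")` yields the name `zika` with kind `v1 tree`, while `classify_dataset_file("zika.json")` is treated as a v2 main file.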
Node data files are typically produced by various Augur commands such as augur traits or augur ancestral and are then fed into augur export to be merged together into a final output for Auspice. Node data files can have any filename you want but some common names are:
- nt_muts.json
- aa_muts.json
- traits.json
- branch_lengths.json
- ${name}_aa-mutation-frequencies.json
- ${name}_entropy.json
- ${name}_frequencies.json
- ${name}_sequences.json
- ${name}_titers.json
Node data files have a generic structure to allow them to contain all kinds of data about your tree.
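As a rough illustration of that generic structure and of how several node data files can be combined, the sketch below assumes the common layout in which a top-level `nodes` object maps each node name to a dictionary of attributes. The node names, attribute names, and values are invented for the example, and the merge rule (later files overwrite earlier ones on conflict) is a simplification chosen here, not a description of what augur export actually does.

```python
from typing import Any, Dict

NodeData = Dict[str, Any]


def merge_node_data(*node_data_files: NodeData) -> NodeData:
    """Combine per-node attributes from several node data dicts (illustrative only)."""
    merged: Dict[str, Dict[str, Any]] = {}
    for node_data in node_data_files:
        for node_name, attrs in node_data.get("nodes", {}).items():
            # Later inputs overwrite earlier ones for the same attribute.
            merged.setdefault(node_name, {}).update(attrs)
    return {"nodes": merged}


# Made-up contents standing in for branch_lengths.json and traits.json.
branch_lengths = {
    "nodes": {
        "NODE_0000001": {"branch_length": 0.002},
        "strainA": {"branch_length": 0.001},
    }
}
traits = {"nodes": {"strainA": {"country": "Brazil"}}}

merged = merge_node_data(branch_lengths, traits)
```

After merging, `strainA` carries both its `branch_length` and its `country`, which is the kind of per-node union that a final dataset needs.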
In advanced builds, custom node data files are often produced by build-specific scripts in addition to the ones produced by Augur commands. For example, our ncov build produces a custom epiweeks.json node data file using this workflow step and this script.
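A build-specific script of this kind can be quite small. The sketch below is not the ncov script: it assumes the common node data layout with a top-level `nodes` key, uses a hypothetical `iso_week` attribute, and substitutes the ISO 8601 week from the standard library for a true epidemiological week, which may be numbered differently.

```python
import json
from datetime import date
from typing import Dict


def build_week_node_data(collection_dates: Dict[str, str]) -> dict:
    """Map each strain's YYYY-MM-DD date to an ISO week label (stand-in for epiweeks)."""
    nodes = {}
    for strain, iso_date in collection_dates.items():
        iso_year, iso_week, _ = date.fromisoformat(iso_date).isocalendar()
        # "iso_week" is a made-up attribute name for this example.
        nodes[strain] = {"iso_week": f"{iso_year}-W{iso_week:02d}"}
    return {"nodes": nodes}


# Hypothetical input; a real script would read dates from the build's metadata.
node_data = build_week_node_data({"strainA": "2021-02-03"})

# A build script would write this string to a file and pass it to augur export.
node_data_json = json.dumps(node_data, indent=2)
```

Because node data files can have any filename, the output of such a script can be named whatever suits the build.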
Similarly, it’s possible for other bioinformatics software to produce compatible dataset JSONs (primary or sidecars) for use by Auspice; they aren’t required to be generated by Augur, although that is the most common way. Augur’s validation command can check that dataset JSONs have the required schema.
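For software that writes dataset JSONs without Augur, a quick structural pre-check can catch obvious omissions before running the real validation. The sketch below assumes that a v2 main file is a JSON object with top-level `version`, `meta`, and `tree` keys. Treat that key list as an assumption to confirm against the published schema. The check is far shallower than Augur's validation command and is not a substitute for it.

```python
import json
from typing import List

# Assumed top-level keys of a v2 dataset file; confirm against the official schema.
ASSUMED_REQUIRED_KEYS = ("version", "meta", "tree")


def missing_top_level_keys(dataset_json: str) -> List[str]:
    """Return the assumed-required top-level keys absent from a dataset JSON string."""
    dataset = json.loads(dataset_json)
    if not isinstance(dataset, dict):
        # Not a JSON object at all, so every key is missing.
        return list(ASSUMED_REQUIRED_KEYS)
    return [key for key in ASSUMED_REQUIRED_KEYS if key not in dataset]
```

An empty list means only that the top-level keys are present; the contents of each key still need full schema validation.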
Once you have Nextstrain JSON files, you can visualize and share them in a variety of ways. See our guide to sharing your results to find a way that meets your needs for privacy and collaboration.