Augur v6 Release Notes

Augur v6 was released on 2019-12-10. This release contains a number of changes from Augur v5, including feature additions and bugfixes. The biggest change concerns how Augur exports files for visualisation by Auspice. We’ve written an extensive guide explaining our motivations here, what has changed, how to upgrade, and how this interfaces with Auspice.


Export JSONs for specific versions of Auspice

Probably the biggest (breaking) change you’ll encounter is that augur export no longer works on its own! See the migration guide for a detailed explanation.

Breaking change: augur export no longer works on its own; it now requires a further positional argument to define which version of Auspice you wish to target. augur export v1 should behave the same as augur export did in previous versions.
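
As a rough illustration of the upgrade path, the sketch below rewrites an old-style argument list so that it names a target version. The helper add_export_version is hypothetical and not part of Augur, and it assumes v1 and v2 are the only version arguments.

```python
def add_export_version(argv, version="v1"):
    """Return a copy of argv with `version` inserted after 'export' if absent.

    Hypothetical helper for illustration only; not part of Augur.
    """
    args = list(argv)
    if "export" not in args:
        return args  # not an export call, leave untouched
    i = args.index("export")
    # Assumption: 'v1' and 'v2' are the only version arguments.
    if i + 1 < len(args) and args[i + 1] in ("v1", "v2"):
        return args  # already targets a version
    args.insert(i + 1, version)
    return args


old = ["augur", "export", "--tree", "tree.nwk"]
print(" ".join(add_export_version(old)))
```

Defaulting to v1 mirrors the statement above that augur export v1 should behave like the old command; pass version="v2" to target Auspice v2 instead.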

Reference sequence output

The export command now accepts a flag to export the reference/root sequence relative to which mutations are called, see here for more detail.

Change in augur ancestral’s arguments

augur ancestral, which reconstructs mutations across a tree, now supports two forms of output and the arguments have become more descriptive.

  1. JSON output, including mutations for each branch and (inferred) ancestral sequences. This is specified by the --output-node-data argument.

  2. FASTA output of reconstructed ancestral sequences. This was previously available only for VCF inputs, but now works for any input. Users can request this output and specify a file name using --output-sequences.

Deprecation warning: The argument --output is now deprecated. Please use --output-node-data instead.
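
For scripts that still pass the deprecated flag, a small rewrite along the following lines might help. This is only a sketch: migrate_ancestral_args is a hypothetical helper, not something Augur provides, and the file names are made up.

```python
def migrate_ancestral_args(argv):
    """Replace the deprecated --output flag with --output-node-data.

    Hypothetical helper for illustration only; not part of Augur.
    """
    migrated = []
    for arg in argv:
        if arg == "--output":
            migrated.append("--output-node-data")
        elif arg.startswith("--output="):
            migrated.append("--output-node-data=" + arg[len("--output="):])
        else:
            # --output-sequences and all other arguments pass through unchanged.
            migrated.append(arg)
    return migrated


old = ["augur", "ancestral", "--tree", "tree.nwk", "--output", "nt_muts.json"]
print(" ".join(migrate_ancestral_args(old)))
```

Matching on the exact flag (or the exact "--output=" prefix) keeps the new --output-node-data and --output-sequences arguments from being rewritten a second time.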

Import BEAST MCC trees

We now have instructions and functionality to import BEAST trees, see here.

Prettifying of strings

Previous Auspice versions “prettified” metadata strings (like changing ‘north_america’ to ‘North America’). Auspice v2 no longer does this, see here for more detail. The parse command now accepts an argument to apply string prettifying operations to metadata parsed from FASTA headers.
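
To make the idea concrete, here is a minimal sketch of the kind of transformation meant by “prettifying”. The function prettify is hypothetical, and Augur’s own rules may differ.

```python
def prettify(value):
    """Turn an underscore-separated string into capitalised words.

    Illustrative only; Augur's own prettifying rules may differ.
    """
    words = value.split("_")
    # Upper-case only the first letter so the rest of each word is kept as-is.
    return " ".join(word[:1].upper() + word[1:] for word in words)


print(prettify("north_america"))  # North America
```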

Whitespace in colors and lat-longs TSVs

To allow whitespace in metadata, files specifying colors and geographic locations now need to be TAB delimited.
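
The sketch below shows why TAB delimiting matters: a value such as “north america” survives as a single field. The three-column layout (trait, value, color) and the example rows are assumptions made for illustration, and read_colors is a hypothetical helper rather than Augur code.

```python
import csv
import io

# Made-up example content; assumed columns: trait, value, hex color.
example = "region\tnorth america\t#1f77b4\nregion\tsouth america\t#ff7f0e\n"


def read_colors(handle):
    """Map (trait, value) -> color from a TAB-delimited file handle."""
    colors = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        trait, value, color = row
        colors[(trait, value)] = color
    return colors


colors = read_colors(io.StringIO(example))
print(colors[("region", "north america")])
```

Splitting on any whitespace instead would break “north america” into two fields, which is the problem the TAB requirement avoids.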

Move to GFF-style annotations

Starting with Augur v6 we use GFF coordinates, [one-origin, inclusive], as opposed to BED coordinates, which are zero-origin with an exclusive end. Strands are represented by + or - rather than 1 or 0. Additionally, we export the seqid, but don’t use it in Auspice.
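
For anyone converting old annotations by hand, the arithmetic can be sketched as follows. The function bed_to_gff is hypothetical, and the strand mapping is an assumption based on the description above.

```python
def bed_to_gff(start, end, strand):
    """Convert a BED-style interval to a GFF-style one.

    BED: zero-origin start, exclusive end.
    GFF: one-origin start, inclusive end.
    Assumption: strand 1 means forward ('+'); anything else is reverse ('-').
    """
    gff_strand = "+" if strand == 1 else "-"
    # Only the start shifts: an exclusive zero-origin end equals
    # the inclusive one-origin end.
    return start + 1, end, gff_strand


# The first 100 bases on the forward strand.
print(bed_to_gff(0, 100, 1))  # (1, 100, '+')
```

The feature length is end - start in BED terms and end - start + 1 in GFF terms, which gives a quick sanity check after conversion.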

Improvements and usage of JSON validation

The export command will now validate the produced JSON against the schema.

Removal of non-modular Augur and old builds

Augur has been a dynamic, shapeshifting beast. It started as scripts for Nextflu, took on more and more pathogens, was refactored into “prepare” and “process” steps, and refactored again into the “modular” Augur we now have. Earlier incarnations of Augur have now been removed from the GitHub repo (./base/*).

Paralleling the different incarnations of Augur was a move to “builds” being their own self-contained repos. We think this has been remarkably successful, and de-couples the bioinformatics tooling from a pathogen build. With this release of Augur we’ve now removed these builds from the Augur GitHub repo, and the only builds that remain are the test ones.

Test builds

There have been a number of test builds in the Augur repo and we have leaned heavily on them while we developed this version of Augur as well as Auspice v2. They are all self-contained within ./tests/builds and can all be run and examined in Auspice via:

cd tests/builds
bash runner.sh # creates output in ./auspice
auspice view --datasetDir auspice

(See the Auspice docs for Auspice-specific questions.)

Documentation improvements

Documentation has always been a bit hit-or-miss with Nextstrain projects. We’ve tried to make Augur’s read-the-docs documentation more comprehensive, with better flow. This entails new sections, with each Augur command having its own page. We’ve tried to use redirects to ensure that all the old links continue to work.

Miscellaneous

  • augur filter: More interpretable output of how many sequences each stage has filtered out.

  • augur filter: Additional flag --subsample-seed to seed the random number generator and thereby make subsampling reproducible.

  • augur sequence-traits: Output is now numerical, as originally intended; this required an Auspice bugfix.

  • augur traits: Explanation of what is considered missing data & how it is interpreted.

  • augur traits: GTR models are exported in the output JSON for better accountability & reproducibility.

  • Errors in formatting of input files (e.g. metadata files, Auspice config files) weren’t handled nicely, often resulting in hard-to-interpret stack traces. We now try to catch these and print an error indicating the offending file.

  • Tests using Python version 2 have now been removed.
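
The reproducibility that --subsample-seed provides (see the augur filter item above) rests on a general principle: seeding the random number generator fixes the sequence of random choices. The sketch below illustrates that principle only; subsample is a hypothetical function, not Augur’s implementation.

```python
import random


def subsample(strains, n, seed=None):
    """Pick n strains at random; a fixed seed gives a repeatable pick."""
    rng = random.Random(seed)  # private generator, global state untouched
    return rng.sample(strains, n)


strains = [f"strain_{i}" for i in range(20)]
first = subsample(strains, 5, seed=42)
second = subsample(strains, 5, seed=42)
print(first == second)  # True
```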