Running a pathogen workflow

This tutorial uses the Nextstrain CLI to help you get started running pathogen workflows and viewing the datasets you see on nextstrain.org. It assumes you are comfortable using the command line and installing software on your computer. If you need help when following this tutorial, please create a post at discussion.nextstrain.org.

In this tutorial, you will run our example Zika workflow and view the results on your computer. Afterward, you will have a basic understanding of how to run workflows for other pathogens and a foundation for understanding the Nextstrain ecosystem in more depth.

Prerequisites

  1. Install Nextstrain. The installation instructions cover all of the software you need to complete this tutorial and others.

Download the example Zika workflow repository

Pathogen workflows are stored in workflow repositories (version-controlled folders) to track changes over time. Download the example Zika workflow repository.

$ git clone https://github.com/nextstrain/zika-tutorial
Cloning into 'zika-tutorial'...
[...more output...]

When it’s done, you’ll have a new directory called zika-tutorial/.
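If you want to confirm that the clone succeeded before moving on, a quick check like the following may help. This is a minimal sketch: the helper name `is_git_checkout` is hypothetical, and it only assumes that a cloned repository contains Git metadata at `.git`.

```shell
# Hypothetical helper: succeed only if the directory exists and
# contains Git metadata (a .git entry), as a fresh clone should.
is_git_checkout() {
    [ -d "$1" ] && [ -e "$1/.git" ]
}

if is_git_checkout zika-tutorial; then
    echo "zika-tutorial/ is a Git checkout"
else
    echo "zika-tutorial/ not found or not a Git checkout"
fi
```

Run it from the directory where you ran `git clone`.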

Run the workflow

Pathogen workflows use the Augur bioinformatics toolkit to subsample data, align sequences, build a phylogeny, estimate phylogeographic patterns, and save the results in a format suitable for visualization with Auspice.
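To see which steps a workflow actually defines, you can list its rule names before running it. The sketch below assumes the repository describes its workflow in a Snakemake file named `zika-tutorial/Snakefile` and that rules are declared as `rule NAME:` at the start of a line; both are assumptions not stated in this tutorial, and `list_rules` is a hypothetical helper name.

```shell
# Print the name of every rule declared in a Snakemake workflow file.
# Assumption: rules are declared at the start of a line as "rule NAME:".
list_rules() {
    sed -n 's/^rule[[:space:]][[:space:]]*\([A-Za-z_][A-Za-z0-9_]*\)[[:space:]]*:.*$/\1/p' "$1"
}

# Assumption: the workflow definition lives in zika-tutorial/Snakefile.
# Nothing is printed if that file is absent.
if [ -f zika-tutorial/Snakefile ]; then
    list_rules zika-tutorial/Snakefile
fi
```

Each name printed corresponds to one step of the workflow, which you can compare against the stages described above.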

Run the workflow with the Nextstrain CLI.

$ nextstrain build --cpus 1 zika-tutorial/
Building DAG of jobs...
[...a lot of output...]

This should take just a few minutes to complete. To save time, this tutorial uses example data that is much smaller than our live Zika analysis.

Output files will be in the directories zika-tutorial/data/, zika-tutorial/results/ and zika-tutorial/auspice/.
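A small check like the one below can confirm that the run produced the three directories named above. It is only a sketch: `report_outputs` is a hypothetical helper, and it tests that each directory exists, not that its contents are complete.

```shell
# Report which of the expected output directories exist under a
# workflow directory. The directory names come from this tutorial;
# the function name is hypothetical.
report_outputs() {
    workflow_dir="$1"
    for sub in data results auspice; do
        if [ -d "$workflow_dir/$sub" ]; then
            echo "found: $sub/"
        else
            echo "missing: $sub/"
        fi
    done
}

report_outputs zika-tutorial
```

If any directory is reported as missing, scroll back through the build output for the first error.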

Visualize results

View the resulting dataset using Nextstrain’s visualizations.

$ nextstrain view zika-tutorial/auspice/
——————————————————————————————————————————————————————————————————————————————
    The following datasets should be available in a moment:
       • http://127.0.0.1:4000/zika
——————————————————————————————————————————————————————————————————————————————
[...more output...]

Open the dataset URL in your web browser.

Screenshot of Zika example dataset viewed in Nextstrain

Next steps