Run using example data
======================

This first tutorial introduces our SARS-CoV-2 workflow.
You will run the workflow using a small set of reference data that we provide.
Subsequent tutorials present more complex scenarios that build on this approach.

.. contents:: Table of Contents
   :local:

Prerequisites
-------------

1. :doc:`setup`. These instructions will install all of the software you need to complete this tutorial and others.

Setup
-----

1. Change directory to the ``ncov`` directory:

   .. code:: text

      cd ncov

2. Download the example tutorial repository into a new subdirectory of ``ncov/`` called ``ncov-tutorial/``:

   .. code:: text

      git clone https://github.com/nextstrain/ncov-tutorial

Run the workflow
----------------

From within the ``ncov/`` directory, run the workflow using a :term:`configuration file <config file>` provided in the tutorial directory:

.. code:: text

   nextstrain build . --configfile ncov-tutorial/example-data.yaml

Break down the command
~~~~~~~~~~~~~~~~~~~~~~

The workflow can take several minutes to run. While it is running, you can learn about the parts of this command:

- ``nextstrain build .``
   - This tells the :term:`docs.nextstrain.org:Nextstrain CLI` to :term:`build <docs.nextstrain.org:build (verb)>` the workflow from ``.``, the current directory. All subsequent command-line arguments are passed to the workflow manager, Snakemake.
- ``--configfile ncov-tutorial/example-data.yaml``
   - ``--configfile`` is a Snakemake option used to `configure <https://snakemake.readthedocs.io/en/stable/snakefiles/configuration.html#standard-configuration>`__ the ncov workflow. It takes a file path as the value.
   - ``ncov-tutorial/example-data.yaml`` is the value given to ``--configfile``. It is a :term:`config file` that provides custom workflow configuration including inputs and outputs. The contents of this file with comments excluded are:

      .. code-block:: yaml

         inputs:
           - name: reference_data
             metadata: https://data.nextstrain.org/files/ncov/open/reference/metadata.tsv.xz
             sequences: https://data.nextstrain.org/files/ncov/open/reference/sequences.fasta.xz

         refine:
           root: "Wuhan-Hu-1/2019"

      The ``inputs`` entry provides the workflow with one input named ``reference_data``. The metadata and sequence files refer to a sample of approximately 300 sequences maintained by the Nextstrain team that represent all Nextstrain clades annotated for SARS-CoV-2. The workflow downloads these files directly from the associated URLs. :doc:`See the complete list of SARS-CoV-2 datasets we provide through data.nextstrain.org <../reference/remote_inputs>`.

      The ``refine`` entry specifies the root sequence for the example GenBank data.

      For more information, :doc:`see the workflow configuration file reference <../reference/workflow-config-file>`.

Running the workflow produces two new directories:

- ``auspice/`` contains a few files that represent a Nextstrain :term:`docs.nextstrain.org:dataset` to be visualized in the following section.
- ``results/`` contains intermediate files generated during workflow execution.

Visualize the results
---------------------

Run this command to start the :term:`docs.nextstrain.org:Auspice` server, providing ``auspice/`` as the directory containing output dataset files:

.. code:: text

   nextstrain view auspice/

Navigate to http://127.0.0.1:4000/ncov/default-build. The resulting :term:`docs.nextstrain.org:dataset` should show a phylogeny of ~200 sequences:

.. figure:: ../images/dataset-example-data.png
   :alt: Phylogenetic tree from the "example data" tutorial as visualized in Auspice

To stop the server, press :kbd:`Control-C` on your keyboard.

.. note::

   You can also view the results by dragging the dataset files all at once onto `auspice.us <https://auspice.us>`__:

   - ``auspice/ncov_default-build.json``
   - ``auspice/ncov_default-build_root-sequence.json``
   - ``auspice/ncov_default-build_tip-frequencies.json``