Files overview

This page gives an overview of the files in your local ncov/ directory.

User files 

User files are not tracked by version control, meaning they are either provided by the user or generated by the workflow.

Analysis directory 

An analysis directory is an untracked directory that contains user-defined customization files.

In the tutorials, the analysis directory is ncov-tutorial/. Follow these steps to create your own analysis directory.

Hint

Previously, we recommended using Snakemake profiles under a my_profiles/ analysis directory. We now recommend using Snakemake config files directly via the --configfile parameter. You can still use existing profiles via --configfile my_profiles/<profile_name>/builds.yaml.

Input files 

Learn how to prepare input files with the Data preparation guide.

Note

A few example input files are provided when you clone ncov/ locally, under data/.

Metadata file (e.g. data/example_metadata.tsv): tab-delimited description of strain (i.e., sample) attributes.
Sequences file (e.g. data/example_sequences.fasta.gz): genomic sequences whose IDs must match the strain column in the metadata file.
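
The requirement that sequence IDs match the strain column can be checked before running the workflow. The following is a minimal sketch, not part of the ncov workflow itself. It assumes that a sequence ID is the first whitespace-delimited token of each FASTA header and that the metadata file has a header row containing a strain column. The function names are hypothetical.

```python
import csv
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def fasta_ids(lines):
    """Collect sequence IDs from FASTA header lines.

    Assumption: the ID is the first whitespace-delimited token after '>'.
    """
    ids = set()
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            if header:
                ids.add(header.split()[0])
    return ids


def metadata_strains(lines):
    """Collect non-empty values of the 'strain' column from tab-delimited metadata."""
    reader = csv.DictReader(lines, delimiter="\t")
    return {row["strain"] for row in reader if row.get("strain")}


def compare_ids(sequence_ids, strains):
    """Return (IDs that lack metadata, strains that lack a sequence)."""
    return sequence_ids - strains, strains - sequence_ids
```

To check the example inputs, pass open_text("data/example_sequences.fasta.gz") to fasta_ids and open_text("data/example_metadata.tsv") to metadata_strains, then hand both sets to compare_ids. Two empty sets mean every sequence has a metadata record and vice versa.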

Output files and directories 

These are generated by the workflow.

auspice/<dataset_name>.json: output file for visualization in Auspice, where <dataset_name> is the name of your output dataset as defined in the workflow configuration file passed via --configfile.
results/aligned.fasta, etc.: raw results files (dependencies) that are shared across all datasets.
results/<dataset_name>/: raw results files (dependencies) that are specific to a single dataset.
logs/: log files with error messages and other information about the run.
benchmarks/: run times (and memory usage on Linux systems) for each rule in the workflow.
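
As an illustration of the layout above, the sketch below maps a dataset name to the locations listed in this section. It is only a convenience for scripts that inspect a finished run. The function name and dictionary keys are hypothetical, and the paths are taken directly from the list above.

```python
from pathlib import Path


def expected_outputs(dataset_name, root="."):
    """Map a dataset name to the output locations described in this section.

    `root` is the local ncov/ directory; the dictionary keys are hypothetical labels.
    """
    root = Path(root)
    return {
        "auspice_json": root / "auspice" / f"{dataset_name}.json",
        "dataset_results": root / "results" / dataset_name,
        "shared_results": root / "results",
        "logs": root / "logs",
        "benchmarks": root / "benchmarks",
    }
```

Calling expected_outputs("my-build")["auspice_json"].exists() after a run, where my-build stands in for your own dataset name, is a quick way to confirm that the Auspice file was produced.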

Internal files 

These files are not intended for modification. See the Workflow config file guide for how to configure workflow behavior.

Default workflow customization files 

defaults/parameters.yaml: default config file. Override these settings using --configfile your-config.yaml.
defaults/auspice_config.json: default Auspice config file. Override these settings using auspice_config.
defaults/include.txt: default strain names to include during subsampling and filtering.
defaults/exclude.txt: default strain names to exclude during subsampling and filtering.
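
To picture how settings from --configfile your-config.yaml override those in defaults/parameters.yaml, the sketch below performs a recursive dictionary merge in which user values take precedence and unspecified defaults are kept. This is a simplified illustration under that assumption, not the workflow's actual code, and the keys used in the example are hypothetical.

```python
def merge_config(defaults, overrides):
    """Return a new dict where values in `overrides` replace those in `defaults`.

    Nested dictionaries are merged key by key; any other value is replaced whole.
    Neither input dictionary is modified.
    """
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical keys, for illustration only.
defaults = {"section": {"a": 1, "b": 2}, "top": "default"}
user = {"section": {"b": 20}}
effective = merge_config(defaults, user)
```

Here effective keeps section.a and top from the defaults and takes section.b from the user config, so a user config file only needs to list the settings it changes.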

Workflow definition files 

Snakefile: entry point for Snakemake commands that also validates inputs.
workflow/snakemake_rules/main_workflow.smk: defines rules for running each step in the analysis. Modify your workflow config file rather than hardcoding changes into the Snakemake file itself.
workflow/envs/nextstrain.yaml: specifies the computing environment needed to run the workflow with the --use-conda flag.
workflow/schemas/config.schema.yaml: defines format (e.g., required fields and types) for workflow config files.
scripts/: helper scripts for common tasks.

Documentation 

These files are used to generate the workflow documentation.

Nextstrain user files 

The Nextstrain team maintains user files in the ncov/ repo, under nextstrain_profiles/.