Files overview

This page gives an overview of the files in your local ncov/ directory.

User files

User files are not tracked by version control, meaning they are either provided by the user or generated by the workflow.

Analysis directory

An analysis directory is a non-tracked directory which contains user-defined customization files.

In the tutorials, the analysis directory is ncov-tutorial/. Follow these steps to create your own analysis directory.

Hint

Previously, we recommended using Snakemake profiles under a my_profiles/ analysis directory. We now recommend using Snakemake config files directly via the --configfile parameter. You can still use existing profiles via --configfile my_profiles/<profile_name>/builds.yaml.

Input files

Learn how to prepare input files with the Data preparation guide.

Note

A few example input files are provided when you clone ncov/ locally, under data/.

  • Metadata file (e.g. data/example_metadata.tsv): tab-delimited description of strain (i.e., sample) attributes.

  • Sequences file (e.g. data/example_sequences.fasta.gz): genomic sequences whose ids must match the strain column in the metadata file.
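The matching requirement between the two input files can be checked before running the workflow. The following is a minimal sketch, not part of the workflow itself. It assumes the metadata is a tab-delimited file with a strain column, that the sequences file is gzip-compressed FASTA, and that a record's id is the first whitespace-delimited token on its header line. The function names are hypothetical.

```python
import csv
import gzip


def read_metadata_strains(metadata_path):
    """Return the set of values in the 'strain' column of a TSV file."""
    with open(metadata_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row["strain"] for row in reader}


def read_fasta_ids(fasta_gz_path):
    """Return the set of record ids in a gzip-compressed FASTA file."""
    ids = set()
    with gzip.open(fasta_gz_path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.startswith(">"):
                # Assumption: the id is the first token on the header line.
                tokens = line[1:].split()
                if tokens:
                    ids.add(tokens[0])
    return ids


def compare_ids(metadata_path, fasta_gz_path):
    """Return (sequence ids missing from metadata, strains missing from sequences)."""
    strains = read_metadata_strains(metadata_path)
    seq_ids = read_fasta_ids(fasta_gz_path)
    return sorted(seq_ids - strains), sorted(strains - seq_ids)
```

Calling compare_ids("data/example_metadata.tsv", "data/example_sequences.fasta.gz") returns two sorted lists; both are empty when every sequence id has a metadata row and vice versa.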

Output files and directories

These are generated by the workflow.

  • auspice/<dataset_name>.json: output file for visualization in Auspice, where <dataset_name> is the name of your output dataset in the workflow configuration file passed to --configfile.

  • results/aligned.fasta, etc.: raw results files (dependencies) that are shared across all datasets.

  • results/<dataset_name>/: raw results files (dependencies) that are specific to a single dataset.

  • logs/: log files with error messages and other information about the run.

  • benchmarks/: run-times (and memory usage on Linux systems) for each rule in the workflow.
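The layout above can be expressed as a small helper that maps a dataset name to its expected output locations. This is a minimal sketch for illustration only: the function names and dictionary keys are hypothetical, and the paths are taken directly from the list above, relative to the ncov/ directory.

```python
from pathlib import Path


def expected_outputs(dataset_name, root="."):
    """Map a dataset name to the output locations described above."""
    root = Path(root)
    return {
        "auspice_json": root / "auspice" / f"{dataset_name}.json",
        "dataset_results": root / "results" / dataset_name,
        "logs": root / "logs",
        "benchmarks": root / "benchmarks",
    }


def missing_outputs(dataset_name, root="."):
    """Return the keys of expected outputs that do not exist on disk."""
    outputs = expected_outputs(dataset_name, root)
    return [name for name, path in outputs.items() if not path.exists()]
```

After a run, missing_outputs("<dataset_name>") with your own dataset name gives a quick indication of whether the workflow produced the Auspice JSON and the per-dataset results directory.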

Internal files

These files are not intended for modification. See the Workflow config file guide for how to configure workflow behavior.

Default workflow customization files

  • defaults/parameters.yaml: default config file. Override these settings using --configfile your-config.yaml.

  • defaults/auspice_config.json: default Auspice config file. Override these settings using the auspice_config setting.

  • defaults/include.txt: default strain names to include during subsampling and filtering.

  • defaults/exclude.txt: default strain names to exclude during subsampling and filtering.
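To illustrate how include and exclude lists of strain names can interact, here is a minimal sketch. It is not the workflow's implementation. It assumes each file holds one strain name per line, that text after a # is a comment, and that an included strain is kept even when it also appears in the exclude list. The function names are hypothetical.

```python
def read_strain_list(path):
    """Read strain names from a text file, one per line.

    Assumption: blank lines are skipped and anything after '#' is a comment.
    """
    strains = set()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            name = line.split("#", 1)[0].strip()
            if name:
                strains.add(name)
    return strains


def apply_strain_lists(candidates, include, exclude):
    """Keep candidates that are force-included or not excluded.

    Assumption: inclusion takes precedence over exclusion.
    """
    return [s for s in candidates if s in include or s not in exclude]
```

For example, with include and exclude loaded from defaults/include.txt and defaults/exclude.txt, apply_strain_lists(strains, include, exclude) returns the strains that would survive under these assumptions, in their original order.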

Workflow definition files

  • Snakefile: entry point for Snakemake commands that also validates inputs.

  • workflow/snakemake_rules/main_workflow.smk: defines rules for running each step in the analysis. Modify your workflow config file rather than hardcoding changes into the Snakemake file itself.

  • workflow/envs/nextstrain.yaml: specifies the computing environment needed to run the workflow with the --use-conda flag.

  • workflow/schemas/config.schema.yaml: defines the format (e.g., required fields and types) for workflow config files.

  • scripts/: helper scripts for common tasks.

Documentation

These files are used to generate the workflow documentation.

Nextstrain user files

The Nextstrain team maintains user files in the ncov/ repo, under nextstrain_profiles/.