Overview of this repository (i.e., what do these files do?)
The files in this repository fall into one of these categories:
Input files
Output files and directories
Workflow configuration files we might want to customize
Workflow configuration files we don’t need to touch
Documentation
We’ll walk through all of the files one by one, but here are the most important ones for your reference:
Input files
data/metadata.tsv
: tab-delimited description of strain (i.e., sample) attributesdata/sequences.fasta
: genomic sequences whose ids must match thestrain
column inmetadata.tsv
. See the data preparation guide.my_profiles/<your_profile>/builds.yaml
: workflow configuration file where you can define and parameterize the builds you want to run. The directory nameyour_profile
is the name of your custom analysis profile where you store this configuration and other custom files for the analysis. See the customization guide.
Output files
auspice/<build_name>.json
: output file for visualization in Auspice where<build_name>
is the name of a build defined in the workflow configuration file.
Input files
data/metadata.tsv
: tab-delimited description of strain (i.e., sample) attributesdata/sequences.fasta
: genomic sequences whose ids must match thestrain
column inmetadata.tsv
. See the data preparation guide.defaults/include.txt
: list of strain names to include during subsampling and filtering (one strain name per line)defaults/exclude.txt
: list of strain names to exclude during subsampling and filtering (one strain name per line)
Output files and directories
auspice/<build_name>.json
: output file for visualization in Auspice where<build_name>
is the name of your build in the workflow configuration file.results/aligned.fasta
, etc.: raw results files (dependencies) that are shared across all builds.results/<build_name>/
: raw results files (dependencies) that are specific to a single build.logs/
: Log files with error messages and other information about the run.benchmarks/
: Run-times (and memory usage on Linux systems) for each rule in the workflow.
Workflow configuration files we might want to customize
my_profiles/<your_profile>/builds.yaml
: workflow configuration file where you can define and configure all the builds you’d like to run. See the customization guide.my_profiles/<your_profile>/config.yaml
: Snakemake profile configuration where you can define the number of cores to use at once, etc. See the customization guide.defaults/parameters.yaml
: default workflow configuration parameters. Override these settings in the workflow configuration file (builds.yaml
) above.defaults/auspice_config.json
: default visualization configuration file. Override these settings inmy_profiles/<your_profile>/auspice_config.json
. See the customization guide for visualizations.
Workflow configuration files we don’t need to touch
Snakefile
: entry point for Snakemake commands that also validates inputs. No modification needed.workflow/snakemake_rules/main_workflow.smk
: defines rules for running each step in the analysis. Modify yourbuilds.yaml
file, rather than hardcode changes into the snakemake file itself.workflow/envs/nextstrain.yaml
: specifies computing environment needed to run workflow with the--use-conda
flag. No modification needed.workflow/schemas/config.schema.yaml
: defines format (e.g., required fields and types) forbuilds.yaml
files. This can be a useful reference, but no modification needed.scripts/
: helper scripts for common tasks. No modification needed.