Files overview
==============

This page gives an overview of the files in your local ``ncov/`` directory.

.. contents::
   :local:

User files
----------

User files are not tracked by version control, meaning they are either provided by the user or generated by the workflow.

Analysis directory
~~~~~~~~~~~~~~~~~~

An :term:`analysis directory` is a non-tracked directory which contains user-defined :term:`customization files `. In the :doc:`tutorials <../tutorial/intro>`, the analysis directory is ``ncov-tutorial/``. Follow :ref:`these steps ` to create your own analysis directory.

.. hint::

   Previously, we recommended using Snakemake profiles under a ``my_profiles/`` analysis directory. We now recommend using Snakemake config files directly via the ``--configfile`` parameter. You can still use existing profiles via ``--configfile my_profiles//builds.yaml``.

Input files
~~~~~~~~~~~

Learn how to prepare input files with :doc:`../guides/data-prep/index`.

.. note::

   A few example input files are provided when you clone ``ncov/`` locally, under ``data/``.

-  Metadata file (e.g. ``data/example_metadata.tsv``): tab-delimited description of strain (i.e., sample) attributes
-  Sequences file (e.g. ``data/example_sequences.fasta.gz``): genomic sequences whose ids must match the ``strain`` column in the metadata file.

Output files and directories
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

These are generated by the workflow.

-  ``auspice/.json``: output file for visualization in Auspice where ```` is the name of your output dataset in the workflow configuration file used by ``--configfile``.
-  ``results/aligned.fasta``, etc.: raw results files (dependencies) that are shared across all datasets.
-  ``results//``: raw results files (dependencies) that are specific to a single dataset.
-  ``logs/``: Log files with error messages and other information about the run.
-  ``benchmarks/``: Run-times (and memory usage on Linux systems) for each rule in the workflow.

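As noted under the input files above, sequence ids must match the ``strain`` column in the metadata file. The following is a minimal Python sketch of how one might check that requirement before starting a run. It is not part of the ``ncov`` workflow: the helper names (``read_metadata_strains``, ``read_fasta_ids``, ``compare_ids``) are hypothetical, and it assumes that a sequence id is the first whitespace-delimited token on each FASTA header line.

```python
import csv
import io


def read_metadata_strains(tsv_text):
    """Return the set of values in the ``strain`` column of a tab-delimited table."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["strain"] for row in reader}


def read_fasta_ids(fasta_text):
    """Return the set of sequence ids found on FASTA header lines.

    Assumption: the id is the first whitespace-delimited token after ``>``.
    """
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            header = line[1:].strip()
            if header:
                ids.add(header.split()[0])
    return ids


def compare_ids(strains, sequence_ids):
    """Return (ids lacking metadata, strains lacking a sequence), both sorted."""
    return sorted(sequence_ids - strains), sorted(strains - sequence_ids)


# Small in-memory example with made-up strain names.
metadata = "strain\tdate\nA/1\t2020-01-01\nB/2\t2020-02-01\n"
fasta = ">A/1\nACGT\n>C/3\nACGT\n"

missing_metadata, missing_sequence = compare_ids(
    read_metadata_strains(metadata), read_fasta_ids(fasta)
)
print("Sequences without metadata:", missing_metadata)
print("Metadata rows without a sequence:", missing_sequence)
```

For files on disk, the text could be read with ``open(path)`` for the metadata and ``gzip.open(path, "rt")`` for a gzipped sequences file such as ``data/example_sequences.fasta.gz``, then passed to the same helpers.
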
Internal files
--------------

These files are not intended for modification. See :doc:`../guides/workflow-config-file` on how to configure workflow behavior.

Default workflow customization files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-  ``defaults/parameters.yaml``: default :term:`config file`. Override these settings using ``--configfile your-config.yaml``.
-  ``defaults/auspice_config.json``: default :term:`Auspice config file`. Override these settings using ``auspice_config``.
-  ``defaults/include.txt``: default strain names to *include* during subsampling and filtering.
-  ``defaults/exclude.txt``: default strain names to *exclude* during subsampling and filtering.

Workflow definition files
~~~~~~~~~~~~~~~~~~~~~~~~~

-  ``Snakefile``: entry point for Snakemake commands that also validates inputs.
-  ``workflow/snakemake_rules/main_workflow.smk``: defines rules for running each step in the analysis. Modify your workflow config file, rather than hardcode changes into the snakemake file itself.
-  ``workflow/envs/nextstrain.yaml``: specifies computing environment needed to run workflow with the ``--use-conda`` flag.
-  ``workflow/schemas/config.schema.yaml``: defines format (e.g., required fields and types) for workflow config files.
-  ``scripts/``: helper scripts for common tasks.

Documentation
~~~~~~~~~~~~~

These files are used to generate the `workflow documentation `__.

Nextstrain user files
~~~~~~~~~~~~~~~~~~~~~

The Nextstrain team maintains user files in the ``ncov/`` repo, under ``nextstrain_profiles/``.