Snakemake style guide
When in doubt, refer to Snakemake’s own best practices guide.
Avoid run blocks
Instead, implement custom Python in scripts called in a shell block.
Code in run blocks is less reusable. Anything we write once, we’re likely to want to reuse later.
run blocks can be challenging to debug.
run blocks do not run in rule-specific conda environments, forcing the user to manually install into their environment any dependencies that could have been in a conda environment file.
Define input paths with literal path strings
Do this instead of using rule variables.
Literal paths are easier to read and interpret, avoiding the need to trace back through a workflow to an earlier rule to see the path associated with a rule output.
Literal paths also allow workflows to be rewired with custom rules that are injected at runtime. For example, the ncov workflow allows users to define their own rules that can provide alternate commands for generating required files. This approach does not work with references to rule outputs, though (see ncov PR 877 for an example).
Always use relative paths
Relative paths (paths that don’t start with /) mean that anyone can run the build without running into portability issues caused by paths specific to your computer.
See the Snakemake documentation for how relative paths are interpreted depending on context.
Avoid the message rule attribute
When the message attribute is defined, Snakemake suppresses other critical details that otherwise get displayed by default (e.g., job id, rule name, input, output, etc.).
Use a YAML configuration file and allow for overrides
Configuration is data and should live inside YAML files. By including the following snippet in your Snakefile, you can provide default values and allow for additional entries or overrides via the --configfile or --config options to snakemake.
configfile: "defaults.yaml"
Configuration values are then available in scope as a config dictionary.
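The layering of defaults and overrides can be pictured with a minimal Python sketch. It assumes nested dictionaries are merged key by key while other values are replaced outright. The helper name merge_config and the example keys are hypothetical, and this is an illustration of the idea rather than Snakemake's own code.

```python
import copy


def merge_config(defaults, overrides):
    """Return a new dict with `overrides` layered on top of `defaults`.

    Nested dictionaries are merged key by key; any other value in
    `overrides` replaces the default outright. (Assumed merge behavior.)
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical defaults, standing in for the contents of defaults.yaml.
defaults = {"filter": {"min_length": 1000, "exclude": "exclude.txt"}, "threads": 1}
# Hypothetical overrides, standing in for a file passed with --configfile.
overrides = {"filter": {"min_length": 5000}}

config = merge_config(defaults, overrides)
```

Under these assumptions, the override changes only filter.min_length; every other default stays in place, which is why defaults belong in a file that is always loaded.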
Access config values appropriately
Use the appropriate method to access configuration in the config global variable. Four access patterns are possible, but only the first three should be used:
config[key]: Use this when the key is required, or a default is specified in a pre-loaded configuration file.
key [not] in config: Use this when the key is optional and you want to check if a value is specified.
config.get(key, default): Use this when the key is optional and you want to access its value.
config.get(key): Never use this. All use cases should be covered by the options above. Using this will only mask errors that may be due to a missing required key.
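The difference between these access patterns can be seen with an ordinary Python dictionary. The sketch below uses a hypothetical config with made-up keys ("name", "threads", "exclude") in place of Snakemake's global config.

```python
# Hypothetical configuration, standing in for Snakemake's global `config`.
config = {"name": "example"}

# Required key: index directly so that a missing key fails loudly.
name = config["name"]

# Optional key: check for presence only.
has_exclude = "exclude" in config

# Optional key: read the value with an explicit default.
threads = config.get("threads", 1)

# Anti-pattern: a misspelled required key silently becomes None.
masked = config.get("nmae")

# The same typo with direct indexing raises KeyError right away.
try:
    config["nmae"]
    typo_detected = False
except KeyError:
    typo_detected = True
```

Here masked is None and the mistake would only surface later, perhaps as a confusing command-line error, while direct indexing reports the missing key at the point of access.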
Use Snakemake params: block to map into config dictionary
For example, do this:
params:
    name = lambda _: config["name"]
shell:
    r"""
    echo {params.name:q}
    """
instead of using the config dictionary directly in the shell command. This has several benefits:
Interpolation of dictionary lookups in the shell commands is non-standard and confusing. (You have to use {config[name]}, for example. Note that the dictionary key is unquoted.)
Param definitions can use arbitrary Python expressions, so you can do more complicated things than you can with direct interpolation, such as list comprehensions.
Snakemake can automatically discover which rules have parameter values that are different than the last run and show what output files are affected (--list-params-changes).
Use lambda on params that may have { or } in the value
If the value passed to a param contains curly braces, Snakemake will attempt to resolve it as a wildcard. To keep the value as-is, use a lambda expression.
Example:
params:
    key=lambda w: config["value_may_contain_curlies"]
Always use quoted (:q) interpolations
When building shell commands to run, Snakemake does not by default properly quote interpolated values. This works fine if the interpolated value doesn’t contain spaces or other special shell metacharacters (like quotes or backslashes), but it is fragile and a time-bomb waiting to break on future values.
Standard best practice in any language or environment is to always quote parameters in generated shell commands. Snakemake supports this using the :q modifier for interpolation:
params:
    file = "filename with spaces.txt"
shell:
    r"""
    wc -l {params.file:q}
    """
Not quoting these values is also a security risk.
It may be tempting to make an exception for parameters with multiple values where you want each to become a separate command-line argument, such as a parameter listing three filenames. In this case, however, it’s recommended that you make the parameter a list instead of a single string. Snakemake will interpolate it correctly:
params:
    files = ["a.txt", "b.txt", "c.txt"]
shell:
    r"""
    wc -l {params.files:q}
    """
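To see what shell quoting buys you, here is a minimal sketch using Python's standard shlex module. It illustrates the general idea of quoting each argument separately and is not a description of how Snakemake implements :q. The helper build_command and the example values are hypothetical.

```python
import shlex


def build_command(program, args):
    """Join a program and its arguments into one shell-safe command string."""
    return " ".join([program] + [shlex.quote(str(arg)) for arg in args])


# A single value containing spaces stays one argument.
single = build_command("wc", ["-l", "filename with spaces.txt"])

# A list of values becomes one quoted argument per element.
multi = build_command("wc", ["-l", "a.txt", "b.txt", "c.txt"])

# A value containing shell metacharacters is passed through as plain data.
hostile = build_command("wc", ["-l", "data.txt; echo oops"])

# Splitting the command the way a shell would recovers the original arguments.
recovered = shlex.split(hostile)
```

Without quoting, the last value would be read by the shell as two commands separated by a semicolon. With quoting, it remains a single (odd-looking) filename.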
Use raw, triple-quoted shell blocks
Using raw, triple-quoted (r""" or r''') shell blocks makes it much easier to build readable commands with one option per line. It also avoids any nested quoting issues if you need to use literal single or double quotes in your command. The command will remain readable in Snakemake’s logging messages because it’ll look like the source form (e.g. with backslashes and newlines retained instead of collapsed).
Example:
shell:
    r"""
    augur parse \
        --sequences {input:q} \
        --fields {params.fields:q} \
        --output-sequences {output.sequences:q} \
        --output-metadata {output.metadata:q}
    """
Hint
If you’re converting interpreted strings to raw strings (e.g. """ to r"""), make sure to check that they’re not relying on escape sequences like \n, \t, or \\ to be interpreted by Python before the shell (Bash) sees them.
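The effect of this conversion is easy to check in plain Python. The sketch below compares the same command text written as an interpreted and as a raw string, and includes a small hypothetical helper, find_escapes, that flags the escape sequences named above so they can be reviewed by hand.

```python
import re

# Interpreted string: Python turns \t and \n into a real tab and newline.
interpreted = """printf 'a\tb\n'"""

# Raw string: the backslashes reach the shell untouched.
raw = r"""printf 'a\tb\n'"""


def find_escapes(body):
    r"""List the \n, \t, and \\ sequences present in a raw command body."""
    return re.findall(r"\\[nt\\]", body)


# Escapes to review before switching this command to a raw string.
needs_review = find_escapes(raw)

# A trailing backslash used for line continuation is not flagged.
continuation_only = find_escapes("augur parse \\\n    --sequences in.fasta")
```

Any command for which find_escapes returns a non-empty list deserves a second look, since the shell (rather than Python) will now interpret those sequences.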
Log standard out and error output to log files and the terminal
Use the Snakemake log directive for each rule that writes output to either standard out or error and direct output to the corresponding log file.
Use the tee command to ensure that output gets written to both the log file and the terminal, so users can track their workflow progress interactively and use the log file later for debugging.
Example:
rule filter:
    input:
        metadata="results/metadata.tsv",
    output:
        metadata="results/filtered_metadata.tsv",
    log:
        "logs/filter.txt"
    shell:
        r"""
        augur filter \
            --metadata {input.metadata:q} \
            --output-metadata {output.metadata:q} 2>&1 | tee {log:q}
        """
Before using tee, ensure that your workflow uses bash’s pipefail option, so successful tee execution does not mask errors from earlier commands in the pipe.
Snakemake uses bash’s strict mode by default, so the pipefail option should be enabled by default.
However, some workflows may override the defaults locally at specific rules or globally as with a custom shell prefix.
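The masking effect is straightforward to reproduce outside Snakemake. The following sketch assumes bash is available on the PATH and uses the false command as a stand-in for a failing workflow step. The helper exit_code is hypothetical.

```python
import subprocess


def exit_code(script):
    """Run `script` with bash and return its exit status."""
    return subprocess.run(["bash", "-c", script]).returncode


# Without pipefail, the pipeline reports the status of tee, which succeeds.
without_pipefail = exit_code("false | tee /dev/null")

# With pipefail, the failure of the earlier command is reported instead.
with_pipefail = exit_code("set -o pipefail; false | tee /dev/null")
```

If a rule's command behaves like the first case, a failed step would look successful to Snakemake, so it is worth confirming that no shell prefix or rule-level override has turned pipefail off.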
Run workflows with --show-failed-logs
Run workflows with the --show-failed-logs option, which prints the logs for failed jobs to the terminal when the workflow exits.
This pattern helps users identify error messages without first finding the corresponding log file.
Always use the benchmark directive
Use the Snakemake benchmark directive for each rule so that it is easy to track run time and memory usage. This makes it easier for us to identify bottlenecks in workflows without parsing Snakemake logs.