augur.clades

Assign clades to nodes in a tree based on amino-acid or nucleotide signatures.

Clade membership for each node is stored at <OUTPUT_NODE_DATA> → nodes → <node_name> → clade_membership. If this file is used in augur export v2, the memberships automatically become a coloring.

The basal nodes of each clade are also given a branch label which is stored via <OUTPUT_NODE_DATA> → branches → <node_name> → labels → clade.

The keys “clade_membership” and “clade” are customisable via command line arguments.
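As a rough illustration of the layout described above, the following sketch arranges per-node results into the nested nodes/branches structure. The helper name build_node_data, its parameters, and the node names are hypothetical and used only for illustration; this is not augur's own code, and the real output file may contain additional top-level keys.

```python
import json


def build_node_data(membership, labels,
                    membership_key="clade_membership", label_key="clade"):
    """Arrange per-node results in the nested layout described above.

    Hypothetical helper for illustration; the two key names mirror the
    customisable keys mentioned in the text.
    """
    return {
        "nodes": {
            name: {membership_key: clade} for name, clade in membership.items()
        },
        "branches": {
            name: {"labels": {label_key: clade}} for name, clade in labels.items()
        },
    }


# Hypothetical node names and clades, for illustration only.
membership = {"NODE_0000001": "Clade_1", "sample_A": "Clade_1"}
labels = {"NODE_0000001": "Clade_1"}  # only the basal node carries a label

node_data = build_node_data(membership, labels)
print(json.dumps(node_data, indent=2))
```

Every node appears under nodes, while only the basal node of each clade appears under branches.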

augur.clades.assign_clades(clade_designations, all_muts, tree, ref=None)

Ensures every node has an entry (otherwise auspice does not display nicely), tests each node to see whether it is the first member of a clade (this node receives the label), and initially sets the membership of each node to that of its parent. The membership is overwritten if the node is later found to be the first member of a clade.

Parameters:
  • clade_designations (dict) – clade definitions as {clade_name:[(gene, site, allele),...]}

  • all_muts (dict) – mutations in each node

  • tree (Bio.Phylo.BaseTree.Tree) – phylogenetic tree to process

  • ref (str or list, optional) – reference sequence to look up state when not mutated

Returns:

[0]: mapping of node to clade membership (where applicable)
[1]: mapping of node to clade label (where applicable)

Return type:

(dict, dict)
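The inherit-then-override behaviour described for assign_clades can be sketched on a toy tree. This is a simplified illustration, not augur's implementation: the tree is a plain parent-to-children dict rather than a Bio.Phylo tree, the function name propagate_membership is hypothetical, the first member of each clade is supplied directly via basal instead of being detected from mutations, and the default value "unassigned" is an assumption.

```python
def propagate_membership(children, root, basal, default="unassigned"):
    """Return (membership, labels) for a tree given as a parent -> children dict.

    `basal` maps the first node of each clade to the clade name; how those
    nodes are detected is outside the scope of this sketch.
    """
    membership, labels = {}, {}
    stack = [(root, default)]  # (node name, membership inherited from parent)
    while stack:
        name, inherited = stack.pop()
        if name in basal:
            # First member of a clade: label the branch and override membership.
            labels[name] = basal[name]
            inherited = basal[name]
        membership[name] = inherited
        for child in children.get(name, []):
            stack.append((child, inherited))
    return membership, labels


# Hypothetical tree: root -> (A, B), A -> (A1, A2), A2 -> (A2a)
children = {"root": ["A", "B"], "A": ["A1", "A2"], "A2": ["A2a"]}
membership, labels = propagate_membership(
    children, "root", basal={"A": "Clade_1", "A2": "Clade_2"}
)
```

Here every node receives a membership entry, but only A and A2 receive labels, mirroring the two returned mappings.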

augur.clades.ensure_no_multiple_mutations(all_muts)
augur.clades.get_reference_sequence_from_root_node(all_muts, root_name)

Extracts the (nuc) sequence from the root node, if set, as well as the (aa) sequences. Returns a dictionary of {geneName: rootSequence} where rootSequence is a list and geneName may be ‘nuc’.

augur.clades.is_node_in_clade(clade_alleles, node, root_sequence)

Determines whether a node matches the clade definition based on its sequence. For each condition, it first looks in the mutations stored in node.sequences, then checks whether a reference sequence is available, and otherwise reports ‘non-match’.

Parameters:
  • clade_alleles (list) – list of clade defining alleles (typically supplied from the input TSV)

  • node (Bio.Phylo.BaseTree.Clade) – node to check, assuming sequences (as mutations) are attached to the node. node.sequences specifies the nucleotides/codons which are newly observed on this node, i.e. they are the result of a mutation observed on the branch leading to this node

  • root_sequence (dict) – {geneName: observed root sequence (list)}

Returns:

True if in clade

Return type:

bool
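The lookup order described for is_node_in_clade (mutations first, then the reference, otherwise non-match) might look roughly like the sketch below. It is an approximation under stated assumptions: the function name matches_clade is hypothetical, the node's states are passed as a plain {gene: {position: state}} dict instead of a Bio.Phylo clade with a sequences attribute, and positions are 0-indexed.

```python
def matches_clade(clade_alleles, node_sequences, root_sequence):
    """Return True if every (gene, position, state) condition is satisfied.

    node_sequences: {gene: {position: state}} known for this node (assumed shape).
    root_sequence:  {gene: list of states}, used as a fallback reference.
    """
    for gene, pos, expected in clade_alleles:
        # 1. Prefer a state recorded on the node itself.
        observed = node_sequences.get(gene, {}).get(pos)
        # 2. Otherwise fall back to the reference sequence, if one is available.
        if observed is None and gene in root_sequence:
            reference = root_sequence[gene]
            if 0 <= pos < len(reference):
                observed = reference[pos]
        # 3. Anything still unknown or different counts as a non-match.
        if observed != expected:
            return False
    return True


# Hypothetical data, for illustration only.
alleles = [("nuc", 2, "T"), ("S", 0, "P")]
node_states = {"nuc": {2: "T"}}
in_clade = matches_clade(alleles, node_states, {"nuc": list("ACGT"), "S": list("PQR")})
not_in_clade = matches_clade(alleles, node_states, {"nuc": list("ACGT"), "S": list("LQR")})
no_reference = matches_clade(alleles, node_states, {})
```

In the first call the S condition is satisfied by the reference because the node carries no S mutation; in the last call no reference exists for S, so the node is reported as not matching.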

augur.clades.parse_nodes(tree_file, node_data_files)
augur.clades.read_in_clade_definitions(clade_file)

Reads in a tab-separated file that defines clades by amino acid or nucleotide mutations.

Inheritance is allowed, but needs to be acyclic. Alleles can be overwritten by inheriting clades.

Sites are 1-indexed in the file and are converted to 0-indexed in the output.

Empty lines are ignored, as is anything following a # (comments).

Format:

clade      gene    site     alt
Clade_1    ctpE    81       D
Clade_2    nuc     30642    T
Clade_3    nuc     444296   A
Clade_3    S       1        P
# Clade_4 inherits from Clade_3
Clade_4    clade   Clade_3
Clade_4    pks8    634      T
# Inherited allele can be overwritten
Clade_4    S       1        L

Parameters:
  • clade_file (str) – clade definition file (tab-separated, in the format shown above)

Returns:

clade definitions as {clade_name:[(gene, site, allele),...]}

Return type:

dict
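The parsing rules above (comments, empty lines, 1-indexed sites, inheritance with overwriting, and the acyclic requirement) can be sketched as follows. This is a minimal illustration rather than augur's implementation: the function name read_clade_definitions is hypothetical, it takes an iterable of lines instead of a file name, and it assumes the first non-empty row is the header.

```python
def read_clade_definitions(lines):
    """Parse tab-separated clade definitions into {clade: [(gene, site, allele), ...]}."""
    own = {}      # clade -> {(gene, 0-indexed site): allele} defined directly
    parents = {}  # clade -> names of clades it inherits from
    header_seen = False
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if not header_seen:
            header_seen = True  # assumption: first non-empty row is the header
            continue
        fields = line.split("\t")
        clade, gene, site = fields[0], fields[1], fields[2]
        own.setdefault(clade, {})
        parents.setdefault(clade, [])
        if gene == "clade":
            parents[clade].append(site)  # third column names the parent clade
        else:
            own[clade][(gene, int(site) - 1)] = fields[3]  # 1-indexed -> 0-indexed

    resolved = {}

    def resolve(clade, path=()):
        if clade in path:
            raise ValueError("cyclic clade inheritance involving " + clade)
        if clade not in resolved:
            alleles = {}
            for parent in parents.get(clade, []):
                alleles.update(resolve(parent, path + (clade,)))
            alleles.update(own.get(clade, {}))  # own alleles overwrite inherited ones
            resolved[clade] = alleles
        return resolved[clade]

    return {c: [(g, s, a) for (g, s), a in resolve(c).items()] for c in own}


# The example rows from the format above, joined with tabs.
rows = [
    ["clade", "gene", "site", "alt"],
    ["Clade_1", "ctpE", "81", "D"],
    ["Clade_2", "nuc", "30642", "T"],
    ["Clade_3", "nuc", "444296", "A"],
    ["Clade_3", "S", "1", "P"],
    ["# Clade_4 inherits from Clade_3"],
    ["Clade_4", "clade", "Clade_3"],
    ["Clade_4", "pks8", "634", "T"],
    ["Clade_4", "S", "1", "L"],
]
definitions = read_clade_definitions("\t".join(row) for row in rows)
```

With these rows, Clade_4 ends up with the nuc allele inherited from Clade_3, its own pks8 allele, and L rather than P at S site 0.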

augur.clades.register_parser(parent_subparsers)
augur.clades.run(args)
augur.clades.warn_if_clades_not_found(membership, clade_designations)