augur.clades module
Assign clades to nodes in a tree based on amino-acid or nucleotide signatures.
Nodes that are members of a clade are stored via <OUTPUT_NODE_DATA> → nodes → <node_name> → clade_membership. If this file is used in augur export v2, these memberships automatically become a coloring.
The basal nodes of each clade are also given a branch label which is stored via <OUTPUT_NODE_DATA> → branches → <node_name> → labels → clade.
The keys “clade_membership” and “clade” are customisable via command line arguments.
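As a rough illustration of the layout described above, the following sketch builds a node-data dictionary with the same nesting. The helper `build_node_data` and the node names are hypothetical and are not part of augur; only the key paths (nodes → name → clade_membership, branches → name → labels → clade) come from the description.

```python
import json


def build_node_data(membership, labels,
                    membership_key="clade_membership", label_key="clade"):
    """Hypothetical helper: arrange clade results in the node-data layout."""
    node_data = {"nodes": {}, "branches": {}}
    # Every member node gets its clade stored under the membership key.
    for name, clade in membership.items():
        node_data["nodes"][name] = {membership_key: clade}
    # Only basal nodes of a clade get a branch label.
    for name, clade in labels.items():
        node_data["branches"][name] = {"labels": {label_key: clade}}
    return node_data


example = build_node_data(
    membership={"NODE_0000001": "Clade_1", "sample_A": "Clade_1"},
    labels={"NODE_0000001": "Clade_1"},
)
print(json.dumps(example, indent=2))
```

Passing different `membership_key` and `label_key` values mirrors the customisable key names mentioned above.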
- augur.clades.assign_clades(clade_designations, all_muts, tree, ref=None)
Ensures all nodes have an entry (otherwise auspice doesn’t display them nicely), tests each node to see whether it is the first member of a clade (this becomes the label), and sets the membership of each node to that of its parent. The membership is overwritten if the node is later found to be the first member of a clade.
- Parameters:
clade_designations (dict) – clade definitions as
{clade_name:[(gene, site, allele),...]}
all_muts (dict) – mutations in each node
tree (Bio.Phylo.BaseTree.Tree) – phylogenetic tree to process
ref (str or list, optional) – reference sequence to look up state when not mutated
- Returns:
[0]: mapping of node to clade membership (where applicable)
[1]: mapping of node to clade label (where applicable)
- Return type:
tuple of (dict, dict)
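The parent-to-child propagation described for assign_clades can be sketched in isolation. This is a simplified illustration, not augur's implementation: the `Node` class, the `propagate_membership` function, and the `"unassigned"` placeholder are assumptions made for the example, and the basal nodes of each clade are supplied up front rather than detected from mutations.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical minimal tree node (stands in for a Bio.Phylo clade)."""
    name: str
    children: list = field(default_factory=list)


def propagate_membership(root, labels, default="unassigned"):
    """Give every node its parent's clade unless it starts a clade itself.

    labels maps the first (basal) node of each clade to the clade name.
    The default value is a placeholder chosen for this sketch.
    """
    membership = {}
    stack = [(root, default)]
    while stack:
        node, parent_clade = stack.pop()
        # A basal node overrides whatever it would inherit from its parent.
        clade = labels.get(node.name, parent_clade)
        membership[node.name] = clade
        for child in node.children:
            stack.append((child, clade))
    return membership, dict(labels)


root = Node("root", [Node("A", [Node("A1"), Node("A2")]), Node("B")])
membership, labels = propagate_membership(root, {"A": "Clade_1", "A2": "Clade_2"})
print(membership)
```

Every node receives an entry, and a nested clade such as `A2` replaces the value it would otherwise inherit from `A`.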
- augur.clades.ensure_no_multiple_mutations(all_muts)
- augur.clades.get_reference_sequence_from_root_node(all_muts, root_name)
Extracts the (nuc) sequence from the root node, if set, as well as the (aa) sequences. Returns a dictionary of {geneName: rootSequence} where rootSequence is a list and geneName may be ‘nuc’.
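The shape of the returned dictionary can be illustrated with a short sketch. The input layout used here (a plain `{gene: string}` mapping) and the helper name `to_root_sequence` are assumptions for illustration only; the actual structure of all_muts is not described in this section.

```python
def to_root_sequence(sequences_by_gene):
    """Hypothetical helper: turn {gene: 'SEQ'} into {gene: ['S', 'E', 'Q']}."""
    return {gene: list(seq) for gene, seq in sequences_by_gene.items()}


# 'nuc' holds the nucleotide sequence; other keys are gene names (made up here).
root_sequence = to_root_sequence({"nuc": "ATGTTC", "S": "MF"})

# Lists are indexed from 0, so the first amino acid of S is at index 0.
first_state = root_sequence["S"][0]
print(first_state)
```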
- augur.clades.is_node_in_clade(clade_alleles, node, root_sequence)
Determines whether a node matches the clade definition based on sequence. For each condition, it first looks in the mutations stored in node.sequences, then checks whether a reference sequence is available, and otherwise reports ‘non-match’.
- Parameters:
clade_alleles (list) – list of clade defining alleles (typically supplied from the input TSV)
node (Bio.Phylo.BaseTree.Clade) – node to check, assuming sequences (as mutations) are attached to the node. node.sequences specifies nucleotides/codons which are newly observed on this node, i.e. they are the result of a mutation observed on the branch leading to this node
root_sequence (dict) – {geneName: observed root sequence (list)}
- Returns:
True if in clade
- Return type:
bool
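The lookup order described for is_node_in_clade (node mutations first, then the reference sequence, otherwise non-match) might look roughly like the sketch below. It is not augur's code: the function name `node_matches_clade` is hypothetical, and it assumes node.sequences has the form `{gene: {position: state}}` with 0-indexed positions.

```python
from types import SimpleNamespace


def node_matches_clade(clade_alleles, node, root_sequence):
    """Return True only if every (gene, position, allele) condition holds."""
    for gene, pos, allele in clade_alleles:
        observed = node.sequences.get(gene, {})
        if pos in observed:
            # 1. Prefer a state recorded on the node itself.
            state = observed[pos]
        elif gene in root_sequence and pos < len(root_sequence[gene]):
            # 2. Fall back to the root/reference sequence.
            state = root_sequence[gene][pos]
        else:
            # 3. Nothing to compare against: treat as a non-match.
            return False
        if state != allele:
            return False
    return True


root_sequence = {"nuc": list("ACGT"), "S": list("MP")}
node = SimpleNamespace(sequences={"S": {1: "L"}})

in_clade = node_matches_clade([("S", 1, "L"), ("nuc", 0, "A")], node, root_sequence)
print(in_clade)
```

Here the S condition is satisfied by the node's own mutation, while the nuc condition is satisfied by the root sequence.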
- augur.clades.parse_nodes(tree_file, node_data_files, validation_mode)
- augur.clades.read_in_clade_definitions(clade_file)
Reads in a tab-separated file that defines clades by amino acid or nucleotide mutations.
Inheritance is allowed, but needs to be acyclic. Alleles can be overwritten by inheriting clades.
Sites are 1-indexed in the file and are converted to 0-indexed in the output.
Empty lines are ignored, and comments after # are ignored.
Format:
    clade      gene    site      alt
    Clade_1    ctpE    81        D
    Clade_2    nuc     30642     T
    Clade_3    nuc     444296    A
    Clade_3    S       1         P
    # Clade_4 inherits from Clade_3
    Clade_4    clade   Clade_3
    Clade_4    pks8    634       T
    # Inherited allele can be overwritten
    Clade_4    S       1         L
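The rules above (comments, inheritance, overwriting, and the 1-indexed to 0-indexed conversion) can be sketched with a small stand-alone parser. This is an illustrative approximation, not the code in augur.clades: the function name `parse_clade_definitions` is hypothetical and it reads from a string rather than a file.

```python
def parse_clade_definitions(text):
    """Parse clade definitions into {clade: [(gene, 0-based site, allele), ...]}."""
    own = {}      # clade -> {(gene, site): allele} defined directly
    parents = {}  # clade -> list of clades it inherits from
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split("\t")
        if fields[:2] == ["clade", "gene"]:  # header row
            continue
        clade, gene, site = fields[0], fields[1], fields[2]
        own.setdefault(clade, {})
        parents.setdefault(clade, [])
        if gene == "clade":
            parents[clade].append(site)  # third column names the parent clade
        else:
            own[clade][(gene, int(site) - 1)] = fields[3]

    def resolve(clade, seen=()):
        if clade in seen:
            raise ValueError(f"cyclic inheritance involving {clade}")
        alleles = {}
        for parent in parents.get(clade, []):
            alleles.update(resolve(parent, seen + (clade,)))
        alleles.update(own.get(clade, {}))  # own alleles overwrite inherited ones
        return alleles

    return {c: [(g, s, a) for (g, s), a in resolve(c).items()] for c in own}


SAMPLE = "\n".join([
    "clade\tgene\tsite\talt",
    "Clade_3\tnuc\t444296\tA",
    "Clade_3\tS\t1\tP",
    "# Clade_4 inherits from Clade_3",
    "Clade_4\tclade\tClade_3",
    "Clade_4\tpks8\t634\tT",
    "Clade_4\tS\t1\tL  # inherited allele overwritten",
])
clades = parse_clade_definitions(SAMPLE)
print(clades["Clade_4"])
```

In this sample, Clade_4 keeps the nuc allele inherited from Clade_3, adds its own pks8 allele, and replaces the inherited S allele.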
- augur.clades.register_parser(parent_subparsers)
- augur.clades.run(args)
- augur.clades.warn_if_clades_not_found(membership, clade_designations)