augur.utils

exception augur.utils.InvalidTreeError

Bases: Exception

Represents an error loading a phylogenetic tree from a filename.

augur.utils.annotate_parents_for_tree(tree)

Annotate each node in the given tree with its parent.

>>> import io
>>> import Bio.Phylo
>>> from augur.utils import annotate_parents_for_tree
>>> tree = Bio.Phylo.read(io.StringIO("(A, (B, C))"), "newick")
>>> not any([hasattr(node, "parent") for node in tree.find_clades()])
True
>>> tree = annotate_parents_for_tree(tree)
>>> tree.root.parent is None
True
>>> all([hasattr(node, "parent") for node in tree.find_clades()])
True
augur.utils.available_cpu_cores(fallback: int = 1) -> int

Returns the number of CPU cores (an int) available to this process, if that can be determined. Otherwise returns the number of CPU cores available to the computer, if that can be determined. Otherwise returns the fallback number, which defaults to 1.
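The fallback chain described above can be sketched as follows. This is a minimal illustration and not augur's own code: the name `cpu_cores_sketch` is hypothetical, and the sketch assumes that `os.sched_getaffinity` (present only on some platforms, such as Linux) supplies the per-process count and `os.cpu_count` the machine-wide count.

```python
import os


def cpu_cores_sketch(fallback: int = 1) -> int:
    """Hypothetical stand-in illustrating the documented fallback chain."""
    try:
        # Cores this process is allowed to run on (not available everywhere).
        return len(os.sched_getaffinity(0))
    except (AttributeError, OSError):
        # Cores on the whole machine, or the fallback if that is unknown.
        return os.cpu_count() or fallback


print(cpu_cores_sketch())
```

The per-process count can be smaller than the machine-wide count, for example inside a container or a job scheduler that restricts CPU affinity.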

augur.utils.first_line(text)

Returns the first line of the given text, ignoring leading and trailing whitespace.
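One way to read this behaviour is sketched below. The name `first_line_sketch` is hypothetical, and returning an empty string for blank input is a choice made for this sketch only; how the real function treats blank input is not stated here.

```python
def first_line_sketch(text: str) -> str:
    """Hypothetical stand-in: strip surrounding whitespace, then take the first line."""
    lines = text.strip().splitlines()
    return lines[0] if lines else ""


docstring = """
    Annotate each node in the given tree with its parent.

    More detail follows here.
"""
print(first_line_sketch(docstring))
```

A helper like this is handy for turning a multi-line docstring into a one-line summary.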

augur.utils.get_augur_version()

Returns a string of the current augur version.

augur.utils.get_json_name(args, default=None)
augur.utils.get_parent_name_by_child_name_for_tree(tree)

Returns a dictionary mapping child node names to parent node names.
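The shape of the returned mapping can be illustrated with a small sketch. To stay self-contained it uses a hypothetical `Node` class in place of a Bio.Phylo clade, and `parent_names_sketch` is likewise a made-up name. The sketch assumes that the root, having no parent, gets no entry.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical stand-in for a tree clade with a name and child clades."""
    name: str
    clades: List["Node"] = field(default_factory=list)


def parent_names_sketch(root: Node) -> Dict[str, str]:
    """Map each child node name to its parent's name (the root has no entry)."""
    parents = {}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.clades:
            parents[child.name] = node.name
            stack.append(child)
    return parents


tree = Node("root", [Node("A"), Node("internal", [Node("B"), Node("C")])])
print(parent_names_sketch(tree))
```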

augur.utils.json_to_tree(json_dict, root=True, parent_cumulative_branch_length=None)

Returns a Bio.Phylo tree corresponding to the given JSON dictionary exported by tree_to_json.

Assigns links back to parent nodes for the root of the tree.

Test opening a JSON from augur export v1.

>>> import json
>>> from augur.utils import json_to_tree
>>> with open("tests/data/json_tree_to_nexus/flu_h3n2_ha_3y_tree.json", "r") as json_fh:
...     json_dict = json.load(json_fh)
>>> tree = json_to_tree(json_dict)
>>> tree.name
'NODE_0002020'
>>> len(tree.clades)
2
>>> tree.clades[0].name
'NODE_0001489'
>>> hasattr(tree, "attr")
True
>>> "dTiter" in tree.attr
True
>>> tree.clades[0].parent.name
'NODE_0002020'
>>> tree.clades[0].branch_length > 0
True

Test opening a JSON from augur export v2.

>>> with open("tests/data/zika.json", "r") as json_fh:
...     json_dict = json.load(json_fh)
>>> tree = json_to_tree(json_dict)
>>> hasattr(tree, "name")
True
>>> len(tree.clades) > 0
True
>>> tree.clades[0].branch_length > 0
True

Branch lengths should be the length of the branch to each node and not the length from the root. The cumulative branch length from the root gets its own attribute.

>>> tip = [tip for tip in tree.find_clades(terminal=True) if tip.name == "USA/2016/FLWB042"][0]
>>> round(tip.cumulative_branch_length, 6)
0.004747
>>> round(tip.branch_length, 6)
0.000186
augur.utils.load_features(reference, feature_names=None)
augur.utils.load_mask_sites(mask_file)

Load masking sites from either a BED file or a masking file.

Parameters

mask_file (str) – Path to the BED or masking file

Returns

Sorted list of unique zero-indexed sites

Return type

list[int]

augur.utils.nthreads_value(value)

Argument value validation and casting function for --nthreads.
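A validation and casting function of this kind is typically passed to argparse as the `type` of an option. The sketch below shows that pattern with a hypothetical validator, `positive_int_sketch`, which accepts only positive integers. Which values the real `nthreads_value` accepts is not stated here, so treat the rules below as assumptions.

```python
import argparse


def positive_int_sketch(value: str) -> int:
    """Hypothetical validator: cast to int and require a value of at least 1."""
    try:
        number = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f"'{value}' is not an integer") from None
    if number < 1:
        raise argparse.ArgumentTypeError(f"'{value}' is not a positive integer")
    return number


parser = argparse.ArgumentParser()
# argparse calls the `type` function on the raw string and reports an
# ArgumentTypeError as a usage error.
parser.add_argument("--nthreads", type=positive_int_sketch, default=1)
args = parser.parse_args(["--nthreads", "4"])
print(args.nthreads)
```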

augur.utils.read_bed_file(bed_file)

Read a BED file and return a list of excluded sites.

Note: This function assumes the given file is a BED file. On parsing failures, it will attempt to skip the first line and retry, but no other error checking is attempted. Incorrectly formatted files will raise errors.

Parameters

bed_file (str) – Path to the BED file

Returns

Sorted list of unique zero-indexed sites

Return type

list[int]
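The sketch below illustrates how BED intervals can be expanded into a sorted list of unique zero-indexed sites, including the retry that skips a header line. It is not augur's implementation: `bed_sites_sketch` is a hypothetical name, it works on lines already in memory rather than a file path, and it assumes standard BED semantics (tab-separated columns, with a zero-based inclusive start in column 2 and an exclusive end in column 3).

```python
from typing import Iterable, List


def bed_sites_sketch(lines: Iterable[str]) -> List[int]:
    """Hypothetical stand-in: expand BED intervals into zero-indexed sites."""
    rows = [line.rstrip("\n").split("\t") for line in lines if line.strip()]

    def expand(records):
        sites = set()
        for fields in records:
            # Columns 2 and 3 are chromStart (inclusive) and chromEnd (exclusive).
            start, end = int(fields[1]), int(fields[2])
            sites.update(range(start, end))
        return sorted(sites)

    try:
        return expand(rows)
    except (ValueError, IndexError):
        # Assume the first line was a header and retry once without it.
        return expand(rows[1:])


bed = ["chrom\tstart\tend\n", "ref\t0\t3\n", "ref\t2\t5\n", "ref\t9\t10\n"]
print(bed_sites_sketch(bed))  # [0, 1, 2, 3, 4, 9]
```

Overlapping intervals collapse into a single set of sites, which is why the result contains each position only once.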

augur.utils.read_colors(overrides=None, use_defaults=True)
augur.utils.read_config(fname)
augur.utils.read_lat_longs(overrides=None, use_defaults=True)
augur.utils.read_mask_file(mask_file)

Read a masking file and return a list of excluded sites.

Masking files have a single masking site per line, either alone or as the second column of a tab-separated file. These sites are assumed to be one-indexed, NOT zero-indexed. Incorrectly formatted lines will be skipped.

Parameters

mask_file (str) – Path to the masking file

Returns

Sorted list of unique zero-indexed sites

Return type

list[int]
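The one-indexed to zero-indexed conversion described above can be sketched as follows. The function name `mask_sites_sketch` is hypothetical and the sketch parses lines already in memory rather than a file path. It follows the description in this section, not augur's source.

```python
from typing import Iterable, List


def mask_sites_sketch(lines: Iterable[str]) -> List[int]:
    """Hypothetical stand-in: parse one-indexed mask sites into zero-indexed sites."""
    sites = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # A lone value is the site; otherwise the site is the second column.
        raw = fields[0] if len(fields) == 1 else fields[1]
        try:
            sites.add(int(raw) - 1)  # convert one-indexed to zero-indexed
        except ValueError:
            continue  # skip lines that do not hold an integer site
    return sorted(sites)


mask = ["5\n", "ref\t12\n", "not a site\n", "5\n", "1\n"]
print(mask_sites_sketch(mask))  # [0, 4, 11]
```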

augur.utils.read_node_data(fnames, tree=None, skip_validation=False)
augur.utils.read_strains(*files, comment_char='#')

Reads strain names from one or more plain text files and returns the set of distinct strains.

Strain names can be commented with full-line or inline comments. For example, the following is a valid strain names file:

# this is a comment at the top of the file
strain1  # exclude strain1 because it isn't sequenced properly
strain2
  # this is an empty line that will be ignored.

Parameters

files (one or more str) – one or more names of text files with one strain name per line

Returns

strain names from the given input files

Return type

set
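The comment handling can be sketched in a few lines. This is an illustration under stated assumptions and not augur's code: `read_strains_sketch` is a hypothetical name, and the sketch assumes that everything from the first comment character to the end of the line is discarded.

```python
import os
import tempfile
from typing import Set


def read_strains_sketch(*files: str, comment_char: str = "#") -> Set[str]:
    """Hypothetical stand-in: collect distinct strain names, dropping comments."""
    strains = set()
    for path in files:
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                # Keep only the text before the first comment character.
                name = line.split(comment_char)[0].strip()
                if name:
                    strains.add(name)
    return strains


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "strains.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("# header comment\nstrain1  # inline comment\nstrain2\n  # blank\n")
    result = read_strains_sketch(path)

print(sorted(result))  # ['strain1', 'strain2']
```

Under this assumption, a strain name that itself contains the comment character would be truncated.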

augur.utils.read_tree(fname, min_terminals=3)

Safely load a tree from a given filename or raise an error if the file does not contain a valid tree.

Parameters
  • fname (str) – name of a file containing a phylogenetic tree

  • min_terminals (int) – minimum number of terminals required for the parsed tree as a sanity check on the tree

Raises

InvalidTreeError – If the given file exists but does not seem to contain a valid tree format.

Returns

BioPython tree instance

Return type

Bio.Phylo

augur.utils.write_json(data, file_name, indent=2, include_version=True)

Write data as JSON to the given file_name, creating parent directories if necessary. The augur version is included as a top-level key "augur_version".

Parameters
  • data (dict) – data to write out to JSON

  • file_name (str) – file name to write to

  • indent (int or None, optional) – JSON indentation level. Default is None if the environment variable AUGUR_MINIFY_JSON is truthy, else 2.

  • include_version (bool, optional) – Include the augur version. Default: True.

Raises

OSError –
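The behaviour described for write_json (create parent directories, add a version key, write JSON) can be sketched as below. This is an assumption-laden illustration and not augur's implementation: `write_json_sketch` is a hypothetical name, and the `version` argument is a placeholder string standing in for the real augur version.

```python
import json
import os
import tempfile


def write_json_sketch(data, file_name, indent=2, version="0.0.0"):
    """Hypothetical stand-in; `version` is a placeholder, not a real augur version."""
    parent = os.path.dirname(file_name)
    if parent:
        # Create missing parent directories; existing ones are left alone.
        os.makedirs(parent, exist_ok=True)
    # Copy so the caller's dict is not modified.
    payload = dict(data, augur_version=version)
    with open(file_name, "w", encoding="utf-8") as handle:
        json.dump(payload, handle, indent=indent)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "results", "nested", "node_data.json")
    write_json_sketch({"nodes": {}}, target)
    with open(target, encoding="utf-8") as handle:
        written = json.load(handle)

print(written)
```

In this sketch, both `os.makedirs` and `open` can raise OSError, for example when the target location is not writable.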