augur.frequency_estimators module

class augur.frequency_estimators.AlignmentKdeFrequencies(sigma_narrow=0.08333333333333333, sigma_wide=0.25, proportion_wide=0.2, pivot_frequency=1, start_date=None, end_date=None, pivot_interval_units='months', weights=None, weights_attribute=None, node_filters=None, max_date=None, include_internal_nodes=False, censored=False)

Bases: KdeFrequencies

KDE frequency estimator for multiple sequence alignments

Estimates frequencies for samples provided as sequences in a multiple sequence alignment and corresponding observation dates for each sequence.

estimate(alignment, observations): Estimate frequencies of bases/residues at each site in a given multiple sequence alignment based on the observation dates associated with each sequence in the alignment.

class augur.frequency_estimators.KdeFrequencies(sigma_narrow=0.08333333333333333, sigma_wide=0.25, proportion_wide=0.2, pivot_frequency=1, start_date=None, end_date=None, pivot_interval_units='months', weights=None, weights_attribute=None, node_filters=None, max_date=None, include_internal_nodes=False, censored=False)

Bases: object

Methods to estimate clade frequencies for phylogenetic trees by creating normal distributions from timestamped tips in the tree and building a kernel density estimate across discrete time points from these tip observations for each clade in the tree.

classmethod estimate_frequencies(tip_dates, pivots, normalize_to=1.0, max_date=None, **kwargs): Estimate frequencies of the given observations across the given pivots.

classmethod from_json(json_dict): Returns an instance populated with parameters and data from the given JSON dictionary.

classmethod get_densities_for_observations(observations, pivots, max_date=None, **kwargs)

Create a matrix of densities for one or more observations across the given pivots with one row per observation and one column per pivot.

Observations can be optionally filtered by a maximum date such that all densities are estimated to be zero after that date.

classmethod get_density_for_observation(mu, pivots, sigma_narrow=0.08333333333333333, sigma_wide=0.25, proportion_wide=0.2, **kwargs): Build a normal distribution centered across the given floating point date, mu, with a standard deviation based on the given sigma value and return the probability mass at each pivot. These mass values per pivot will form the input for a kernel density estimate across multiple observations.

get_params()

Returns the parameters used to define the current instance.

Returns:: parameters that define the current instance and that can be used to create a new instance
Return type:: dict

classmethod normalize_to_frequencies(density_matrix, normalize_to=1.0): Normalize the values of a given density matrix to 1 across all columns (time points) with non-zero sums. This converts kernal PDF mass into a frequency estimate.

to_json(): Returns a dictionary for the current instance that can be serialized in a JSON file.

class augur.frequency_estimators.TreeKdeFrequencies(sigma_narrow=0.08333333333333333, sigma_wide=0.25, proportion_wide=0.2, pivot_frequency=1, start_date=None, end_date=None, pivot_interval_units='months', weights=None, weights_attribute=None, node_filters=None, max_date=None, include_internal_nodes=False, censored=False)

Bases: KdeFrequencies

KDE frequency estimator for phylogenetic trees

Estimates frequencies for samples provided as annotated tips on a given tree and optionally reports clade-level frequencies based on the sum of each clade’s respective tips.

estimate(tree)

Estimate frequencies for a given tree using the parameters defined for this instance.

If weights are defined, frequencies represent a weighted mean across the values in attribute defined by self.weights_attribute.

Parameters:: tree (Bio.Phylo.BaseTree.Tree) -- annotated tree whose nodes all have an attr attribute with at least “num_date” key
Returns:: node frequencies by clade
Return type:: dict

estimate_tip_frequencies_to_proportion(tips, proportion)

Estimate frequencies for a given set of tips up to a given proportion of total frequencies.

Parameters:

tips (list) -- a list of Bio.Phylo terminals annotated with attributes in tip.attr
proportion (float) -- the proportion of the total frequency that the given tips should represent

Returns:

frequencies of given tips indexed by tip name

Return type:

dict

tip_passes_filters(tip)

Returns a boolean indicating whether a given tip passes the node filters defined for the current instance.

If no filters are defined, returns True.

Parameters:: tip (Bio.Phylo.BaseTree.Tree) -- tip from a Bio.Phylo tree annotated with attributes in tip.attr
Returns:: whether the given tip passes the defined filters or not
Return type:: bool

exception augur.frequency_estimators.TreeKdeFrequenciesError

Bases: Exception

Represents an error estimating KDE frequencies for a tree.

class augur.frequency_estimators.alignment_frequencies(aln, tps, pivots, **kwargs)

Bases: object

calculates frequencies of mutations in an alignment. uses nested frequencies for mutations at a particular site such that mutations are forced to add up to one.

calc_confidence()

calculate a crude binomial sampling confidence interval of the frequency estimate. This ignores autocorrelation of the trajectory and returns one standard deviation (68%).

Returns:: dictionary of standard deviations for each estimated trajectory
Return type:: dict

estimate_genotype_frequency(aln, gt, **kwargs)

slice an alignment at possibly multiple positions and calculate the frequency trajectory of this multi-locus genotype

Parameters:: gt (list) -- a list of (position, state) tuples specifying the genotype whose frequency is to be estimated
Returns:: frequency trajectory
Return type:: numpy.ndarray

mutation_frequencies(min_freq=0.01, include_set=None, ignore_char='')

estimate frequencies of single site mutations for each alignment column. This function populates a dictionary class.frequencies with the frequency trajectories.

Parameters:

min_freq (float, optional) -- minimal all-time frequency for an aligment column to be considered
include_set (list or set, optional) -- set of alignment column that will be used regardless of variation
ignore_char (str, optional) -- ignore this character in an alignment column (missing data)

augur.frequency_estimators.count_observations(pivots, tps)

augur.frequency_estimators.fix_freq(freq, pc)

restricts frequencies to the interval [pc, 1-pc] removes np.nan values and avoids taking logarithms of 0 or divisions by 0

Parameters:

freq (numpy.ndarray) -- frequency trajectory to be thresholded
pc (float) -- threshold value

Returns:

thresholded frequency trajectory

Return type:

numpy.ndarray

augur.frequency_estimators.float_to_datestring(numdate)

convert a numeric decimal date to a python datetime object Note that this only works for AD dates since the range of datetime objects is restricted to year>1. Copied from treetime.utils :type numdate: float :param numdate: numeric date as in 2018.23 :type numdate: float

Returns:: datetime object
Return type:: datetime.datetime

class augur.frequency_estimators.freq_est_clipped(tps, obs, pivots, dtps=None, **kwargs)

Bases: object

simple wrapper for the frequency estimator that attempts to estimate a frequency trajectory on a sensible range of pivots to avoid frequency optimization at points without data

dtps: Description

fe: Description

good_pivots: Description

good_tps: Description

obs: Description

pivot_freq: Description

pivot_lower_cutoff: Description

pivot_upper_cutoff: Description

pivots: Description

tps: Description

valid

Description

Type:: bool

learn()

class augur.frequency_estimators.frequency_estimator(tps, obs, pivots, stiffness=20.0, inertia=0.0, tol=0.001, pc=0.0001, ws=100, method='powell', **kwargs)

Bases: object

estimates a smooth frequency trajectory given a series of time stamped 0/1 observations. The most likely set of frequencies at specified pivot values is determined by numerical minimization. Likelihood consist of a bernoulli sampling term as well as a term penalizing rapid frequency shifts. this term is motivated by genetic drift, i.e., sampling variation.

initial_guess(pc=0.01)

learn(initial_guess=None)

stiffLH()

augur.frequency_estimators.get_pivots(observations, pivot_interval, start_date=None, end_date=None, pivot_interval_units='months')

Calculate pivots for a given list of floating point observation dates and interval between pivots.

Start and end pivots will be based on the range of given observed dates, unless a start or end date are provided to override these defaults. Pivots include the end date and, where possible for the given interval, the start date.

Parameters:

observations (list) -- a list of observed floating point dates per sample
pivot_interval (str) -- number of months (or weeks) between pivots
start_date (float) -- optional start of the pivots interval
end_date (float) -- optional end of the pivots interval
pivot_interval -- whether pivots are measured in “months” or in “weeks”

Returns:

pivots -- floating point pivots spanning the given the dates

Return type:

numpy.ndarray

augur.frequency_estimators.logit_inv(logit_freq, pc)

augur.frequency_estimators.logit_transform(freq, pc)

augur.frequency_estimators.make_pivots(pivots, tps)

if pivots is a scalar, make a grid of pivot points covering the entire range

Parameters:

pivots (int or iterable) -- either number of pivots (a scalar) or the actual pivots (will be cast to array and returned)
tps (numpy.ndarray) -- observation time points. Will generate pivots spanning min/max

Returns:

pivots -- array of pivot values

Return type:

numpy.ndarray

class augur.frequency_estimators.nested_frequencies(tps, obs, pivots, **kwargs)

Bases: object

estimates frequencies of mutually exclusive events such as mutations at a particular genomic position or subclades in a tree

calc_freqs()

augur.frequency_estimators.pq(p)

augur.frequency_estimators.running_average(obs, ws)

calculates a running average obs -- observations ws -- window size (number of points to average)

Parameters:

obs (list or numpy.ndarray(bool)) -- observations
ws (int) -- window size as measured in number of consecutive points

Returns:

running average of the boolean observations

Return type:

numpy.ndarray(float)

augur.frequency_estimators.test_nested_estimator()

augur.frequency_estimators.test_simple_estimator()

augur.frequency_estimators.timestamp_to_float(time)

Convert a pandas timestamp to a floating point date. This is not entirely accurate as it doesn’t account for months with different numbers of days, but should be close enough to be accurate for weekly pivots.

Examples

>>> import datetime
>>> time = datetime.date(2010, 10, 1)
>>> timestamp_to_float(time)
2010.75
>>> time = datetime.date(2011, 4, 1)
>>> timestamp_to_float(time)
2011.25
>>> timestamp_to_float(datetime.date(2011, 1, 1))
2011.0
>>> timestamp_to_float(datetime.date(2011, 12, 1)) == (2011.0 + 11.0 / 12)
True

class augur.frequency_estimators.tree_frequencies(tree, pivots, node_filter=None, min_clades=10, verbose=0, pc=0.0001, **kwargs)

Bases: object

class that estimates frequencies for nodes in the tree. each internal node is assumed to be named with an attribute clade, of root doesn’t have such an attribute, clades will be numbered in preorder. Each node is assumed to have an attribute attr with a key “num_date”.

calc_confidence()

for each frequency trajectory, calculate the bernouilli sampling error -- in reality we should look at the inverse hessian, but this is a useful approximation in most cases

Returns:: dictionary with estimated confidence intervals
Return type:: dict

estimate_clade_frequencies()

prepare()