augur.frequency_estimators moduleο
- class augur.frequency_estimators.AlignmentKdeFrequencies(sigma_narrow=0.08333333333333333, sigma_wide=0.25, proportion_wide=0.2, pivot_frequency=1, start_date=None, end_date=None, pivot_interval_units='months', weights=None, weights_attribute=None, node_filters=None, max_date=None, include_internal_nodes=False, censored=False)ο
Bases:
KdeFrequencies
KDE frequency estimator for multiple sequence alignments
Estimates frequencies for samples provided as sequences in a multiple sequence alignment and corresponding observation dates for each sequence.
- estimate(alignment, observations)ο
Estimate frequencies of bases/residues at each site in a given multiple sequence alignment based on the observation dates associated with each sequence in the alignment.
- class augur.frequency_estimators.KdeFrequencies(sigma_narrow=0.08333333333333333, sigma_wide=0.25, proportion_wide=0.2, pivot_frequency=1, start_date=None, end_date=None, pivot_interval_units='months', weights=None, weights_attribute=None, node_filters=None, max_date=None, include_internal_nodes=False, censored=False)ο
Bases:
object
Methods to estimate clade frequencies for phylogenetic trees by creating normal distributions from timestamped tips in the tree and building a kernel density estimate across discrete time points from these tip observations for each clade in the tree.
- classmethod estimate_frequencies(tip_dates, pivots, normalize_to=1.0, max_date=None, **kwargs)ο
Estimate frequencies of the given observations across the given pivots.
- classmethod from_json(json_dict)ο
Returns an instance populated with parameters and data from the given JSON dictionary.
- classmethod get_densities_for_observations(observations, pivots, max_date=None, **kwargs)ο
Create a matrix of densities for one or more observations across the given pivots with one row per observation and one column per pivot.
Observations can be optionally filtered by a maximum date such that all densities are estimated to be zero after that date.
- classmethod get_density_for_observation(mu, pivots, sigma_narrow=0.08333333333333333, sigma_wide=0.25, proportion_wide=0.2, **kwargs)ο
Build a normal distribution centered across the given floating point date, mu, with a standard deviation based on the given sigma value and return the probability mass at each pivot. These mass values per pivot will form the input for a kernel density estimate across multiple observations.
- get_params()ο
Returns the parameters used to define the current instance.
- Returns:
parameters that define the current instance and that can be used to create a new instance
- Return type:
- classmethod normalize_to_frequencies(density_matrix, normalize_to=1.0)ο
Normalize the values of a given density matrix to 1 across all columns (time points) with non-zero sums. This converts kernal PDF mass into a frequency estimate.
- to_json()ο
Returns a dictionary for the current instance that can be serialized in a JSON file.
- class augur.frequency_estimators.TreeKdeFrequencies(sigma_narrow=0.08333333333333333, sigma_wide=0.25, proportion_wide=0.2, pivot_frequency=1, start_date=None, end_date=None, pivot_interval_units='months', weights=None, weights_attribute=None, node_filters=None, max_date=None, include_internal_nodes=False, censored=False)ο
Bases:
KdeFrequencies
KDE frequency estimator for phylogenetic trees
Estimates frequencies for samples provided as annotated tips on a given tree and optionally reports clade-level frequencies based on the sum of each cladeβs respective tips.
- estimate(tree)ο
Estimate frequencies for a given tree using the parameters defined for this instance.
If weights are defined, frequencies represent a weighted mean across the values in attribute defined by self.weights_attribute.
- Parameters:
tree (Bio.Phylo.BaseTree.Tree) β annotated tree whose nodes all have an attr attribute with at least βnum_dateβ key
- Returns:
node frequencies by clade
- Return type:
- estimate_tip_frequencies_to_proportion(tips, proportion)ο
Estimate frequencies for a given set of tips up to a given proportion of total frequencies.
- tip_passes_filters(tip)ο
Returns a boolean indicating whether a given tip passes the node filters defined for the current instance.
If no filters are defined, returns True.
- Parameters:
tip (Bio.Phylo.BaseTree.Tree) β tip from a Bio.Phylo tree annotated with attributes in tip.attr
- Returns:
whether the given tip passes the defined filters or not
- Return type:
- exception augur.frequency_estimators.TreeKdeFrequenciesErrorο
Bases:
Exception
Represents an error estimating KDE frequencies for a tree.
- class augur.frequency_estimators.alignment_frequencies(aln, tps, pivots, **kwargs)ο
Bases:
object
calculates frequencies of mutations in an alignment. uses nested frequencies for mutations at a particular site such that mutations are forced to add up to one.
- calc_confidence()ο
calculate a crude binomial sampling confidence interval of the frequency estimate. This ignores autocorrelation of the trajectory and returns one standard deviation (68%).
- Returns:
dictionary of standard deviations for each estimated trajectory
- Return type:
- estimate_genotype_frequency(aln, gt, **kwargs)ο
slice an alignment at possibly multiple positions and calculate the frequency trajectory of this multi-locus genotype
- Parameters:
gt (list) β a list of (position, state) tuples specifying the genotype whose frequency is to be estimated
- Returns:
frequency trajectory
- Return type:
- mutation_frequencies(min_freq=0.01, include_set=None, ignore_char='')ο
estimate frequencies of single site mutations for each alignment column. This function populates a dictionary class.frequencies with the frequency trajectories.
- Parameters:
- augur.frequency_estimators.count_observations(pivots, tps)ο
- augur.frequency_estimators.fix_freq(freq, pc)ο
restricts frequencies to the interval [pc, 1-pc] removes np.nan values and avoids taking logarithms of 0 or divisions by 0
- Parameters:
freq (numpy.ndarray) β frequency trajectory to be thresholded
pc (float) β threshold value
- Returns:
thresholded frequency trajectory
- Return type:
- augur.frequency_estimators.float_to_datestring(numdate)ο
convert a numeric decimal date to a python datetime object Note that this only works for AD dates since the range of datetime objects is restricted to year>1. Copied from treetime.utils :type numdate: :param numdate: numeric date as in 2018.23 :type numdate: float
- Returns:
datetime object
- Return type:
- class augur.frequency_estimators.freq_est_clipped(tps, obs, pivots, dtps=None, **kwargs)ο
Bases:
object
simple wrapper for the frequency estimator that attempts to estimate a frequency trajectory on a sensible range of pivots to avoid frequency optimization at points without data
- dtpsο
Description
- feο
Description
- good_pivotsο
Description
- good_tpsο
Description
- obsο
Description
- pivot_freqο
Description
- pivot_lower_cutoffο
Description
- pivot_upper_cutoffο
Description
- pivotsο
Description
- tpsο
Description
- learn()ο
- class augur.frequency_estimators.frequency_estimator(tps, obs, pivots, stiffness=20.0, inertia=0.0, tol=0.001, pc=0.0001, ws=100, method='powell', **kwargs)ο
Bases:
object
estimates a smooth frequency trajectory given a series of time stamped 0/1 observations. The most likely set of frequencies at specified pivot values is determined by numerical minimization. Likelihood consist of a bernoulli sampling term as well as a term penalizing rapid frequency shifts. this term is motivated by genetic drift, i.e., sampling variation.
- initial_guess(pc=0.01)ο
- learn(initial_guess=None)ο
- stiffLH()ο
- augur.frequency_estimators.get_pivots(observations, pivot_interval, start_date=None, end_date=None, pivot_interval_units='months')ο
Calculate pivots for a given list of floating point observation dates and interval between pivots.
Start and end pivots will be based on the range of given observed dates, unless a start or end date are provided to override these defaults. Pivots include the end date and, where possible for the given interval, the start date.
- Parameters:
observations (list) β a list of observed floating point dates per sample
pivot_interval (str) β number of months (or weeks) between pivots
start_date (float) β optional start of the pivots interval
end_date (float) β optional end of the pivots interval
pivot_interval β whether pivots are measured in βmonthsβ or in βweeksβ
- Returns:
pivots β floating point pivots spanning the given the dates
- Return type:
- augur.frequency_estimators.logit_inv(logit_freq, pc)ο
- augur.frequency_estimators.logit_transform(freq, pc)ο
- augur.frequency_estimators.make_pivots(pivots, tps)ο
if pivots is a scalar, make a grid of pivot points covering the entire range
- Parameters:
pivots (int or iterable) β either number of pivots (a scalar) or the actual pivots (will be cast to array and returned)
tps (numpy.ndarray) β observation time points. Will generate pivots spanning min/max
- Returns:
pivots β array of pivot values
- Return type:
- class augur.frequency_estimators.nested_frequencies(tps, obs, pivots, **kwargs)ο
Bases:
object
estimates frequencies of mutually exclusive events such as mutations at a particular genomic position or subclades in a tree
- calc_freqs()ο
- augur.frequency_estimators.pq(p)ο
- augur.frequency_estimators.running_average(obs, ws)ο
calculates a running average obs β observations ws β window size (number of points to average)
- Parameters:
obs (list or numpy.ndarray(bool)) β observations
ws (int) β window size as measured in number of consecutive points
- Returns:
running average of the boolean observations
- Return type:
- augur.frequency_estimators.test_nested_estimator()ο
- augur.frequency_estimators.test_simple_estimator()ο
- augur.frequency_estimators.timestamp_to_float(time)ο
Convert a pandas timestamp to a floating point date. This is not entirely accurate as it doesnβt account for months with different numbers of days, but should be close enough to be accurate for weekly pivots.
Examples
>>> import datetime >>> time = datetime.date(2010, 10, 1) >>> timestamp_to_float(time) 2010.75 >>> time = datetime.date(2011, 4, 1) >>> timestamp_to_float(time) 2011.25 >>> timestamp_to_float(datetime.date(2011, 1, 1)) 2011.0 >>> timestamp_to_float(datetime.date(2011, 12, 1)) == (2011.0 + 11.0 / 12) True
- class augur.frequency_estimators.tree_frequencies(tree, pivots, node_filter=None, min_clades=10, verbose=0, pc=0.0001, **kwargs)ο
Bases:
object
class that estimates frequencies for nodes in the tree. each internal node is assumed to be named with an attribute clade, of root doesnβt have such an attribute, clades will be numbered in preorder. Each node is assumed to have an attribute attr with a key βnum_dateβ.
- calc_confidence()ο
for each frequency trajectory, calculate the bernouilli sampling error β in reality we should look at the inverse hessian, but this is a useful approximation in most cases
- Returns:
dictionary with estimated confidence intervals
- Return type:
- estimate_clade_frequencies()ο
- prepare()ο