augur.titer_model

exception augur.titer_model.InsufficientDataException

Bases: Exception

class augur.titer_model.SubstitutionModel(alignments, titers, *args, **kwargs)

Bases: TiterModel

substitution_model extends titers and implements a model that seeks to describe titer differences as sums of contributions of the substitutions separating the test and reference viruses. Sequences are assumed to be attached to each terminal node in the tree as node.translations.

annotate_tree(tree)

Annotates antigenic advance attributes to nodes of a given tree built from the same sequences used to train the model.

Parameters:: tree (Bio.Phylo.BaseTree.Tree)
Returns:: input tree instance with nodes annotated by per-branch and cumulative antigenic advance attributes dTiterSub and cTiterSub
Return type:: Bio.Phylo.BaseTree.Tree
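A cumulative attribute such as cTiterSub can be understood as a running sum of the per-branch values from the root down. The sketch below assumes exactly that relationship and uses a hypothetical minimal `Node` class in place of `Bio.Phylo` clades; it is an illustration, not the method's implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a Bio.Phylo clade."""
    name: str
    dTiterSub: float = 0.0  # per-branch antigenic advance (assumed already estimated)
    cTiterSub: float = 0.0  # cumulative advance, filled in by annotate_cumulative
    clades: List["Node"] = field(default_factory=list)


def annotate_cumulative(root: Node) -> Node:
    """Set cTiterSub on every node to the sum of dTiterSub along the path from the root."""
    root.cTiterSub = root.dTiterSub
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in parent.clades:
            # A child's cumulative value is its parent's plus its own branch value.
            child.cTiterSub = parent.cTiterSub + child.dTiterSub
            stack.append(child)
    return root
```

For a root with one internal child (dTiterSub 1.0) that has one tip (dTiterSub 0.5), the tip ends up with cTiterSub 1.5.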

collapse_colinear_mutations(colin_thres)

Find collinear columns of the design matrix and collapse them into clusters.

Parameters:: colin_thres
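Columns that are identical in every row are perfectly collinear: a linear fit can only identify the sum of their effects, so merging them into one cluster keeps the problem identifiable. This sketch only detects exactly identical columns of a 0/1 matrix given as a list of rows; how colin_thres enters the real method is not described here, so it is omitted, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def cluster_identical_columns(
    matrix: Sequence[Sequence[int]], labels: Sequence[str]
) -> List[List[str]]:
    """Group column labels whose columns are identical across all rows."""
    groups: Dict[Tuple[int, ...], List[str]] = defaultdict(list)
    for j, label in enumerate(labels):
        column = tuple(row[j] for row in matrix)
        groups[column].append(label)
    # Dicts keep insertion order, so clusters appear in order of first column.
    return list(groups.values())
```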

compile_substitution_effects(cutoff=0.0001)

Compile a flat JSON of substitution effects for visualization, pruning mutations without effect.

Parameters:: cutoff (float, optional)
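A flat, JSON-serializable mapping can be produced by joining each mutation tuple into a single string key and dropping effects whose magnitude does not exceed cutoff. The `gene:mutation` key format, the use of the absolute value, and the function name are assumptions of this sketch.

```python
import json
from typing import Dict, Tuple


def flatten_effects(
    effects: Dict[Tuple[str, str], float], cutoff: float = 0.0001
) -> str:
    """Serialize {(gene, mutation): effect} as flat JSON, pruning negligible effects."""
    flat = {
        f"{gene}:{mutation}": effect
        for (gene, mutation), effect in effects.items()
        if abs(effect) > cutoff  # mutations at or below the cutoff are dropped
    }
    return json.dumps(flat, sort_keys=True)
```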

determine_relevant_mutations(min_count=10)

get_mutations(strain1, strain2)

Return amino acid mutations between viruses specified by strain names, as tuples such as (HA1, F159S).

Parameters:

strain1
strain2
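Mutations in the (HA1, F159S) notation can be read off by comparing aligned amino acid sequences position by position. This sketch assumes each strain's translations are available as a hypothetical dict of gene name to aligned sequence, uses 1-based positions, and does not treat gaps or ambiguous residues specially.

```python
from typing import Dict, List, Tuple


def mutations_between(
    seqs1: Dict[str, str], seqs2: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Return (gene, 'F159S')-style tuples for every differing aligned position."""
    mutations = []
    for gene in sorted(seqs1.keys() & seqs2.keys()):
        # zip stops at the shorter sequence; aligned inputs are assumed equal length.
        for pos, (aa1, aa2) in enumerate(zip(seqs1[gene], seqs2[gene]), start=1):
            if aa1 != aa2:
                mutations.append((gene, f"{aa1}{pos}{aa2}"))
    return mutations
```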

make_seqgraph(colin_thres=5)

Code amino acid differences between sequences into a matrix. The matrix has dimensions #measurements x #observed mutations.

Parameters:: colin_thres (int, optional)
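The matrix has one row per measurement and one column per observed mutation, with a 1 wherever that mutation separates the test and reference virus of the measurement. This sketch assumes the mutations per measurement are already computed as (gene, mutation) tuples; the sorted column order and the function name are illustrative choices, not the method's.

```python
from typing import List, Sequence, Set, Tuple

Mutation = Tuple[str, str]  # e.g. ("HA1", "F159S")


def binary_design_matrix(
    mutations_per_measurement: Sequence[Set[Mutation]],
) -> Tuple[List[List[int]], List[Mutation]]:
    """Build a #measurements x #observed-mutations 0/1 matrix and its column labels."""
    columns = sorted({m for muts in mutations_per_measurement for m in muts})
    index = {m: j for j, m in enumerate(columns)}
    matrix = []
    for muts in mutations_per_measurement:
        row = [0] * len(columns)
        for m in muts:
            row[index[m]] = 1  # this mutation separates test and reference virus
        matrix.append(row)
    return matrix, columns
```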

predict_titer(virus, serum, cutoff=0.0)

prepare(**kwargs)

train(**kwargs)

Determine the model parameters. The result is stored in self.substitution_effect.

Parameters:: **kwargs

class augur.titer_model.TiterCollection(titers, **kwargs)

Bases: object

Container for raw titer values and methods for analyzing these values.

static count_strains(titers)

Count test and reference virus strains in the given titers.

Parameters:: titers (collections.defaultdict) – titer measurements indexed by test, reference, and serum
Returns:: number of measurements per strain
Return type:: dict

Examples

>>> measurements, strains, sources = TiterCollection.load_from_file("tests/data/titer_model/h3n2_titers_subset.tsv")
>>> titer_counts = TiterCollection.count_strains(measurements)
>>> titer_counts["A/Acores/11/2013"]
6
>>> titer_counts["A/Acores/SU43/2012"]
3
>>> titer_counts["A/Cairo/63/2012"]
2

determine_autologous_titers(): scan the titer measurements for autologous (self) titers and make a dictionary stored in self to look them up later. If no autologous titer is found, use the maximum titer. This follows the rationale that test titers are generally lower than autologous titers and the highest test titer is often a reasonably approximation of the autologous titer.

static filter_strains(titers, strains)

Filter the given titers to only include values from the given strains (test or reference).

Parameters:

titers (dict) – titer values indexed by test and reference strain and serum
strains (list) – names of strains to keep titers for

Returns:

reduced dictionary of titer measurements containing only those where both the test and reference virus are part of the strain list

Return type:

dict

Examples

>>> measurements, strains, sources = TiterCollection.load_from_file("tests/data/titer_model/h3n2_titers_subset.tsv")
>>> len(measurements)
11

Test the case when a test strain exists in the subset but none of its corresponding reference strains do.

>>> len(TiterCollection.filter_strains(measurements, ["A/Acores/11/2013"]))
0

Test when both the test and reference strains exist in the subset.

>>> len(TiterCollection.filter_strains(measurements, ["A/Acores/11/2013", "A/Alabama/5/2010", "A/Athens/112/2012"]))
2
>>> len(TiterCollection.filter_strains(measurements, ["A/Acores/11/2013", "A/Acores/SU43/2012", "A/Alabama/5/2010", "A/Athens/112/2012"]))
3
>>> len(TiterCollection.filter_strains(measurements, []))
0

static load_from_file(filenames, excluded_sources=None)

Load titers from a tab-delimited file.

Parameters:

filenames (str or list of str) – one or more tab-delimited files containing titer strains, serum, and values
excluded_sources (list of str) – sources in the titers file to exclude

Returns:

tuple of a dict of titer measurements, list of strains, list of sources

Return type:

tuple

Examples

>>> measurements, strains, sources = TiterCollection.load_from_file("tests/data/titer_model/h3n2_titers_subset.tsv")
>>> type(measurements)
<class 'dict'>
>>> len(measurements)
11
>>> len(strains)
13
>>> len(sources)
5

Inspect specific measurements. First, inspect a measurement that has a specific value in the input.

>>> measurements[("A/Acores/11/2013", ("A/Alabama/5/2010", "F27/10"))]
[80.0]

Next, inspect a measurement that has a thresholded value at the lower bound of detection (e.g., “<80”). This measurement should be reported as one half of its threshold value (e.g., 40.0).

>>> measurements[("A/Acores/11/2013", ("A/Victoria/208/2009", "F7/10"))]
[40.0]

Inspect a measurement that has a thresholded value at the upper bound of detection (“>1280”). This measurement should be reported as twice its threshold value (e.g., 2560.0).

>>> measurements[("A/Acores/SU43/2012", ("A/Texas/50/2012", "F36/12"))]
[2560.0]
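The thresholding rule illustrated above (half the threshold for values below the detection limit, twice the threshold for values above it) fits in a small parser. The name parse_titer is hypothetical and only mirrors the documented behaviour.

```python
def parse_titer(raw: str) -> float:
    """Convert a raw titer string such as '80', '<80' or '>1280' to a number.

    Values below the detection limit become half the threshold;
    values above it become twice the threshold.
    """
    raw = raw.strip()
    if raw.startswith("<"):
        return float(raw[1:]) / 2
    if raw.startswith(">"):
        return float(raw[1:]) * 2
    return float(raw)
```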

Confirm that excluding sources produces fewer measurements.

>>> measurements, strains, sources = TiterCollection.load_from_file("tests/data/titer_model/h3n2_titers_subset.tsv", excluded_sources=["NIMR_Sep2013_7-11.csv"])
>>> len(measurements)
5

Request measurements for a test/reference/serum tuple that should not exist after excluding its source.

>>> measurements.get(("A/Acores/11/2013", ("A/Alabama/5/2010", "F27/10")))
>>>

Missing titer data should produce an error.

>>> output = TiterCollection.load_from_file("tests/data/titer_model/missing.tsv")
Traceback (most recent call last):
  ...
FileNotFoundError: [Errno 2] No such file or directory: 'tests/data/titer_model/missing.tsv'

normalize(ref, val)

Take the log2 difference of test titers and autologous titers.

Parameters:

ref
val

normalize_titers(): convert the titer measurements into the log2 difference between the average titer measured between test virus and reference serum and the average homologous titer. all measurements relative to sera without homologous titer are excluded

read_titers(fname)

strain_census(titers)

Make lists of reference viruses, test viruses, and sera (there are often multiple sera per reference virus).

Examples

>>> measurements, strains, sources = TiterCollection.load_from_file("tests/data/titer_model/h3n2_titers_subset.tsv")
>>> titers = TiterCollection(measurements)
>>> sera, ref_strains, test_strains = titers.strain_census(measurements)
>>> len(sera)
9
>>> len(ref_strains)
9
>>> len(test_strains)
13

Parameters:: titers

class augur.titer_model.TiterModel(serum_Kc=0, **kwargs)

Bases: object

This class fits a linear model to titer measurements, using models that describe titer differences in a parsimonious way. Two additive models are currently implemented: the tree model and the substitution model. The tree model describes titer drops as a sum of terms associated with branches in the tree, while the substitution model attributes titer drops to amino acid mutations. More details on the methods can be found in Neher et al., PNAS, 2016.
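As a summary of the additive structure (following Neher et al., 2016, with symbols chosen here for illustration), the normalized titer drop between test virus a and serum β raised against reference virus b can be written as a virus effect v_a, a serum potency p_β, and a sum of contributions d_i over the branches (tree model) or amino acid substitutions (substitution model) separating a and b:

```latex
H_{a\beta} = v_a + p_\beta + \sum_{i \in \mathrm{path}(a,\, b)} d_i
```

The fit_* methods then differ in how the d_i are estimated, for example with non-negativity constraints or L1/L2 regularization.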

assign_titers(titers, strains)

compile_potencies(): compile a json structure containing potencies for visualization we need rapid access to all sera for a given reference virus, hence the structure is organized by [ref][serum]

compile_titers(): compiles titer measurements into a json file organized by reference virus during visualization, we need the average distance of a test virus from a reference virus across sera. hence the hierarchy [ref][test][serum] NOTE: this uses node.name instead of node.clade

compile_virus_effects(): compile a json structure containing virus_effects for visualization

fit_func()

fit_l1reg(): regularize genetic parameters with an l1 norm regardless of sign

fit_nnl1reg(): l1 regularization of titer drops with non-negativity constraints

fit_nnl2reg()

fit_nnls()

make_training_set(training_fraction=1.0, subset_strains=False, **kwargs)
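Splitting measurements into a training set and a held-out set according to training_fraction can be sketched as a seeded random partition. This hypothetical helper assigns each measurement independently (so the split fraction is only approximate) and does not model the subset_strains option.

```python
import random
from typing import Dict, Hashable, Tuple, TypeVar

V = TypeVar("V")


def split_measurements(
    measurements: Dict[Hashable, V], training_fraction: float = 1.0, seed: int = 0
) -> Tuple[Dict[Hashable, V], Dict[Hashable, V]]:
    """Randomly assign each measurement to a training or a test dictionary."""
    rng = random.Random(seed)  # seeded for reproducible splits
    train: Dict[Hashable, V] = {}
    test: Dict[Hashable, V] = {}
    for key, value in measurements.items():
        # rng.random() is in [0, 1), so a fraction of 1.0 keeps everything for training.
        (train if rng.random() < training_fraction else test)[key] = value
    return train, test
```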

reference_virus_statistic(): count measurements for every reference virus and serum

titer_stats()

validate(plot=False, cutoff=0.0, validation_set=None, fname=None)

Predict titers of the validation set (a separate set of test_titers set aside previously) and compare them against known values. If requested with plot=True, a figure comparing predicted and measured titers is produced.

Compute basic error metrics for actual vs. predicted titer values. Return a dictionary of the form {‘metric’: computed_metric, ‘values’: [(actual, predicted), …]} and save a copy in self.validation.

Parameters:

plot (bool, optional)
cutoff (float, optional)
validation_set (None, optional)
fname (None, optional)
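Basic error metrics can be computed directly from (actual, predicted) pairs. This sketch assumes mean absolute error and root-mean-square error, stored under the key names abs_error and rms_error that appear in the cross_validate description; the exact set of metrics the method reports may differ, and the function name is hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple, Union


def error_metrics(
    pairs: Sequence[Tuple[float, float]],
) -> Dict[str, Union[float, List[Tuple[float, float]]]]:
    """Mean absolute and root-mean-square error of predicted vs. actual titers."""
    if not pairs:
        raise ValueError("no (actual, predicted) pairs to evaluate")
    diffs = [actual - predicted for actual, predicted in pairs]
    return {
        "abs_error": sum(abs(d) for d in diffs) / len(diffs),
        "rms_error": math.sqrt(sum(d * d for d in diffs) / len(diffs)),
        "values": list(pairs),
    }
```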

class augur.titer_model.TreeModel(tree, titers, *args, **kwargs)

Bases: TiterModel

tree_model extends titers and fits the antigenic differences in terms of contributions on the branches of the phylogenetic tree. Nodes in the tree are decorated with attributes ‘dTiter’ that contain the estimated titer drops across the branch.

cross_validate(n, **kwargs)

For each of n iterations, randomly re-allocate titers to training and test sets. Fit the model using the training titers and assess performance using the test titers (see TiterModel.validate). Append a dictionary such as {‘abs_error’: , ‘rms_error’: , ‘values’: [(actual, predicted), …], etc.} for each iteration to the model_performance list. Return model_performance and save a copy in self.cross_validation.

Parameters:

n
**kwargs
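The procedure described for cross_validate is a form of repeated random hold-out (Monte Carlo cross-validation). A generic sketch with injected callables is shown below; split, fit, and evaluate are hypothetical parameters standing in for re-allocating titers, training, and validating.

```python
from typing import Any, Callable, Dict, List, Tuple

Data = Dict[Any, float]


def repeated_holdout(
    n: int,
    split: Callable[[int], Tuple[Data, Data]],
    fit: Callable[[Data], Any],
    evaluate: Callable[[Any, Data], Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Repeat split/fit/evaluate n times and collect one result dict per iteration."""
    model_performance = []
    for iteration in range(n):
        training, test = split(iteration)  # fresh random re-allocation each time
        model = fit(training)
        model_performance.append(evaluate(model, test))
    return model_performance
```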

find_titer_splits(criterium=None)

Walk through the tree and mark all branches that are to be included as model variables. Terminal branches are never included. criterium is a callable that can be used to exclude further branches, e.g. branches to which amino acid mutations map.

Parameters:: criterium (None, optional)
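Selecting model variables amounts to a tree traversal that skips terminal branches and any branch rejected by criterium. This sketch uses a hypothetical minimal `Clade` class, identifies a branch by the node it leads to, skips the root (which has no branch above it), and assumes criterium returns True for branches to keep.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Clade:
    """Hypothetical minimal tree node; the branch above it is identified by its name."""
    name: str
    clades: List["Clade"] = field(default_factory=list)

    def is_terminal(self) -> bool:
        return not self.clades


def titer_branches(
    root: Clade, criterium: Optional[Callable[[Clade], bool]] = None
) -> List[str]:
    """Names of internal, non-root nodes whose branch is kept as a model variable."""
    if criterium is None:
        criterium = lambda node: True  # keep every internal branch by default
    selected = []
    stack = list(root.clades)  # the root itself has no branch above it
    while stack:
        node = stack.pop()
        if not node.is_terminal() and criterium(node):
            selected.append(node.name)
        stack.extend(node.clades)
    return sorted(selected)
```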

get_path_no_terminals(v1, v2)

Return the path between two tips in the tree, excluding the terminal branches.

Parameters:

v1
v2
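The path between two tips runs up from each tip to their most recent common ancestor (MRCA). This sketch uses a hypothetical parent map (child name to parent name), names each branch by its child node, and drops the two terminal branches; the function names are illustrative.

```python
from typing import Dict, List


def ancestors(node: str, parent: Dict[str, str]) -> List[str]:
    """Return node followed by all of its ancestors up to the root."""
    chain = [node]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def path_no_terminals(v1: str, v2: str, parent: Dict[str, str]) -> List[str]:
    """Branches between tips v1 and v2 (named by child node), without the tips' own branches."""
    up1, up2 = ancestors(v1, parent), ancestors(v2, parent)
    on_path2 = set(up2)
    mrca = next(n for n in up1 if n in on_path2)
    # Every node strictly below the MRCA on either side contributes its parent branch.
    path = up1[: up1.index(mrca)] + up2[: up2.index(mrca)]
    return [n for n in path if n not in (v1, v2)]
```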

make_treegraph(): code the path between serum and test virus of each HI measurement into a matrix the matrix has dimensions #measurements x #tree branches with HI info if the path between test and serum goes through a branch, the corresponding matrix element is 1, 0 otherwise

predict_titer(virus, serum, cutoff=0.0)

prepare(**kwargs)

prepare_tree(tree)

train(**kwargs)