augur.titer_model

exception augur.titer_model.InsufficientDataException

Bases: Exception

class augur.titer_model.SubstitutionModel(alignments, titers, *args, **kwargs)

Bases: TiterModel

SubstitutionModel extends TiterModel and implements a model that describes titer differences as sums of contributions from the substitutions separating the test and reference viruses. Sequences are assumed to be attached to each terminal node in the tree as node.translations.
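
As a rough illustration of the additive idea described above, the sketch below sums per-substitution effects for a pair of viruses. The effect values, the function name predicted_titer_drop, and the choice to ignore any per-serum or per-virus terms are assumptions made for this example, not the library's implementation.

```python
# Hypothetical per-substitution effects in log2 titer units (illustrative values).
substitution_effects = {
    ("HA1", "F159S"): 1.5,
    ("HA1", "K160T"): 0.75,
}


def predicted_titer_drop(mutations, effects):
    """Sum the effects of the substitutions separating two viruses.

    Substitutions without an estimated effect contribute zero.
    """
    return sum(effects.get(mutation, 0.0) for mutation in mutations)


# Substitutions separating a hypothetical test and reference virus.
mutations = [("HA1", "F159S"), ("HA1", "K160T"), ("HA1", "N145S")]
drop = predicted_titer_drop(mutations, substitution_effects)
```

Here the third substitution has no estimated effect, so only the first two contribute to the sum.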

annotate_tree(tree)

Annotates the nodes of a given tree with antigenic advance attributes. The tree must be built from the same sequences used to train the model.

Parameters

tree (Bio.Phylo) –

Returns

input tree instance with nodes annotated by per-branch and cumulative antigenic advance attributes dTiterSub and cTiterSub

Return type

Bio.Phylo

collapse_colinear_mutations(colin_thres)

Find colinear columns of the design matrix and collapse them into clusters.

Parameters

colin_thres (TYPE) – Description
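
The meaning of colin_thres is not documented here, so the following sketch only covers the simplest case: it treats columns as colinear when they are exactly identical and groups the corresponding mutations. The function name and the toy matrix are hypothetical.

```python
from collections import defaultdict


def collapse_identical_columns(design_matrix, mutation_names):
    """Group mutations whose design-matrix columns are identical."""
    clusters = defaultdict(list)
    for col_index, name in enumerate(mutation_names):
        # Use the column itself as the grouping key.
        column = tuple(row[col_index] for row in design_matrix)
        clusters[column].append(name)
    return list(clusters.values())


# Rows are measurements, columns are mutations (toy data).
design_matrix = [
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 1],
]
mutation_names = ["HA1:F159S", "HA1:K160T", "HA1:N145S"]
clusters = collapse_identical_columns(design_matrix, mutation_names)
```

Mutations that always occur together cannot be told apart by a linear fit, which is why they are treated as a single cluster.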

compile_substitution_effects(cutoff=0.0001)

Compile a flat JSON of substitution effects for visualization, pruning mutations without effect.

Parameters

cutoff (float, optional) – Description

Returns

Description

Return type

TYPE

determine_relevant_mutations(min_count=10)

get_mutations(strain1, strain2)

Return the amino acid mutations between two viruses, specified by strain name, as tuples such as (HA1, F159S).

Parameters
  • strain1 (TYPE) – Description

  • strain2 (TYPE) – Description

Returns

Description

Return type

TYPE

make_seqgraph(colin_thres=5)

Encode amino acid differences between sequences as a matrix with dimensions #measurements x #observed mutations.

Parameters

colin_thres (int, optional) – Description
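
A minimal sketch of such a matrix, assuming each measurement is already paired with the list of mutations separating its two viruses. The helper name build_design_matrix and the input layout are assumptions for illustration; the real method also takes a colin_thres argument that is not modelled here.

```python
def build_design_matrix(measurement_mutations):
    """Return (matrix, columns).

    matrix[i][j] is 1 if the virus pair of measurement i is separated
    by mutation columns[j], and 0 otherwise.
    """
    columns = sorted({m for muts in measurement_mutations for m in muts})
    index = {mutation: j for j, mutation in enumerate(columns)}
    matrix = []
    for muts in measurement_mutations:
        row = [0] * len(columns)
        for mutation in muts:
            row[index[mutation]] = 1
        matrix.append(row)
    return matrix, columns


# One list of separating mutations per measurement (toy data).
measurement_mutations = [
    [("HA1", "F159S")],
    [("HA1", "F159S"), ("HA1", "K160T")],
    [],
]
matrix, columns = build_design_matrix(measurement_mutations)
```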

predict_titer(virus, serum, cutoff=0.0)

prepare(**kwargs)

train(**kwargs)

Determine the model parameters. The result is stored in self.substitution_effect.

Parameters

**kwargs – Description

class augur.titer_model.TiterCollection(titers, **kwargs)

Bases: object

Container for raw titer values and methods for analyzing these values.

static count_strains(titers)

Count test and reference virus strains in the given titers.

Parameters

titers (defaultdict) – titer measurements indexed by test, reference, and serum

Returns

number of measurements per strain

Return type

dict

>>> measurements, strains, sources = TiterCollection.load_from_file("tests/data/titer_model/h3n2_titers_subset.tsv")
>>> titer_counts = TiterCollection.count_strains(measurements)
>>> titer_counts["A/Acores/11/2013"]
6
>>> titer_counts["A/Acores/SU43/2012"]
3
>>> titer_counts["A/Cairo/63/2012"]
2

determine_autologous_titers()

Scan the titer measurements for autologous (self) titers and store them in a dictionary on self for later lookup. If no autologous titer is found, use the maximum titer. This follows the rationale that test titers are generally lower than autologous titers, so the highest test titer is often a reasonable approximation of the autologous titer.
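
The fallback rule can be sketched as follows, assuming measurements are keyed by (test strain, (reference strain, serum id)) as in the load_from_file examples. The function name, the strain names, and the shape of the returned dictionary are hypothetical.

```python
from collections import defaultdict


def find_autologous_titers(measurements):
    """Map each serum to its autologous titer, or to the highest test
    titer when no autologous measurement exists."""
    by_serum = defaultdict(dict)
    for (test_strain, serum), values in measurements.items():
        by_serum[serum][test_strain] = values

    autologous = {}
    for serum, titers_by_test in by_serum.items():
        ref_strain = serum[0]
        if ref_strain in titers_by_test:
            # The reference virus was measured against its own serum.
            autologous[serum] = {"values": titers_by_test[ref_strain], "fallback": False}
        else:
            # No self titer: fall back to the highest titer seen for this serum.
            highest = max(max(values) for values in titers_by_test.values())
            autologous[serum] = {"values": [highest], "fallback": True}
    return autologous


measurements = {
    ("A/X/1/2012", ("A/X/1/2012", "S1")): [640.0],
    ("A/Y/2/2013", ("A/X/1/2012", "S1")): [160.0],
    ("A/Y/2/2013", ("A/Z/3/2011", "S2")): [80.0, 320.0],
}
autologous = find_autologous_titers(measurements)
```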

static filter_strains(titers, strains)

Filter the given titers to only include values from the given strains (test or reference).

Parameters
  • titers (dict) – titer values indexed by test and reference strain and serum

  • strains (list) – names of strains to keep titers for

Returns

reduced dictionary of titer measurements containing only those where both the test and reference virus are part of the strain list

Return type

dict

>>> measurements, strains, sources = TiterCollection.load_from_file("tests/data/titer_model/h3n2_titers_subset.tsv")
>>> len(measurements)
11

Test the case when a test strain exists in the subset but none of its corresponding reference strains do.

>>> len(TiterCollection.filter_strains(measurements, ["A/Acores/11/2013"]))
0

Test when both the test and reference strains exist in the subset.

>>> len(TiterCollection.filter_strains(measurements, ["A/Acores/11/2013", "A/Alabama/5/2010", "A/Athens/112/2012"]))
2
>>> len(TiterCollection.filter_strains(measurements, ["A/Acores/11/2013", "A/Acores/SU43/2012", "A/Alabama/5/2010", "A/Athens/112/2012"]))
3
>>> len(TiterCollection.filter_strains(measurements, []))
0

static load_from_file(filenames, excluded_sources=None)

Load titers from a tab-delimited file.

Parameters
  • filename (str) – tab-delimited file containing titer strains, serum, and values

  • excluded_sources (list of str) – sources in the titers file to exclude

Returns

tuple of a dict of titer measurements, list of strains, list of sources

Return type

tuple (dict, list, list)

>>> measurements, strains, sources = TiterCollection.load_from_file("tests/data/titer_model/h3n2_titers_subset.tsv")
>>> type(measurements)
<class 'dict'>
>>> len(measurements)
11
>>> measurements[("A/Acores/11/2013", ("A/Alabama/5/2010", "F27/10"))]
[80.0]
>>> len(strains)
13
>>> len(sources)
5
>>> measurements, strains, sources = TiterCollection.load_from_file("tests/data/titer_model/h3n2_titers_subset.tsv", excluded_sources=["NIMR_Sep2013_7-11.csv"])
>>> len(measurements)
5
>>> measurements.get(("A/Acores/11/2013", ("A/Alabama/5/2010", "F27/10")))
>>>
>>> output = TiterCollection.load_from_file("tests/data/titer_model/missing.tsv")
Traceback (most recent call last):
  File "<ipython-input-2-0ea96a90d45d>", line 1, in <module>
    open("tests/data/titer_model/missing.tsv", "r")
FileNotFoundError: [Errno 2] No such file or directory: 'tests/data/titer_model/missing.tsv'

normalize(ref, val)

Take the log2 difference of test titers and autologous titers.

Parameters
  • ref (TYPE) – Description

  • val (TYPE) – Description

Returns

Description

Return type

TYPE

normalize_titers()

Convert the titer measurements into the log2 difference between the average titer measured between test virus and reference serum and the average homologous titer. All measurements relative to sera without a homologous titer are excluded.
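
A minimal sketch of this normalization, under two assumptions that the description leaves open: averages are taken over log2 values, and the sign is homologous minus test, so that positive numbers represent titer drops. The function name and the strain names are hypothetical.

```python
from math import log2
from statistics import mean


def normalize_all(measurements):
    """Return log2 titer drops relative to the homologous titer.

    Measurements against sera without a homologous titer are skipped.
    """
    normalized = {}
    for (test_strain, serum), values in measurements.items():
        ref_strain = serum[0]
        homologous = measurements.get((ref_strain, serum))
        if homologous is None:
            continue  # no homologous titer for this serum
        normalized[(test_strain, serum)] = (
            mean(map(log2, homologous)) - mean(map(log2, values))
        )
    return normalized


measurements = {
    ("A/X/1/2012", ("A/X/1/2012", "S1")): [640.0, 640.0],
    ("A/Y/2/2013", ("A/X/1/2012", "S1")): [160.0],
    ("A/Y/2/2013", ("A/Z/3/2011", "S2")): [80.0],
}
normalized = normalize_all(measurements)
```

In this toy data the titer of 160 against a homologous titer of 640 is a fourfold reduction, that is, about two log2 units, and the measurement against serum S2 is dropped because that serum has no homologous titer.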

read_titers(fname)

strain_census(titers)

Make lists of reference viruses, test viruses, and sera (there are often multiple sera per reference virus).

>>> measurements, strains, sources = TiterCollection.load_from_file("tests/data/titer_model/h3n2_titers_subset.tsv")
>>> titers = TiterCollection(measurements)
>>> sera, ref_strains, test_strains = titers.strain_census(measurements)
>>> len(sera)
9
>>> len(ref_strains)
9
>>> len(test_strains)
13

Parameters

titers (TYPE) – Description

Returns

Description

Return type

TYPE

class augur.titer_model.TiterModel(serum_Kc=0, **kwargs)

Bases: object

This class fits a linear model to titer measurements using different models that describe titer differences in a parsimonious way. Two additive models are currently implemented: the tree model and the substitution model. The tree model describes titer drops as a sum of terms associated with branches in the tree, while the substitution model attributes titer drops to amino acid mutations. More details on the methods can be found in Neher et al., PNAS, 2016.

assign_titers(titers, strains)

compile_potencies()

Compile a JSON structure containing potencies for visualization. We need rapid access to all sera for a given reference virus, hence the structure is organized by [ref][serum].

Returns

Description

Return type

TYPE

compile_titers()

Compile titer measurements into a JSON file organized by reference virus. During visualization, we need the average distance of a test virus from a reference virus across sera, hence the hierarchy [ref][test][serum]. NOTE: this uses node.name instead of node.clade.

Returns

Description

Return type

TYPE
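
The [ref][test][serum] hierarchy can be pictured with nested dictionaries, as in the sketch below. The input layout (normalized values keyed by test strain and serum), the helper names, and the numbers are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import mean


def nest_titers(normalized):
    """Reorganize {(test, (ref, serum_id)): value} as nested[ref][test][serum_id]."""
    nested = defaultdict(lambda: defaultdict(dict))
    for (test_strain, (ref_strain, serum_id)), value in normalized.items():
        nested[ref_strain][test_strain][serum_id] = value
    return nested


def mean_distance(nested, ref_strain, test_strain):
    """Average a test virus's distance from a reference virus across sera."""
    return mean(nested[ref_strain][test_strain].values())


normalized = {
    ("A/Y/2/2013", ("A/X/1/2012", "S1")): 2.0,
    ("A/Y/2/2013", ("A/X/1/2012", "S2")): 3.0,
}
nested = nest_titers(normalized)
distance = mean_distance(nested, "A/X/1/2012", "A/Y/2/2013")
```

Grouping by reference first means that averaging across sera only touches one small inner dictionary.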

compile_virus_effects()

Compile a JSON structure containing virus_effects for visualization.

Returns

Description

Return type

TYPE

fit_func()

fit_l1reg()

Regularize genetic parameters with an L1 norm regardless of sign.

Returns

Description

Return type

TYPE

fit_nnl1reg()

L1 regularization of titer drops with non-negativity constraints.

Returns

Description

Return type

TYPE
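
This page does not say which solver the method uses. To illustrate the kind of problem involved, the sketch below minimizes 0.5*||A x - b||^2 + lam*sum(x) subject to x >= 0 using plain cyclic coordinate descent, which is a substituted technique chosen for brevity. The function name nn_l1_fit, the parameter lam, and the scaling of the objective are assumptions.

```python
def nn_l1_fit(A, b, lam, n_iter=200):
    """Non-negative L1-regularized least squares by cyclic coordinate descent."""
    n_rows, n_cols = len(A), len(A[0])
    x = [0.0] * n_cols
    residual = list(b)  # equals b - A x, with x = 0 initially
    for _ in range(n_iter):
        for j in range(n_cols):
            col = [A[i][j] for i in range(n_rows)]
            norm_sq = sum(c * c for c in col)
            if norm_sq == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual that excludes x[j].
            rho = sum(c * r for c, r in zip(col, residual)) + norm_sq * x[j]
            # Shrink by lam, then clip at zero to enforce non-negativity.
            new_xj = max(0.0, (rho - lam) / norm_sq)
            delta = new_xj - x[j]
            if delta != 0.0:
                residual = [r - c * delta for r, c in zip(residual, col)]
                x[j] = new_xj
    return x


A = [[1.0, 0.0], [0.0, 1.0]]
b = [2.0, 0.1]
effects = nn_l1_fit(A, b, lam=0.5)
```

With this toy input the small second coefficient is driven to exactly zero while the first is shrunk by lam, which is the sparsity-inducing behaviour an L1 penalty is chosen for.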

fit_nnl2reg()

fit_nnls()

make_training_set(training_fraction=1.0, subset_strains=False, **kwargs)

reference_virus_statistic()

Count measurements for every reference virus and serum.

titer_stats()

validate(plot=False, cutoff=0.0, validation_set=None, fname=None)

Predict titers of the validation set (a separate set of test_titers set aside previously) and compare them against known values. If plot=True, a figure comparing predicted and measured titers is produced.

Compute basic error metrics for actual vs. predicted titer values. Return a dictionary of {'metric': computed_metric, 'values': [(actual, predicted), …]} and save a copy in self.validation.

Parameters
  • plot (bool, optional) – Description

  • cutoff (float, optional) – Description

  • validation_set (None, optional) – Description

  • fname (None, optional) – Description

Returns

Description

Return type

TYPE
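
The exact set of metrics is not listed on this page. As an illustration, the sketch below computes a mean absolute error and a root-mean-square error from (actual, predicted) pairs, using the key names abs_error and rms_error that appear in the cross_validate description. The formulas and the function name are assumptions.

```python
from math import sqrt


def error_metrics(pairs):
    """Compute simple error metrics for (actual, predicted) titer pairs."""
    errors = [actual - predicted for actual, predicted in pairs]
    n = len(errors)
    return {
        "abs_error": sum(abs(e) for e in errors) / n,
        "rms_error": sqrt(sum(e * e for e in errors) / n),
        "values": list(pairs),
    }


# Toy (actual, predicted) pairs in log2 titer units.
pairs = [(2.0, 1.0), (3.0, 4.0), (1.0, 1.0), (0.0, 2.0)]
metrics = error_metrics(pairs)
```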

class augur.titer_model.TreeModel(tree, titers, *args, **kwargs)

Bases: TiterModel

TreeModel extends TiterModel and fits the antigenic differences in terms of contributions on the branches of the phylogenetic tree. Nodes in the tree are decorated with the attribute 'dTiter', which contains the estimated titer drop across the branch.

cross_validate(n, **kwargs)

For each of n iterations, randomly re-allocate titers to training and test sets. Fit the model using the training titers and assess performance using the test titers (see TiterModel.validate). Append a dictionary of {'abs_error': , 'rms_error': , 'values': [(actual, predicted), …], etc.} for each iteration to the model_performance list. Return model_performance and save a copy in self.cross_validation.

Parameters
  • n (TYPE) – Description

  • **kwargs – Description

Returns

Description

Return type

TYPE
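
The resampling loop described above can be sketched independently of the model. In this example fit_and_score stands in for training and validation, and the names split_titers, training_fraction, and seed are hypothetical; only the overall shape (n random splits, one result dictionary per split) follows the description.

```python
import random


def split_titers(keys, training_fraction, rng):
    """Randomly partition measurement keys into training and test sets."""
    shuffled = list(keys)
    rng.shuffle(shuffled)
    n_train = int(round(training_fraction * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]


def cross_validate(keys, n, training_fraction, fit_and_score, seed=0):
    """Run n random train/test splits and collect one result per split."""
    rng = random.Random(seed)
    model_performance = []
    for _ in range(n):
        train, test = split_titers(keys, training_fraction, rng)
        model_performance.append(fit_and_score(train, test))
    return model_performance


def fit_and_score(train, test):
    # Placeholder: a real callback would fit on `train` and validate on `test`.
    return {"n_train": len(train), "n_test": len(test)}


keys = [("virus%d" % i, ("ref", "serum")) for i in range(10)]
performance = cross_validate(keys, n=3, training_fraction=0.8, fit_and_score=fit_and_score)
```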

find_titer_splits(criterium=None)

Walk through the tree and mark all branches that are to be included as model variables:
  • no terminals

  • criterium: callable that can be used to exclude branches, e.g. if amino acid mutations map to this branch.

Parameters

criterium (None, optional) – Description

get_path_no_terminals(v1, v2)

Return the path between two tips in the tree, excluding the terminal branches.

Parameters
  • v1 (TYPE) – Description

  • v2 (TYPE) – Description

Returns

Description

Return type

TYPE
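
To make the idea concrete without depending on a tree library, the sketch below stores a toy tree as a child-to-parent dictionary and names each branch by its child node. This representation and the function names are assumptions; the actual method operates on the model's tree object.

```python
def path_to_root(node, parent):
    """List the branches from a node up to the root, named by child node."""
    path = []
    while node in parent:
        path.append(node)
        node = parent[node]
    return path


def path_no_terminals(v1, v2, parent):
    """Branches on the path between tips v1 and v2, without terminal branches."""
    up1 = path_to_root(v1, parent)
    up2 = path_to_root(v2, parent)
    # Branches above the common ancestor appear in both lists and cancel out.
    shared = set(up1) & set(up2)
    return [b for b in up1 + up2 if b not in shared and b not in (v1, v2)]


# Toy tree: root -> (A -> t1, t2) and (B -> t3, (C -> t4, t5)).
parent = {
    "A": "root", "B": "root",
    "t1": "A", "t2": "A",
    "t3": "B", "C": "B",
    "t4": "C", "t5": "C",
}
path = path_no_terminals("t1", "t4", parent)
```

Two tips that hang off the same internal node, such as t1 and t2 here, have an empty path once their terminal branches are removed.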

make_treegraph()

Encode the path between serum and test virus of each HI measurement as a matrix. The matrix has dimensions #measurements x #tree branches with HI info. If the path between test and serum goes through a branch, the corresponding matrix element is 1, and 0 otherwise.
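
One way to read this matrix: multiplying a row by the vector of per-branch titer drops gives the predicted drop for that measurement. The branch names, the matrix, and the drop values below are made up for illustration, and any per-serum or per-virus terms are left out.

```python
# Columns: internal branches carrying HI information (hypothetical names).
branches = ["A", "B", "C"]

# Rows: one per HI measurement; 1 if the serum-to-test path crosses the branch.
tree_graph = [
    [1, 1, 1],  # path crosses A, B and C
    [0, 0, 1],  # path crosses only C
    [0, 0, 0],  # path crosses no internal branch
]

# Hypothetical per-branch titer drops, in the same order as `branches`.
d_titer = [0.5, 0.0, 1.25]

# Predicted drop per measurement: dot product of each row with d_titer.
predicted = [sum(x * d for x, d in zip(row, d_titer)) for row in tree_graph]
```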

predict_titer(virus, serum, cutoff=0.0)

prepare(**kwargs)

prepare_tree(tree)

train(**kwargs)