augur.titer_model module
- class augur.titer_model.SubstitutionModel(alignments, titers, *args, **kwargs)
Bases:
TiterModel
substitution_model extends titers and implements a model that seeks to describe titer differences as sums of contributions of substitutions separating the test and reference viruses. Sequences are assumed to be attached to each terminal node in the tree as node.translations.
- annotate_tree(tree)
Annotates antigenic advance attributes to nodes of a given tree built from the same sequences used to train the model.
- Parameters:
tree (Bio.Phylo.BaseTree.Tree)
- Returns:
input tree instance with nodes annotated by per-branch and cumulative antigenic advance attributes dTiterSub and cTiterSub
- Return type:
Bio.Phylo.BaseTree.Tree
- collapse_colinear_mutations(colin_thres)
Find colinear columns of the design matrix and collapse them into clusters.
- Parameters:
colin_thres
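The sketch below shows the idea behind collapsing colinear columns. It assumes that "colinear" means columns of the 0/1 design matrix that are exactly identical. The function name collapse_identical_columns and the "gene:mutation" labels are hypothetical, and the sketch does not reproduce how the real method uses colin_thres.

```python
import numpy as np


def collapse_identical_columns(design, labels):
    """Group identical columns of a design matrix into clusters.

    Hypothetical helper: only exact duplicate columns are treated as colinear.
    Returns the reduced matrix and one list of labels per kept column.
    """
    clusters = {}  # column bytes -> position in `order`
    order = []     # (column, [labels]) in first-seen order
    for j, label in enumerate(labels):
        key = design[:, j].tobytes()
        if key in clusters:
            order[clusters[key]][1].append(label)
        else:
            clusters[key] = len(order)
            order.append((design[:, j], [label]))
    reduced = np.column_stack([col for col, _ in order])
    return reduced, [group for _, group in order]


# Two mutations that always co-occur end up in one cluster.
X = np.array([[1, 1, 0],
              [0, 0, 1],
              [1, 1, 1]])
reduced, groups = collapse_identical_columns(X, ["HA1:F159S", "HA1:K160T", "HA1:N145S"])
print(groups)  # [['HA1:F159S', 'HA1:K160T'], ['HA1:N145S']]
```

Identical columns cannot be told apart by any linear fit, so merging them leaves one identifiable parameter per cluster.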
- compile_substitution_effects(cutoff=0.0001)
Compile a flat JSON of substitution effects for visualization, pruning mutations without effect.
- Parameters:
cutoff (float, optional)
- determine_relevant_mutations(min_count=10)
- get_mutations(strain1, strain2)
Return amino acid mutations between viruses specified by strain names as tuples, e.g., (HA1, F159S).
- Parameters:
strain1
strain2
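The following is a minimal sketch of how such tuples can be derived. It assumes each strain maps to a dictionary of aligned amino acid sequences per gene, analogous to node.translations. The `translations` argument and the toy strains are hypothetical. Gaps and ambiguous residues are not treated specially here.

```python
def get_mutations(translations, strain1, strain2):
    """Return (gene, mutation) tuples separating two strains.

    `translations` is a hypothetical mapping strain -> {gene: aligned aa sequence}.
    Mutations are written as <aa in strain1><1-based position><aa in strain2>.
    """
    seqs1, seqs2 = translations[strain1], translations[strain2]
    mutations = []
    for gene in sorted(seqs1.keys() & seqs2.keys()):
        for pos, (aa1, aa2) in enumerate(zip(seqs1[gene], seqs2[gene]), start=1):
            if aa1 != aa2:
                mutations.append((gene, f"{aa1}{pos}{aa2}"))
    return mutations


translations = {
    "strainA": {"HA1": "MKFLV", "HA2": "GLF"},
    "strainB": {"HA1": "MKSLV", "HA2": "GIF"},
}
print(get_mutations(translations, "strainA", "strainB"))
# [('HA1', 'F3S'), ('HA2', 'L2I')]
```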
- make_seqgraph(colin_thres=5)
Code amino acid differences between sequences into a matrix. The matrix has dimensions #measurements x #observed mutations.
- Parameters:
colin_thres (int, optional)
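A minimal sketch of the design-matrix layout described above. It assumes the mutations separating each test/serum pair are already known, and make_design_matrix is a hypothetical name. Multiplying a row of this matrix by a vector of per-mutation effects gives that measurement's predicted titer drop.

```python
import numpy as np


def make_design_matrix(measurement_mutations):
    """Code mutations separating each test/serum pair as a 0/1 matrix.

    `measurement_mutations` is a hypothetical list with one entry per titer
    measurement, each a list of (gene, mutation) tuples.
    Rows are measurements, columns are observed mutations.
    """
    columns = sorted({m for muts in measurement_mutations for m in muts})
    index = {m: j for j, m in enumerate(columns)}
    design = np.zeros((len(measurement_mutations), len(columns)), dtype=int)
    for i, muts in enumerate(measurement_mutations):
        for m in muts:
            design[i, index[m]] = 1
    return design, columns


measurements = [
    [("HA1", "F159S")],
    [("HA1", "F159S"), ("HA1", "K160T")],
    [],
]
X, mutations = make_design_matrix(measurements)
print(mutations)   # [('HA1', 'F159S'), ('HA1', 'K160T')]
print(X.tolist())  # [[1, 0], [1, 1], [0, 0]]
```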
- predict_titer(virus, serum, cutoff=0.0)
- prepare(**kwargs)
- train(**kwargs)
Determine the model parameters. The result is stored in self.substitution_effect.
- Parameters:
**kwargs
- class augur.titer_model.TiterCollection(titers, **kwargs)
Bases:
object
Container for raw titer values and methods for analyzing these values.
- static count_strains(titers)
Count test and reference virus strains in the given titers.
- Parameters:
titers (collections.defaultdict) – titer measurements indexed by test, reference, and serum
- Returns:
number of measurements per strain
- Return type:
dict
Examples
>>> measurements, strains, sources = TiterCollection.load_from_file("tests/data/titer_model/h3n2_titers_subset.tsv")
>>> titer_counts = TiterCollection.count_strains(measurements)
>>> titer_counts["A/Acores/11/2013"]
6
>>> titer_counts["A/Acores/SU43/2012"]
3
>>> titer_counts["A/Cairo/63/2012"]
2
- determine_autologous_titers()
Scan the titer measurements for autologous (self) titers and store them in a dictionary on self for later lookup. If no autologous titer is found, use the maximum titer. The rationale is that test titers are generally lower than autologous titers, so the highest test titer is often a reasonable approximation of the autologous titer.
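A minimal sketch of the lookup with its maximum-titer fallback. It assumes the {(test, (reference, serum)): [values]} layout shown in the load_from_file examples. Averaging repeated autologous measurements is an assumption of this sketch, and the strain names are toy data.

```python
def determine_autologous_titers(titers):
    """Map each (reference, serum) pair to an autologous titer value.

    The autologous titer is the mean of measurements where test == reference;
    if none exists, the maximum titer measured with that serum is used instead.
    """
    autologous = {}  # (reference, serum) -> list of autologous values
    maxima = {}      # (reference, serum) -> highest titer seen for that serum
    for (test, serum_key), values in titers.items():
        reference = serum_key[0]
        if test == reference:
            autologous.setdefault(serum_key, []).extend(values)
        maxima[serum_key] = max(maxima.get(serum_key, float("-inf")), max(values))
    result = {}
    for serum_key, highest in maxima.items():
        values = autologous.get(serum_key)
        # prefer the measured autologous titer, else fall back to the maximum
        result[serum_key] = sum(values) / len(values) if values else highest
    return result


titers = {
    ("A/ref1", ("A/ref1", "F1")): [640.0],
    ("A/test1", ("A/ref1", "F1")): [160.0],
    ("A/test1", ("A/ref2", "F2")): [320.0],
    ("A/test2", ("A/ref2", "F2")): [80.0, 160.0],
}
print(determine_autologous_titers(titers))
# {('A/ref1', 'F1'): 640.0, ('A/ref2', 'F2'): 320.0}
```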
- static filter_strains(titers, strains)
Filter the given titers to only include values from the given strains (test or reference).
- Parameters:
titers
strains
- Returns:
reduced dictionary of titer measurements containing only those where both the test and reference virus are part of the strain list
- Return type:
dict
Examples
>>> measurements, strains, sources = TiterCollection.load_from_file("tests/data/titer_model/h3n2_titers_subset.tsv")
>>> len(measurements)
11
Test the case when a test strain exists in the subset but none of its corresponding reference strains do.
>>> len(TiterCollection.filter_strains(measurements, ["A/Acores/11/2013"]))
0
Test when both the test and reference strains exist in the subset.
>>> len(TiterCollection.filter_strains(measurements, ["A/Acores/11/2013", "A/Alabama/5/2010", "A/Athens/112/2012"]))
2
>>> len(TiterCollection.filter_strains(measurements, ["A/Acores/11/2013", "A/Acores/SU43/2012", "A/Alabama/5/2010", "A/Athens/112/2012"]))
3
>>> len(TiterCollection.filter_strains(measurements, []))
0
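The filtering rule can be sketched as a dictionary comprehension. This is a minimal sketch that assumes the {(test, (reference, serum)): values} layout from the examples above. The toy data mirror the first example, where a test strain alone yields nothing because its reference strain is not in the list.

```python
def filter_strains(titers, strains):
    """Keep only measurements whose test and reference strains are both listed."""
    keep = set(strains)
    return {
        (test, (reference, serum)): values
        for (test, (reference, serum)), values in titers.items()
        if test in keep and reference in keep
    }


titers = {
    ("A", ("B", "s1")): [80.0],
    ("A", ("C", "s2")): [40.0],
    ("B", ("B", "s1")): [640.0],
}
print(len(filter_strains(titers, ["A"])))       # 0
print(len(filter_strains(titers, ["A", "B"])))  # 2
```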
- static load_from_file(filenames, excluded_sources=None)
Load titers from a tab-delimited file.
- Parameters:
filenames
excluded_sources
- Returns:
tuple of a dict of titer measurements, list of strains, list of sources
- Return type:
tuple
Examples
>>> measurements, strains, sources = TiterCollection.load_from_file("tests/data/titer_model/h3n2_titers_subset.tsv")
>>> type(measurements)
<class 'dict'>
>>> len(measurements)
11
>>> len(strains)
13
>>> len(sources)
5
Inspect specific measurements. First, inspect a measurement that has a specific value in the input.
>>> measurements[("A/Acores/11/2013", ("A/Alabama/5/2010", "F27/10"))]
[80.0]
Next, inspect a measurement that has a thresholded value at the lower bound of detection (e.g., "<80"). This measurement should be reported as one half of its threshold value (e.g., 40.0).
>>> measurements[("A/Acores/11/2013", ("A/Victoria/208/2009", "F7/10"))]
[40.0]
Inspect a measurement that has a thresholded value at the upper bound of detection (">1280"). This measurement should be reported as twice its threshold value (e.g., 2560.0).
>>> measurements[("A/Acores/SU43/2012", ("A/Texas/50/2012", "F36/12"))]
[2560.0]
Confirm that excluding sources produces fewer measurements.
>>> measurements, strains, sources = TiterCollection.load_from_file("tests/data/titer_model/h3n2_titers_subset.tsv", excluded_sources=["NIMR_Sep2013_7-11.csv"])
>>> len(measurements)
5
Request measurements for a test/reference/serum tuple that should not exist after excluding its source.
>>> measurements.get(("A/Acores/11/2013", ("A/Alabama/5/2010", "F27/10")))
>>>
Missing titer data should produce an error.
>>> output = TiterCollection.load_from_file("tests/data/titer_model/missing.tsv")
Traceback (most recent call last):
  File "<ipython-input-2-0ea96a90d45d>", line 1, in <module>
    open("tests/data/titer_model/missing.tsv", "r")
FileNotFoundError: [Errno 2] No such file or directory: 'tests/data/titer_model/missing.tsv'
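The thresholding convention in the examples above is that "<X" becomes X / 2 and ">X" becomes 2 * X. It can be sketched as a small parsing function. parse_titer is a hypothetical name, and the sketch does not claim to show how load_from_file handles malformed values. Here float() simply raises ValueError for them.

```python
def parse_titer(raw):
    """Convert a raw titer string to a float.

    "<X" (below the detection limit) becomes X / 2, ">X" (above it)
    becomes 2 * X, and plain numbers are returned unchanged.
    """
    raw = raw.strip()
    if raw.startswith("<"):
        return float(raw[1:]) / 2
    if raw.startswith(">"):
        return float(raw[1:]) * 2
    return float(raw)


print([parse_titer(v) for v in ["80", "<80", ">1280"]])  # [80.0, 40.0, 2560.0]
```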
- normalize(ref, val)
Take the log2 difference of test titers and autologous titers.
- Parameters:
ref
val
- normalize_titers()
Convert the titer measurements into the log2 difference between the average titer measured between test virus and reference serum and the average homologous titer. All measurements relative to sera without a homologous titer are excluded.
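A minimal sketch of this normalization, under two assumptions. First, repeated measurements are averaged on the log2 scale. Second, the sign convention is autologous minus test, so a positive value means a titer drop. Both function names are reused here for illustration only, and their signatures differ from the documented methods.

```python
import math


def normalize(autologous_titer, test_titers):
    """log2 titer drop of a test virus relative to the autologous titer."""
    mean_log_test = sum(math.log2(v) for v in test_titers) / len(test_titers)
    return math.log2(autologous_titer) - mean_log_test


def normalize_titers(titers, autologous):
    """Normalize all measurements; drop sera without an autologous titer.

    `titers` maps (test, (reference, serum)) -> [values];
    `autologous` maps (reference, serum) -> autologous titer.
    """
    return {
        key: normalize(autologous[key[1]], values)
        for key, values in titers.items()
        if key[1] in autologous
    }


titers = {
    ("A/test1", ("A/ref1", "F1")): [160.0],
    ("A/test2", ("A/ref2", "F2")): [80.0, 160.0],
    ("A/test3", ("A/ref3", "F3")): [40.0],  # serum without autologous titer
}
autologous = {("A/ref1", "F1"): 640.0, ("A/ref2", "F2"): 320.0}
for key, value in normalize_titers(titers, autologous).items():
    print(key[0], round(value, 3))
# A/test1 2.0
# A/test2 1.5
```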
- read_titers(fname)
- strain_census(titers)
Make lists of reference viruses, test viruses, and sera (there are often multiple sera per reference virus).
Examples
>>> measurements, strains, sources = TiterCollection.load_from_file("tests/data/titer_model/h3n2_titers_subset.tsv")
>>> titers = TiterCollection(measurements)
>>> sera, ref_strains, test_strains = titers.strain_census(measurements)
>>> len(sera)
9
>>> len(ref_strains)
9
>>> len(test_strains)
13
- Parameters:
titers
- class augur.titer_model.TiterModel(serum_Kc=0, **kwargs)
Bases:
object
This class fits a linear model to titer measurements, using models that describe titer differences in a parsimonious way. Two additive models are currently implemented: the tree model and the substitution model. The tree model describes titer drops as a sum of terms associated with branches in the tree, while the substitution model attributes titer drops to amino acid mutations. More details on the methods can be found in Neher et al., PNAS, 2016.
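The additive structure described above can be sketched as follows. The symbols are labels chosen here for illustration and may not match the paper's notation exactly. $H_{a\beta}$ is the normalized log2 titer of test virus $a$ against serum $\beta$, and $v_a$ and $p_\beta$ are the virus effect and serum potency (compare compile_virus_effects and compile_potencies). $\mathcal{P}(a,\beta)$ is the set of branches on the path between $a$ and the reference virus of $\beta$, and $\mathcal{M}(a,\beta)$ is the set of substitutions separating them.

```latex
\begin{aligned}
\text{tree model:}\quad & H_{a\beta} \approx v_a + p_\beta + \sum_{b \in \mathcal{P}(a,\beta)} d_b \\
\text{substitution model:}\quad & H_{a\beta} \approx v_a + p_\beta + \sum_{m \in \mathcal{M}(a,\beta)} d_m
\end{aligned}
```

Here $d_b$ is the titer drop assigned to branch $b$ and $d_m$ the effect of substitution $m$. Stacking one such equation per measurement gives a linear system whose design matrix is the output of make_treegraph or make_seqgraph.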
- assign_titers(titers, strains)
- compile_potencies()
Compile a JSON structure containing potencies for visualization. Visualization needs rapid access to all sera for a given reference virus, so the structure is organized by [ref][serum].
- compile_titers()
Compile titer measurements into a JSON file organized by reference virus. During visualization, we need the average distance of a test virus from a reference virus across sera, hence the hierarchy [ref][test][serum]. NOTE: this uses node.name instead of node.clade.
- compile_virus_effects()
Compile a JSON structure containing virus_effects for visualization.
- fit_func()
- fit_l1reg()
Regularize genetic parameters with an L1 norm regardless of sign.
- fit_nnl1reg()
L1 regularization of titer drops with non-negativity constraints.
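The sketch below shows an L1-penalized least-squares fit with non-negativity constraints. It uses projected gradient descent, which is a solver chosen for this illustration and not necessarily the one augur uses. The sketch also fits only the constrained drop parameters and ignores the unconstrained virus and serum terms. Because every coefficient is non-negative, the L1 norm reduces to a plain sum.

```python
import numpy as np


def fit_nonneg_l1(A, y, lam, n_iter=5000):
    """Minimize ||A x - y||^2 + lam * sum(x) subject to x >= 0.

    With x >= 0 the L1 norm is just sum(x), so the penalty is linear and
    projected gradient descent (gradient step, then clip at zero) converges
    for a step size of 1 / L, where L = 2 * ||A||_2^2 is the Lipschitz
    constant of the gradient.
    """
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    step = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ x - y) + lam
        x = np.maximum(0.0, x - step * grad)
    return x


# Identity design: each coefficient is shrunk by lam / 2 and clipped at zero.
x = fit_nonneg_l1(np.eye(2), [3.0, 0.2], lam=1.0)
print(np.round(x, 4))  # [2.5 0. ]
```

The sign-agnostic variant (fit_l1reg) cannot use this shortcut. It would need a soft-thresholding (proximal) step in place of the clip.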
- fit_nnl2reg()
- fit_nnls()
- make_training_set(training_fraction=1.0, subset_strains=False, **kwargs)
- reference_virus_statistic()
Count measurements for every reference virus and serum.
- titer_stats()
- validate(plot=False, cutoff=0.0, validation_set=None, fname=None)
Predict titers of the validation set (a separate set of test_titers set aside previously) and compare them against known values. If plot=True, a figure comparing predicted and measured titers is produced.
Compute basic error metrics for actual vs. predicted titer values. Return a dictionary of {'metric': computed_metric, 'values': [(actual, predicted), ...]} and save a copy in self.validation.
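A minimal sketch of such an error summary. The key names abs_error, rms_error, and values are taken from the cross_validate description. The definitions used here are assumptions: mean absolute error and root-mean-square error, both in log2 titer units.

```python
import math


def error_metrics(actual, predicted):
    """Summarize prediction errors as {'abs_error', 'rms_error', 'values'}."""
    pairs = list(zip(actual, predicted))
    diffs = [p - a for a, p in pairs]
    return {
        "abs_error": sum(abs(d) for d in diffs) / len(diffs),
        "rms_error": math.sqrt(sum(d * d for d in diffs) / len(diffs)),
        "values": pairs,
    }


metrics = error_metrics([1.0, 2.0, 0.0], [1.5, 1.0, 0.0])
print(metrics["abs_error"], metrics["rms_error"])
```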
- class augur.titer_model.TreeModel(tree, titers, *args, **kwargs)
Bases:
TiterModel
tree_model extends titers and fits the antigenic differences in terms of contributions on the branches of the phylogenetic tree. Nodes in the tree are decorated with the attribute 'dTiter', which contains the estimated titer drop across the branch.
- cross_validate(n, **kwargs)
For each of n iterations, randomly reallocate titers to training and test sets. Fit the model using training titers and assess performance using test titers (see TiterModel.validate). Append a dictionary of {'abs_error': ..., 'rms_error': ..., 'values': [(actual, predicted), ...], etc.} for each iteration to the model_performance list. Return model_performance and save a copy in self.cross_validation.
- Parameters:
n
**kwargs
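A minimal sketch of the repeated random split. The fit and evaluate callables and the default training_fraction=0.9 are hypothetical and stand in for the model's own training and validate steps. The sketch does not reproduce how make_training_set actually selects titers.

```python
import random


def cross_validate(titers, fit, evaluate, n, training_fraction=0.9, seed=None):
    """Repeat a random train/test split n times and collect performance.

    `fit(training)` returns a fitted model; `evaluate(model, test)` returns a
    metrics dictionary. Both are hypothetical stand-ins.
    """
    rng = random.Random(seed)
    keys = list(titers)
    model_performance = []
    for _ in range(n):
        rng.shuffle(keys)  # fresh random allocation each iteration
        cut = int(round(training_fraction * len(keys)))
        training = {k: titers[k] for k in keys[:cut]}
        test = {k: titers[k] for k in keys[cut:]}
        model = fit(training)
        model_performance.append(evaluate(model, test))
    return model_performance


def mean_model(training):
    return sum(training.values()) / len(training)


def count_test(model, test):
    return {"n_test": len(test)}


toy_titers = {("t%d" % i, ("r", "s")): float(i) for i in range(10)}
print(cross_validate(toy_titers, mean_model, count_test, n=3, training_fraction=0.8, seed=1))
# [{'n_test': 2}, {'n_test': 2}, {'n_test': 2}]
```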
- find_titer_splits(criterium=None)
Walk through the tree and mark all branches that are to be included as model variables. Terminal branches are never included, and criterium is an optional callable that can be used to exclude further branches, e.g., if amino acid mutations map to this branch.
- Parameters:
criterium (None, optional)
- get_path_no_terminals(v1, v2)
Return the path between two tips in the tree, excluding the terminal branches.
- Parameters:
v1
v2
- make_treegraph()
Code the path between serum and test virus of each HI measurement into a matrix. The matrix has dimensions #measurements x #tree branches with HI info. If the path between test and serum goes through a branch, the corresponding matrix element is 1, otherwise 0.
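A minimal sketch covering both steps: finding the path between two tips without terminal branches, and coding paths into a 0/1 matrix. It assumes a hypothetical parent-dictionary tree representation instead of Bio.Phylo objects. Each branch is named by its child node, and each serum is represented by its reference virus.

```python
import numpy as np


def path_no_terminals(parent, v1, v2):
    """Internal branches on the path between tips v1 and v2.

    `parent` maps each node name to its parent (the root maps to None).
    Tips are nodes that never appear as a parent.
    """
    def to_root(node):
        path = []
        while node is not None:
            path.append(node)
            node = parent[node]
        return path

    up1, up2 = to_root(v1), to_root(v2)
    # the common ancestor and everything above it lie on both root paths
    shared = set(up1) & set(up2)
    branches = [n for n in up1 + up2 if n not in shared]
    internal = set(parent.values())  # nodes that have children
    return [b for b in branches if b in internal]


def make_treegraph(parent, pairs):
    """0/1 matrix: rows are (test, reference) pairs, columns are branches."""
    paths = [path_no_terminals(parent, a, b) for a, b in pairs]
    branches = sorted({b for p in paths for b in p})
    index = {b: j for j, b in enumerate(branches)}
    matrix = np.zeros((len(pairs), len(branches)), dtype=int)
    for i, p in enumerate(paths):
        for b in p:
            matrix[i, index[b]] = 1
    return matrix, branches


# root -> N1 -> (A, B); root -> N2 -> (C, D)
parent = {"root": None, "N1": "root", "N2": "root",
          "A": "N1", "B": "N1", "C": "N2", "D": "N2"}
matrix, branches = make_treegraph(parent, [("A", "C"), ("A", "B")])
print(branches)         # ['N1', 'N2']
print(matrix.tolist())  # [[1, 1], [0, 0]]
```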
- predict_titer(virus, serum, cutoff=0.0)
- prepare(**kwargs)
- prepare_tree(tree)
- train(**kwargs)