augur.ancestral module
Infer ancestral sequences based on a tree.
The ancestral sequences are inferred using TreeTime. Each internal node gets assigned a nucleotide sequence that maximizes a likelihood on the tree given its descendants and its parent node. Each node then gets assigned a list of nucleotide mutations for any position that has a mismatch between its own sequence and its parent's sequence. The node sequences and mutations are output to a node-data JSON file.
If amino acid options are provided, the ancestral amino acid sequences for each requested gene are inferred with the same method as the nucleotide sequences described above. The inferred amino acid mutations will be included in the output node-data JSON file, with the format equivalent to the output of augur translate.
The nucleotide and amino acid sequences are inferred separately in this command, which can potentially result in mismatches between the nucleotide and amino acid mutations. If you want amino acid mutations based on the inferred nucleotide sequences, please use augur translate.
Note
The mutation positions in the node-data JSON are one-based.
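Because the positions are one-based, they need to be shifted before they are used as Python string indices. The following is a minimal sketch, assuming mutation strings of the form `<from><1-based-pos><to>` as described for `collect_mutations`; the helper name `parse_mutation` is hypothetical and not part of augur.

```python
import re

# Matches strings like "A123T": from-state, 1-based position, to-state.
MUTATION_RE = re.compile(r"^([^\d]+)(\d+)([^\d]+)$")


def parse_mutation(mut):
    """Split a mutation string into (from_state, zero_based_index, to_state).

    Hypothetical helper for illustration; not part of augur.
    """
    match = MUTATION_RE.match(mut)
    if match is None:
        raise ValueError(f"not a mutation string: {mut!r}")
    from_state, pos, to_state = match.groups()
    # Positions in the node-data JSON are one-based; Python indexing is zero-based.
    return from_state, int(pos) - 1, to_state


parent_sequence = "ACGT"
from_state, index, to_state = parse_mutation("G3T")
# The from-state should match the parent's base at the zero-based index.
assert parent_sequence[index] == from_state
```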
- class augur.ancestral.Ancestral_JSON
Bases: TypedDict
- annotations: Annotations_JSON
- mask: NotRequired[str]
- class augur.ancestral.Annotations_JSON
Bases: TypedDict
- nuc: NotRequired[Nuc_Annotation]
- augur.ancestral.GENE_PATTERN = '%GENE'
String pattern used for gene replacement in filenames, etc.
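As an illustration of how such a placeholder might be expanded, here is a minimal sketch that substitutes a gene name into a filename pattern. The helper `filename_for_gene` and the example path are assumptions for illustration only; the constant is redefined locally so the snippet runs without augur installed.

```python
GENE_PATTERN = "%GENE"  # same value as augur.ancestral.GENE_PATTERN


def filename_for_gene(pattern, gene):
    """Replace the gene placeholder in a filename pattern (hypothetical helper)."""
    return pattern.replace(GENE_PATTERN, gene)


# One output path per gene, derived from a single pattern.
for gene in ["HA1", "HA2"]:
    print(filename_for_gene("results/aa_%GENE.fasta", gene))
```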
- augur.ancestral.collect_mutations(tt, mask, reference_sequence=None, infer_ambiguous=False)
Iterates over the tree and produces dictionaries with mutations and sequences for each node.
If a reference sequence is provided then mutations can be collected for the root node. Masked positions at the root node will be treated specially: if we infer ambiguity, then we report no mutations (i.e. we assume the reference base holds), otherwise we'll report a mutation from the <ref> to 'N'.
- Parameters:
tt (treetime.TreeTime) -- instance of treetime with valid ancestral reconstruction
mask (numpy.ndarray(bool))
reference_sequence (str, optional)
- Returns:
dict -> <node_name> -> [mut, mut, ...] where mut is a string in the form <from><1-based-pos><to>
- Return type:
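The mutation format above can be illustrated by comparing a parent sequence with a child sequence position by position. This is a simplified sketch, not augur's implementation: it works on plain strings rather than a TreeTime instance, and it ignores masking and the special handling of the root node. The function name `mutations_between` is hypothetical.

```python
def mutations_between(parent_seq, child_seq):
    """List mutations from parent to child as <from><1-based-pos><to> strings.

    Hypothetical helper for illustration; masking is not handled here.
    """
    if len(parent_seq) != len(child_seq):
        raise ValueError("sequences must have the same length")
    return [
        f"{parent_base}{pos}{child_base}"
        # start=1 yields the one-based positions used in the node-data JSON
        for pos, (parent_base, child_base) in enumerate(zip(parent_seq, child_seq), start=1)
        if parent_base != child_base
    ]


print(mutations_between("ACGT", "ACTA"))
```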
- augur.ancestral.collect_sequences(tt, mask, reference_sequence=None, infer_ambiguous=False)
Create a full sequence for every node on the tree. Masked positions will have the reference base if we are inferring ambiguity, or the ambiguous character 'N'.
- Parameters:
tt (treetime.TreeTime) -- instance of treetime with valid ancestral reconstruction
mask (numpy.ndarray(bool)) -- Mask these positions by changing them to the ambiguous nucleotide
reference_sequence (str or None)
infer_ambiguous (bool, optional) -- if true, request the reconstructed sequences from treetime, otherwise retain input ambiguities
- Returns:
dict -> <node_name> -> sequence_string
- Return type:
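The masking rule described for `collect_sequences` can be sketched on a single sequence as follows. This is an illustrative approximation under stated assumptions: the helper `apply_mask` is hypothetical, the mask is any iterable of booleans (a `numpy.ndarray(bool)` would also iterate this way), and falling back to 'N' when no reference is supplied is a choice made for this sketch rather than documented behavior.

```python
def apply_mask(sequence, mask, reference_sequence=None, infer_ambiguous=False):
    """Return the sequence with masked positions replaced (hypothetical helper).

    Masked positions take the reference base when ambiguity is being inferred
    and a reference is available; otherwise they become 'N'.
    """
    use_reference = infer_ambiguous and reference_sequence is not None
    out = []
    for i, (base, masked) in enumerate(zip(sequence, mask)):
        if not masked:
            out.append(base)
        elif use_reference:
            out.append(reference_sequence[i])
        else:
            out.append("N")
    return "".join(out)


mask = [False, True, False, False]
print(apply_mask("ACGT", mask))
print(apply_mask("ACGT", mask, reference_sequence="TTTT", infer_ambiguous=True))
```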
- augur.ancestral.construct_cds_feature(name, aa_len)
- augur.ancestral.correct_alignment(aln_fname, correct_seq)
Read an alignment from a FASTA file and correct sequences using the provided correction function (from _make_seq_corrector).
Returns a MultipleSeqAlignment suitable for passing directly to TreeAnc.
- Return type:
- augur.ancestral.create_mask(is_vcf, tt, reference_sequence, aln)
Identify sites for which every terminal sequence is ambiguous. These sites will be masked to prevent rounding errors in the maximum likelihood inference from assigning an arbitrary nucleotide to sites at internal nodes.
- Parameters:
is_vcf (bool)
tt (treetime.TreeTime) -- instance of treetime with valid ancestral reconstruction. Unused if is_vcf.
reference_sequence (str) -- only used if is_vcf
aln (dict) -- describes variation (relative to reference) per sample. Only used if is_vcf.
- Return type:
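The idea of masking sites where every terminal sequence is ambiguous can be sketched on a plain dictionary of aligned tip sequences. This is not how `create_mask` reads its inputs (it works from a TreeTime instance or VCF-style variation); the function name `fully_ambiguous_sites` and the choice to treat only 'N' as ambiguous are assumptions made for this example.

```python
AMBIGUOUS = {"N"}  # assumption: only 'N' counts as ambiguous in this sketch


def fully_ambiguous_sites(tip_sequences):
    """Return one boolean per site: True where every tip is ambiguous.

    tip_sequences is a dict of name -> aligned sequence string
    (hypothetical input shape for illustration).
    """
    seqs = list(tip_sequences.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("sequences must be aligned to the same length")
    return [all(seq[i] in AMBIGUOUS for seq in seqs) for i in range(length)]


tips = {"tip_a": "ANGT", "tip_b": "CNGN"}
# Only the second site is ambiguous in every tip, so only it is masked.
print(fully_ambiguous_sites(tips))
```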
- augur.ancestral.reconstruct_translations(anc_seqs, nuc_ref, aa_ref_fname, T, genes, annotation_fname, translations_fname_pattern, infer_ambiguous, fill_overhangs, marginal, rng_seed, output_fname_pattern, report_inconsistent_translation)
- Return type:
- augur.ancestral.register_parser(parent_subparsers)
- augur.ancestral.run(args)
- augur.ancestral.run_ancestral(T, aln, reference_sequence=None, is_vcf=False, full_sequences=False, fill_overhangs=False, infer_ambiguous=False, marginal=False, alphabet='nuc', rng_seed=None)
Ancestral nucleotide reconstruction using TreeTime.
- Return type:
- augur.ancestral.validate_arguments(args, genes)
Check that provided arguments are compatible. Where possible we use argparse built-ins, but they don't cover everything we want to check. Downstream code shouldn't rely on this checking to assume that arguments exist; its purpose is to catch invalid combinations up front so that we can exit quickly.
- Return type: