augur.translate module

Translate gene regions from nucleotides to amino acids.

Translates the nucleotide sequence of each node in a tree into amino acids for the gene regions annotated as features in the provided reference sequence. Each node is then assigned a list of amino acid mutations for every position where its amino acid sequence differs from its parent’s. The reference amino acid sequences, genome annotations, and node amino acid mutations are written to a node-data JSON file.

Note

The mutation positions in the node-data JSON are one-based.
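The per-node comparison described above can be sketched in a few lines of plain Python. This is a minimal illustration, not augur’s implementation: the tree is replaced by a hypothetical parent map (parents), translations holds one gene’s amino acid sequence per node, and the mutation label format (ancestral residue, one-based position, derived residue, e.g. K2E) is an assumption chosen to show the one-based convention from the note.

```python
def aa_mutations(parent_aa, child_aa):
    """List differing residues as '<parent><position><child>' labels.

    The label format is an assumption; positions are one-based, as in the note.
    """
    if len(parent_aa) != len(child_aa):
        raise ValueError("parent and child translations must have equal length")
    return [
        f"{p}{i + 1}{c}"  # enumerate() is 0-based, so add 1 for a one-based position
        for i, (p, c) in enumerate(zip(parent_aa, child_aa))
        if p != c
    ]


def assign_mutations(parents, translations):
    """Map each node to its amino acid mutations relative to its parent.

    parents: hypothetical {node: parent} map standing in for a tree; the root maps to None.
    translations: {node: amino acid sequence} for a single gene.
    """
    mutations = {}
    for node, parent in parents.items():
        if parent is None:
            mutations[node] = []  # the root has no parent to compare against
        else:
            mutations[node] = aa_mutations(translations[parent], translations[node])
    return mutations


parents = {"root": None, "A": "root", "B": "A"}
translations = {"root": "MKF", "A": "MEF", "B": "MEL"}
print(assign_mutations(parents, translations))
# {'root': [], 'A': ['K2E'], 'B': ['F3L']}
```

Only positions that actually differ produce a label, so a node identical to its parent gets an empty list.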

exception augur.translate.MismatchNodeError: Bases: Exception

exception augur.translate.MissingNodeError: Bases: Exception

exception augur.translate.NoVariationError: Bases: Exception

augur.translate.assign_aa_fasta(tree, translations, reference_translations)

augur.translate.assign_aa_vcf(tree, translations)

augur.translate.check_arg_combinations(args, is_vcf): Check that provided arguments are compatible. Where possible we use argparse built-ins, but they don’t cover everything we want to check. This checking shouldn’t be used by downstream code to assume arguments exist, however by checking for invalid combinations up-front we can exit quickly.

augur.translate.construct_mut(start, pos, end)

augur.translate.register_parser(parent_subparsers)

augur.translate.run(args)

augur.translate.safe_translate(sequence)

Returns an amino acid translation of the given nucleotide sequence, accounting for gaps: a codon made entirely of gaps translates to -, and a codon that is only partly gapped translates to X.

Examples

>>> safe_translate("ATG")
'M'
>>> safe_translate("ATGGT-")
'MX'
>>> safe_translate("ATG---")
'M-'
>>> safe_translate("ATGTAG")
'M*'
>>> safe_translate("")
''
>>> safe_translate("ATGT")
Traceback (most recent call last):
...
ValueError: Sequence length 4 is not divisible by 3.
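The doctests above pin down the gap handling. The sketch below reproduces that behaviour with a hand-built standard codon table (NCBI translation table 1); it is an illustration, not augur’s implementation. As a simplifying assumption, any codon containing an ambiguous base such as N is also translated to X, whereas augur may resolve some ambiguous codons differently.

```python
# Standard genetic code (NCBI translation table 1); codons are enumerated with
# each of the three positions cycling through T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def gap_aware_translate(sequence):
    """Translate codon by codon: '---' -> '-', partly gapped or ambiguous codons -> 'X'."""
    if len(sequence) % 3 != 0:
        raise ValueError(f"Sequence length {len(sequence)} is not divisible by 3.")
    residues = []
    for i in range(0, len(sequence), 3):
        codon = sequence[i:i + 3].upper()
        if codon == "---":
            residues.append("-")  # a fully gapped codon stays a gap
        elif codon in CODON_TABLE:
            residues.append(CODON_TABLE[codon])
        else:
            residues.append("X")  # partial gaps (assumption: also ambiguous bases)
    return "".join(residues)


print(gap_aware_translate("ATGGT-"))  # MX
```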

augur.translate.sequences_json(node_data_json, tree, validation_mode): Extract the full nucleotide sequence for each node in the provided node-data JSON. Returns a dict whose keys are node names and whose values are the corresponding genome (nucleotide) sequences as strings.
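A rough sketch of this extraction, working on an already-parsed JSON dict. It assumes, hypothetically, that each entry under "nodes" stores its full sequence under a "sequence" key, replaces the tree with a plain list of node names, and ignores validation_mode; the real node-data layout and checks may differ.

```python
def sequences_from_node_data(node_data, tree_node_names):
    """Return {node name: full nucleotide sequence} from a parsed node-data dict.

    Assumes (hypothetically) that each node entry stores its sequence under a
    "sequence" key, and fails if any tree node lacks one.
    """
    nodes = node_data.get("nodes", {})
    sequences = {
        name: entry["sequence"]
        for name, entry in nodes.items()
        if "sequence" in entry
    }
    missing = [name for name in tree_node_names if name not in sequences]
    if missing:
        raise KeyError(f"No sequence found for tree node(s): {', '.join(missing)}")
    return sequences


node_data = {"nodes": {"root": {"sequence": "ATGAAA"}, "A": {"sequence": "ATGGAA"}}}
print(sequences_from_node_data(node_data, ["root", "A"]))
# {'root': 'ATGAAA', 'A': 'ATGGAA'}
```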

augur.translate.sequences_vcf(reference_fasta, vcf): Extract the nucleotide variation in the VCF.

Returns a tuple:

[0] The sequences as a dict of dicts: sequences → <NODE_NAME> → <POS> → <ALT_NUC>, where <POS> is a 0-based int.
[1] The sequence of the provided reference_fasta (string).
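To make the dict-of-dicts shape concrete, the sketch below rebuilds each node’s full sequence by applying its {0-based position: ALT} entries to the reference string. The helper name is hypothetical, and it assumes single-nucleotide ALT values so that the sequence length never changes.

```python
def apply_vcf_variants(reference, variants):
    """Rebuild one node's full sequence from the reference and its variants.

    variants maps 0-based positions to single-nucleotide ALT values, i.e. one
    sequences[<NODE_NAME>] entry of the returned structure (assumed SNVs only).
    """
    seq = list(reference)
    for pos, alt in variants.items():
        seq[pos] = alt  # positions are 0-based, so they index the list directly
    return "".join(seq)


reference = "ATGAAATTT"
sequences = {"sample_A": {3: "G"}, "sample_B": {}}
full = {node: apply_vcf_variants(reference, muts) for node, muts in sequences.items()}
print(full)  # {'sample_A': 'ATGGAATTT', 'sample_B': 'ATGAAATTT'}
```

Storing only the variant positions per node keeps the structure small; a node with no variants has an empty inner dict and is identical to the reference.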

augur.translate.translate_feature(aln, feature)

Translates the region covered by feature in each of the input nucleotide sequences.

Parameters:

aln (dict) -- sequences indexed by node name
feature (Bio.SeqFeature.SeqFeature) -- BioPython sequence feature

Returns:

translated sequences indexed by node name

Return type:

dict
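Conceptually, translating a feature means cutting each sequence at the feature’s coordinates, reverse-complementing reverse-strand features, and translating with the gap handling shown in the safe_translate doctests. The sketch below is a self-contained illustration of that idea, not augur’s code: it swaps Biopython’s SeqFeature for a hypothetical SimpleFeature with a single 0-based, end-exclusive location, and treats ambiguous codons as X.

```python
from dataclasses import dataclass

# Standard genetic code (NCBI table 1), codons enumerated over T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN-", "TGCAN-")


@dataclass
class SimpleFeature:
    """Hypothetical stand-in for a Biopython SeqFeature with a single location."""
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: int  # 1 = forward, -1 = reverse


def translate_codons(nuc):
    """Gap-aware translation: '---' -> '-', any other codon not in the table -> 'X'."""
    if len(nuc) % 3 != 0:
        raise ValueError(f"Sequence length {len(nuc)} is not divisible by 3.")
    return "".join(
        "-" if nuc[i:i + 3] == "---" else CODON_TABLE.get(nuc[i:i + 3], "X")
        for i in range(0, len(nuc), 3)
    )


def translate_feature_sketch(aln, feature):
    """Translate the feature's region of every sequence in aln (dict keyed by node name)."""
    translations = {}
    for name, seq in aln.items():
        region = seq[feature.start:feature.end].upper()
        if feature.strand == -1:
            region = region.translate(COMPLEMENT)[::-1]  # reverse complement
        translations[name] = translate_codons(region)
    return translations


aln = {"node_1": "ATGAAA", "node_2": "ATG---"}
print(translate_feature_sketch(aln, SimpleFeature(start=0, end=6, strand=1)))
# {'node_1': 'MK', 'node_2': 'M-'}
```

A real SeqFeature can have compound locations (for example, genes split across a join); the single-interval SimpleFeature deliberately leaves that case out.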

augur.translate.translate_vcf_feature(sequences, ref, feature)

Translates the region covered by feature in the reference and in each node’s VCF-derived sequence.

Parameters:

sequences (dict) -- TreeTime-format dictionary of VCF-derived sequences, indexed by node name
ref (str) -- reference sequence the VCF was mapped to
feature (Bio.SeqFeature.SeqFeature) -- BioPython sequence feature

Returns:

translated reference gene, positions of AA differences, and AA differences indexed by node name

Return type:

dict

Raises:

NoVariationError -- if there are no variable sites within this feature (across all sequences)
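The behaviour described above (translate the reference gene, collect per-node amino acid differences, and fail when nothing varies) can be sketched as follows. This is an illustration under stated assumptions, not augur’s code: the feature is simplified to hypothetical forward-strand start/end coordinates, the returned key names ("reference", "positions", "sequences") and the 0-based amino acid positions are assumptions, and NoVariationSketchError is a hypothetical stand-in for NoVariationError.

```python
# Standard genetic code (NCBI table 1), codons enumerated over T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


class NoVariationSketchError(Exception):
    """Hypothetical stand-in for augur.translate.NoVariationError."""


def translate_codons(nuc):
    """Gap-aware translation: '---' -> '-', any other codon not in the table -> 'X'."""
    if len(nuc) % 3 != 0:
        raise ValueError(f"Sequence length {len(nuc)} is not divisible by 3.")
    return "".join(
        "-" if nuc[i:i + 3] == "---" else CODON_TABLE.get(nuc[i:i + 3], "X")
        for i in range(0, len(nuc), 3)
    )


def translate_vcf_feature_sketch(sequences, ref, start, end):
    """Translate ref[start:end] (forward strand only) and record per-node AA differences.

    sequences: {node: {0-based nucleotide position: ALT}}, as returned by sequences_vcf.
    """
    ref_gene = ref[start:end]
    ref_aa = translate_codons(ref_gene)
    diffs, positions = {}, set()
    for node, variants in sequences.items():
        gene = list(ref_gene)
        for pos, alt in variants.items():
            if start <= pos < end:  # ignore variants outside this feature
                gene[pos - start] = alt
        node_aa = translate_codons("".join(gene))
        changed = {i: aa for i, (aa, ref_res) in enumerate(zip(node_aa, ref_aa)) if aa != ref_res}
        if changed:  # synonymous changes leave node_aa equal to ref_aa
            diffs[node] = changed
            positions.update(changed)
    if not positions:
        raise NoVariationSketchError("no variable sites within this feature")
    return {"reference": ref_aa, "positions": sorted(positions), "sequences": diffs}


ref = "ATGAAATTT"
sequences = {"n1": {3: "G", 20: "C"}, "n2": {5: "G"}}
print(translate_vcf_feature_sketch(sequences, ref, 0, 9))
# {'reference': 'MKF', 'positions': [1], 'sequences': {'n1': {1: 'E'}}}
```

In the example, n1’s change AAA to GAA alters K to E, while n2’s change AAA to AAG is synonymous and therefore contributes no difference; a gene where every node looks like n2 raises the error.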