augur.translate module

Translate gene regions from nucleotides to amino acids.

Translates the nucleotide sequence of each node in a tree to amino acids for the gene regions annotated on the provided reference sequence. Each node is then assigned a list of amino acid mutations, one for every position where its amino acid sequence differs from its parent’s sequence. The reference amino acid sequences, genome annotations, and node amino acid mutations are output to a node-data JSON file.

Note

The mutation positions in the node-data JSON are one-based.
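
As a rough illustration of the parent/child comparison and the one-based numbering described above, the sketch below compares two aligned amino acid strings. The helper name `aa_mutations` and the `<parent><position><child>` string format are assumptions made for this example; they are not taken from the module itself.

```python
def aa_mutations(parent_aa, child_aa):
    """Return mutation strings such as 'K2R' using one-based positions.

    Hypothetical helper for illustration; not part of augur.translate.
    """
    if len(parent_aa) != len(child_aa):
        raise ValueError("parent and child translations must have equal length")
    return [
        f"{p}{pos}{c}"
        # start=1 makes the reported positions one-based
        for pos, (p, c) in enumerate(zip(parent_aa, child_aa), start=1)
        if p != c
    ]


print(aa_mutations("MKV*", "MRV*"))
```

The second residue differs, so the single reported mutation carries position 2 rather than the zero-based index 1.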

exception augur.translate.MismatchNodeError

Bases: Exception

exception augur.translate.MissingNodeError

Bases: Exception

exception augur.translate.NoVariationError

Bases: Exception

augur.translate.assign_aa_fasta(tree, translations, reference_translations)
augur.translate.assign_aa_vcf(tree, translations)
augur.translate.check_arg_combinations(args, is_vcf)

Check that the provided arguments are compatible. Where possible we use argparse built-ins, but they don’t cover everything we want to check. Downstream code shouldn’t rely on this check to assume that arguments exist; its purpose is to catch invalid combinations up-front so that we can exit quickly.

augur.translate.construct_mut(start, pos, end)
augur.translate.register_parser(parent_subparsers)
augur.translate.run(args)
augur.translate.safe_translate(sequence, report_exceptions=False)

Returns an amino acid translation of the given nucleotide sequence accounting for gaps in the given sequence.

Optionally, returns a tuple of the translated sequence and whether an exception was raised during initial translation.

Examples

>>> safe_translate("ATG")
'M'
>>> safe_translate("ATGGT-")
'MX'
>>> safe_translate("ATG---")
'M-'
>>> safe_translate("ATGTAG")
'M*'
>>> safe_translate("")
''
>>> safe_translate("ATGT")
'MX'
>>> safe_translate("ATG", report_exceptions=True)
('M', False)
>>> safe_translate("ATGA-G", report_exceptions=True)
('MX', True)
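
The gap handling shown in the examples above can be approximated with the standard library alone. The following is a minimal sketch, not the module's implementation: it assumes the standard genetic code, maps an all-gap codon to `-`, and maps any other codon it cannot look up (partial gaps, ambiguous bases, or a trailing incomplete codon) to `X`. The name `gap_aware_translate` is hypothetical, and the `report_exceptions` behaviour is not modelled.

```python
# Standard genetic code, laid out in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def gap_aware_translate(sequence):
    """Translate codon by codon; hypothetical stand-in for illustration."""
    protein = []
    for i in range(0, len(sequence), 3):
        codon = sequence[i:i + 3].upper()
        if codon == "---":
            protein.append("-")  # a fully gapped codon stays a gap
        else:
            # Unknown or incomplete codons fall back to 'X'
            protein.append(CODON_TABLE.get(codon, "X"))
    return "".join(protein)


for seq in ["ATG", "ATGGT-", "ATG---", "ATGTAG", "", "ATGT"]:
    print(repr(seq), "->", repr(gap_aware_translate(seq)))
```
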
augur.translate.sequences_json(node_data_json, tree, validation_mode)

Extract the full nucleotide sequence for each node in the provided node-data JSON. Returns a dict whose keys are node names and whose values are the genome (nucleotide) sequence as a string.

augur.translate.sequences_vcf(reference_fasta, vcf)

Extract the nucleotide variation in the VCF.

Returns a tuple:

  • [0] The sequences as a dict of dicts: sequences → <NODE_NAME> → <POS> → <ALT_NUC>, where <POS> is a 0-based int

  • [1] The sequence of the provided reference_fasta (string)
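
To make the nested structure concrete, here is a small sketch that builds a dict of dicts by hand and applies it to a reference string to recover each node's full sequence. The data and the helper `apply_variants` are invented for illustration, and the sketch assumes every alternate allele is a single nucleotide.

```python
def apply_variants(reference, variants):
    """Rebuild one node's sequence from the reference and its {pos: alt} map.

    Hypothetical helper; assumes single-nucleotide alternates.
    """
    seq = list(reference)
    for pos, alt in variants.items():
        seq[pos] = alt  # pos is a 0-based index into the reference
    return "".join(seq)


reference = "ATGGCC"
# sequences -> <NODE_NAME> -> <POS> -> <ALT_NUC>
sequences = {
    "node_A": {3: "T"},
    "node_B": {0: "C", 5: "G"},
    "node_C": {},  # identical to the reference
}

full = {name: apply_variants(reference, muts) for name, muts in sequences.items()}
print(full)
```

Storing only the differences keeps the structure small when most nodes match the reference at most positions.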

augur.translate.translate_feature(aln, feature)

Translates a subsequence of input nucleotide sequences.

Parameters:
  • aln (dict) – sequences indexed by node name

  • feature (Bio.SeqFeature.SeqFeature) – BioPython sequence feature

Returns:

translated sequences indexed by node name

Return type:

dict

augur.translate.translate_vcf_feature(sequences, ref, feature, feature_name)

Translates a subsequence of input nucleotide sequences.

Parameters:
  • sequences (dict) – TreeTime format dictionary from VCF-input of sequences indexed by node name

  • ref – reference alignment the VCF was mapped to

  • feature (Bio.SeqFeature.SeqFeature) – BioPython sequence feature

Returns:

translated reference gene, positions of AA differences, and AA differences indexed by node name

Return type:

dict

Raises:

NoVariationError – if there are no variable sites within this feature (across all sequences)
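
The no-variation condition can be pictured as follows. This is a sketch under stated assumptions rather than the module's logic: the feature is reduced to a 0-based half-open interval `[start, end)`, the exception class is a local stand-in with the same name, and `variable_sites_in_feature` is a hypothetical helper.

```python
class NoVariationError(Exception):
    """Local stand-in for augur.translate.NoVariationError."""


def variable_sites_in_feature(sequences, start, end):
    """Collect variant positions inside [start, end) across all nodes.

    Hypothetical helper; `sequences` maps node name -> {0-based pos: alt}.
    """
    sites = {
        pos
        for variants in sequences.values()
        for pos in variants
        if start <= pos < end
    }
    if not sites:
        raise NoVariationError(f"no variable sites in [{start}, {end})")
    return sorted(sites)


sequences = {"node_A": {3: "T"}, "node_B": {10: "G"}}
print(variable_sites_in_feature(sequences, 0, 6))

try:
    variable_sites_in_feature(sequences, 20, 30)
except NoVariationError as err:
    print("skipping feature:", err)
```

A caller that catches the exception can skip the feature instead of emitting an empty translation for it.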