augur.translate module
Translate gene regions from nucleotides to amino acids.
Translates nucleotide sequences of nodes in a tree to amino acids for gene regions of the annotated features of the provided reference sequence. Each node then gets assigned a list of amino acid mutations for any position that has a mismatch between its own amino acid sequence and its parent's sequence. The reference amino acid sequences, genome annotations, and node amino acid mutations are output to a node-data JSON file.
Note
The mutation positions in the node-data JSON are one-based.
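To illustrate the one-based convention, here is a minimal sketch of how mutations between a parent's and a child's amino acid sequences could be listed. The helper name `amino_acid_mutations` and the `<parent><position><child>` string layout are assumptions made for this example, not a description of augur's internals.

```python
def amino_acid_mutations(parent_aa, child_aa):
    """List mismatches as '<parent><one-based position><child>' strings.

    Hypothetical helper for illustration; both sequences must be aligned.
    """
    if len(parent_aa) != len(child_aa):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{parent}{position}{child}"
        # start=1 makes the reported positions one-based
        for position, (parent, child) in enumerate(zip(parent_aa, child_aa), start=1)
        if parent != child
    ]


print(amino_acid_mutations("MKTV", "MRTA"))  # ['K2R', 'V4A']
```

The second residue is reported as position 2, not 1, which is the convention the note describes.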
- augur.translate.assign_aa_fasta(tree, translations, reference_translations)
- augur.translate.assign_aa_vcf(tree, translations)
- augur.translate.check_arg_combinations(args, is_vcf)
Check that provided arguments are compatible. Where possible we use argparse built-ins, but they don't cover everything we want to check. Downstream code shouldn't rely on this check to assume that arguments exist; however, by checking for invalid combinations up front we can exit quickly.
- augur.translate.construct_mut(start, pos, end)
- augur.translate.register_parser(parent_subparsers)
- augur.translate.run(args)
- augur.translate.safe_translate(sequence, report_exceptions=False)
Returns an amino acid translation of the given nucleotide sequence accounting for gaps in the given sequence.
Optionally, returns a tuple of the translated sequence and whether an exception was raised during initial translation.
Examples
>>> safe_translate("ATG")
'M'
>>> safe_translate("ATGGT-")
'MX'
>>> safe_translate("ATG---")
'M-'
>>> safe_translate("ATGTAG")
'M*'
>>> safe_translate("")
''
>>> safe_translate("ATGT")
'MX'
>>> safe_translate("ATG", report_exceptions=True)
('M', False)
>>> safe_translate("ATGA-G", report_exceptions=True)
('MX', True)
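The gap handling shown in these examples can be approximated in pure Python. This is a minimal sketch and not augur's implementation: the name `translate_with_gaps` and the hand-built standard codon table are assumptions for illustration, and the `report_exceptions` option is not modelled.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_with_gaps(sequence):
    """Translate codon by codon (hypothetical helper).

    A full-gap codon becomes '-'; any codon that is partial, partly gapped,
    or otherwise not in the table becomes 'X'.
    """
    amino_acids = []
    for start in range(0, len(sequence), 3):
        codon = sequence[start:start + 3].upper()
        if codon == "---":
            amino_acids.append("-")
        else:
            amino_acids.append(CODON_TABLE.get(codon, "X"))
    return "".join(amino_acids)


print(translate_with_gaps("ATGGT-"))  # MX
print(translate_with_gaps("ATG---"))  # M-
```

Under these assumptions the sketch returns the same strings as the documented examples that do not use `report_exceptions`.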
- augur.translate.sequences_json(node_data_json, tree, validation_mode)
Extract the full nucleotide sequence for each node in the provided node-data JSON. Returns a dict whose keys are node names and whose values are strings of the genome (nucleotide) sequence.
- augur.translate.sequences_vcf(reference_fasta, vcf)
Extract the nucleotide variation in the VCF. Returns a tuple:
[0] The sequences as a dict of dicts: sequences → <NODE_NAME> → <POS> → <ALT_NUC>, where <POS> is a 0-based int
[1] The sequence of the provided reference_fasta (string)
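As a rough illustration of this sparse representation, the sketch below rebuilds full sequences from a reference string and per-node dicts of 0-based position to alternate nucleotide. The helper `apply_variants` and the sample data are hypothetical, and the sketch assumes single-nucleotide substitutions only.

```python
def apply_variants(reference, variants):
    """Rebuild a full sequence from a reference and {0-based pos: alt nuc}.

    Hypothetical helper; assumes every variant replaces exactly one base.
    """
    sequence = list(reference)
    for position, alt_nuc in variants.items():
        sequence[position] = alt_nuc
    return "".join(sequence)


reference = "ATGGTC"
# Same shape as described above: node name -> position -> alternate nucleotide
sequences = {"node_A": {3: "A"}, "node_B": {}}

full_sequences = {
    name: apply_variants(reference, variants)
    for name, variants in sequences.items()
}
print(full_sequences)  # {'node_A': 'ATGATC', 'node_B': 'ATGGTC'}
```

A node with no entries, such as `node_B` here, is identical to the reference, which is why only differences need to be stored.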
- augur.translate.translate_feature(aln, feature)
Translates a subsequence of input nucleotide sequences.
- Parameters:
aln (dict) – sequences indexed by node name
feature (Bio.Seq.Seq) – BioPython sequence feature
- Returns:
translated sequences indexed by node name
- Return type:
- augur.translate.translate_vcf_feature(sequences, ref, feature, feature_name)
Translates a subsequence of input nucleotide sequences.
- Parameters:
sequences (dict) – TreeTime format dictionary from VCF-input of sequences indexed by node name
ref – reference alignment the VCF was mapped to
feature (Bio.Seq.Seq) – BioPython sequence feature
- Returns:
translated reference gene, positions of AA differences, and AA differences indexed by node name
- Return type:
- Raises:
NoVariationError – if no variable sites within this feature (across all sequences)