augur translate

Translate gene regions from nucleotides to amino acids.

Translates the nucleotide sequences of nodes in a tree to amino acids for the gene regions annotated in the provided reference sequence. Each node is then assigned a list of amino acid mutations for every position where its amino acid sequence differs from its parent’s. The reference amino acid sequences, genome annotations, and node amino acid mutations are written to a node-data JSON file.
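As a rough illustration of the translation step, the sketch below extracts one gene region from a nucleotide sequence and translates it with the standard genetic code. It is not augur’s own implementation: the helper names `extract_gene` and `translate` are hypothetical, gene coordinates are assumed to be one-based and inclusive, and any codon containing a character other than A, C, G, or T is assumed to translate to `X`.

```python
# Hypothetical sketch of gene translation; not augur's own code.
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def extract_gene(genome: str, start: int, end: int, strand: str) -> str:
    """Return a gene's coding sequence, assuming one-based inclusive coordinates."""
    region = genome[start - 1:end].upper()
    if strand == "-":
        # Genes on the minus strand are read from the reverse complement.
        region = region.translate(COMPLEMENT)[::-1]
    return region


def translate(nuc: str) -> str:
    """Translate codon by codon; an incomplete trailing codon is dropped."""
    usable = len(nuc) - len(nuc) % 3
    # Assumption: unknown or ambiguous codons become 'X'.
    return "".join(CODON_TABLE.get(nuc[i:i + 3], "X") for i in range(0, usable, 3))


print(translate(extract_gene("CCATGGCTACCTGACC", 3, 14, "+")))  # MAT*
```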


The mutation positions in the node-data JSON are one-based.
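The comparison between a node and its parent can be sketched as a position-by-position walk over two equal-length protein sequences, reporting each difference as parent residue, one-based position, and node residue (for example `A2S`). The function name `aa_mutations` is hypothetical, and skipping positions where either residue is the ambiguous `X` is an assumption of this sketch rather than documented behavior.

```python
def aa_mutations(parent: str, child: str) -> list[str]:
    """List amino acid differences between a parent and a child protein.

    Positions are one-based, matching the node-data JSON convention.
    """
    if len(parent) != len(child):
        raise ValueError("sequences must have the same length")
    muts = []
    for pos, (a, d) in enumerate(zip(parent, child), start=1):
        # Assumption: positions involving ambiguous residues ('X') are skipped.
        if a != d and "X" not in (a, d):
            muts.append(f"{a}{pos}{d}")
    return muts


print(aa_mutations("MATLLRS", "MSTLLQS"))  # ['A2S', 'R6Q']
```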

usage: augur translate [-h] --tree TREE --ancestral-sequences
                       ANCESTRAL_SEQUENCES --reference-sequence
                       REFERENCE_SEQUENCE [--genes GENES [GENES ...]]
                       [--output-node-data OUTPUT_NODE_DATA]
                       [--alignment-output ALIGNMENT_OUTPUT]
                       [--vcf-reference VCF_REFERENCE]
                       [--vcf-reference-output VCF_REFERENCE_OUTPUT]

Named Arguments


--tree TREE
prebuilt Newick – no tree will be built if provided


--ancestral-sequences ANCESTRAL_SEQUENCES
JSON (fasta input) or VCF (VCF input) containing ancestral and tip sequences
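Whether the VCF-specific options apply depends on the format of the ancestral sequences file. One plausible way to make that decision is by file extension, as in this sketch; the helper `is_vcf_input` is hypothetical, and keying on `.vcf` / `.vcf.gz` is an assumption about how the input type is detected.

```python
def is_vcf_input(path: str) -> bool:
    """Return True if the path looks like a (possibly gzipped) VCF file.

    Assumption: the input type is inferred from the file extension only.
    """
    return path.lower().endswith((".vcf", ".vcf.gz"))


print(is_vcf_input("calls.vcf.gz"), is_vcf_input("nt_muts.json"))  # True False
```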


--reference-sequence REFERENCE_SEQUENCE
GenBank or GFF file containing the annotation


--genes GENES [GENES ...]
genes to translate, given either as a list of gene names or as a file containing such a list
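Because the gene option accepts either names or a file listing them, a caller has to decide which form it received. The sketch below shows one way to do that; the helper `resolve_genes` and the one-name-per-line file layout (blank lines skipped) are assumptions for illustration, not augur’s documented parsing rules.

```python
import os


def resolve_genes(values: list[str]) -> list[str]:
    """Treat a single existing path as a gene-list file; otherwise return the names."""
    if len(values) == 1 and os.path.isfile(values[0]):
        with open(values[0]) as fh:
            # Assumed layout: one gene name per line; blank lines are skipped.
            return [line.strip() for line in fh if line.strip()]
    return list(values)


print(resolve_genes(["GENE_1", "GENE_2"]))  # ['GENE_1', 'GENE_2']
```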


--output-node-data OUTPUT_NODE_DATA
name of the JSON file to save amino acid mutations to


--alignment-output ALIGNMENT_OUTPUT
write out translated gene alignments. For VCF input, a .vcf or .vcf.gz file will be written here (depending on the file extension). For fasta input, specify the file name like so: ‘my_alignment_%GENE.fasta’, where ‘%GENE’ will be replaced by the name of each gene
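For fasta input, the ‘%GENE’ placeholder expands into one output path per gene. A minimal sketch of that expansion using plain string replacement; the helper `alignment_paths` is hypothetical, and rejecting a pattern without ‘%GENE’ is a choice made here, not documented augur behavior.

```python
def alignment_paths(pattern: str, genes: list[str]) -> dict[str, str]:
    """Map each gene name to its alignment file by substituting '%GENE'."""
    if "%GENE" not in pattern:
        # This sketch's own check: without the placeholder, all genes share one path.
        raise ValueError("pattern must contain '%GENE'")
    return {gene: pattern.replace("%GENE", gene) for gene in genes}


print(alignment_paths("my_alignment_%GENE.fasta", ["GENE_1", "GENE_2"]))
# {'GENE_1': 'my_alignment_GENE_1.fasta', 'GENE_2': 'my_alignment_GENE_2.fasta'}
```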

VCF specific

These arguments are only applicable if the input (--ancestral-sequences) is in VCF format.


--vcf-reference VCF_REFERENCE
fasta file of the sequence the VCF was mapped to


--vcf-reference-output VCF_REFERENCE_OUTPUT
fasta file where reference sequence translations for VCF input will be written

Example Node Data JSON

Here’s an example of the output node-data JSON where NODE_1 has no mutations compared to its parent and NODE_2 has multiple mutations in multiple genes.

{
    "annotations": {
        "GENE_1": {
            "end": 1685,
            "seqid": "",
            "start": 108,
            "strand": "+",
            "type": "CDS"
        },
        "GENE_2": {
            "end": 2705,
            "seqid": "",
            "start": 1807,
            "strand": "+",
            "type": "CDS"
        }
    },
    "nodes": {
        "NODE_1": {
            "aa_muts": {}
        },
        "NODE_2": {
            "aa_muts": {
                "GENE_1": [
                    "..."
                ],
                "GENE_2": [
                    "..."
                ]
            }
        }
    },
    "reference": {
        "GENE_1": "MATLLRSLAL...",
        "GENE_2": "MAEEQARHVK..."
    }
}
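To consume the output, load it as ordinary JSON and walk the "nodes" map. The sketch below assumes that each node’s "aa_muts" maps gene names to lists of mutation strings; the function `mutations_by_node` and the inline sample data (including the mutations `A2S` and `R6Q`) are hypothetical and only illustrate the shape.

```python
import json


def mutations_by_node(node_data: dict) -> dict[str, list[str]]:
    """Flatten each node's aa_muts into 'GENE:mutation' labels."""
    result = {}
    for node, attrs in node_data.get("nodes", {}).items():
        labels = []
        for gene, muts in attrs.get("aa_muts", {}).items():
            labels.extend(f"{gene}:{m}" for m in muts)
        result[node] = labels
    return result


# Hypothetical node-data; a real file would be the --output-node-data JSON.
sample = json.loads("""
{"nodes": {"NODE_1": {"aa_muts": {}},
           "NODE_2": {"aa_muts": {"GENE_1": ["A2S"], "GENE_2": ["R6Q"]}}}}
""")
print(mutations_by_node(sample))
# {'NODE_1': [], 'NODE_2': ['GENE_1:A2S', 'GENE_2:R6Q']}
```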