augur translate

Translate gene regions from nucleotides to amino acids.

Translates nucleotide sequences of nodes in a tree to amino acids for gene regions of the annotated features of the provided reference sequence. Each node then gets assigned a list of amino acid mutations for any position that has a mismatch between its own amino acid sequence and its parent’s sequence. The reference amino acid sequences, genome annotations, and node amino acid mutations are output to a node-data JSON file.

Note

The mutation positions in the node-data JSON are one-based.
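As an illustration of the one-based convention, the comparison between a node and its parent can be sketched as follows. This is a minimal sketch, not augur's own implementation: the function name `aa_mutations` is hypothetical, and it assumes both amino acid sequences are already aligned to the same length.

```python
def aa_mutations(parent: str, child: str) -> list[str]:
    """List substitutions as '<parent aa><one-based position><child aa>'.

    Hypothetical helper for illustration only; assumes aligned sequences.
    """
    if len(parent) != len(child):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{p}{i}{c}"
        # start=1 makes the reported positions one-based
        for i, (p, c) in enumerate(zip(parent, child), start=1)
        if p != c
    ]


print(aa_mutations("MATLLRS", "MATLLKS"))  # ['R6K']
```

The sixth residue differs, so the mutation is reported at position 6 rather than at the zero-based index 5.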

usage: augur translate [-h] --tree TREE --ancestral-sequences
                       ANCESTRAL_SEQUENCES --reference-sequence
                       REFERENCE_SEQUENCE [--genes GENES [GENES ...]]
                       [--output-node-data OUTPUT_NODE_DATA]
                       [--alignment-output ALIGNMENT_OUTPUT]
                       [--validation-mode {error,warn,skip}]
                       [--skip-validation] [--vcf-reference VCF_REFERENCE]
                       [--vcf-reference-output VCF_REFERENCE_OUTPUT]
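For scripted pipelines, the command line can be assembled from the flags listed in the usage above. The sketch below is only an illustration: the helper `translate_command` and all file names (`tree.nwk`, `nt_muts.json`, `reference.gb`, `aa_muts.json`) are hypothetical placeholders, and the command is built and printed rather than run.

```python
import shlex


def translate_command(tree, ancestral_sequences, reference_sequence,
                      output_node_data, genes=None):
    """Build an argument list for `augur translate` (hypothetical helper)."""
    cmd = [
        "augur", "translate",
        "--tree", tree,
        "--ancestral-sequences", ancestral_sequences,
        "--reference-sequence", reference_sequence,
        "--output-node-data", output_node_data,
    ]
    if genes:
        # --genes accepts one or more values, so it is placed last
        cmd += ["--genes", *genes]
    return cmd


cmd = translate_command(
    "tree.nwk", "nt_muts.json", "reference.gb", "aa_muts.json",
    genes=["GENE_1", "GENE_2"],
)
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` in an environment where augur is installed.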

Named Arguments 

--tree

prebuilt Newick -- no tree will be built if provided

--ancestral-sequences

JSON (fasta input) or VCF (VCF input) containing ancestral and tip sequences

--reference-sequence

GenBank or GFF file containing the annotation

--genes

genes to translate (list or file containing list)

--output-node-data

name of JSON file to save aa-mutations to

--alignment-output

write out translated gene alignments. For VCF input, a .vcf or .vcf.gz file is written here (depending on the file ending). For fasta input, specify the file name as ‘my_alignment_%GENE.fasta’, where ‘%GENE’ will be replaced by the name of the gene
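The ‘%GENE’ placeholder behaves like a simple text substitution. The sketch below shows the file names one would expect for a given template; the helper `alignment_paths` is hypothetical and only mirrors the behavior described above, it is not taken from augur's source.

```python
def alignment_paths(template: str, genes: list[str]) -> dict[str, str]:
    """Map each gene name to the file name the template expands to.

    Hypothetical helper: replaces the literal '%GENE' with the gene name.
    """
    return {gene: template.replace("%GENE", gene) for gene in genes}


paths = alignment_paths("my_alignment_%GENE.fasta", ["GENE_1", "GENE_2"])
print(paths["GENE_1"])  # my_alignment_GENE_1.fasta
```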

--validation-mode

Possible choices: error, warn, skip

Control if optional validation checks are performed and what happens if they fail.

‘error’ and ‘warn’ modes perform validation and emit messages about failed validation checks. ‘error’ mode causes a non-zero exit status if any validation checks failed, while ‘warn’ does not.

‘skip’ mode performs no validation.

Note that some validation checks are non-optional and as such are not affected by this setting.

Default: error

--skip-validation

Skip validation of input/output files, equivalent to --validation-mode=skip. Use at your own risk!

VCF specific 

These arguments are only applicable if the input (--ancestral-sequences) is in VCF format.

--vcf-reference

fasta file of the sequence the VCF was mapped to

--vcf-reference-output

fasta file where reference sequence translations for VCF input will be written

Example Node Data JSON 

Here’s an example of the output node-data JSON where NODE_1 has no mutations compared to its parent and NODE_2 has multiple mutations in multiple genes.

{
    "annotations": {
        "GENE_1": {
            "end": 1685,
            "seqid": "reference.gb",
            "start": 108,
            "strand": "+",
            "type": "CDS"
        },
        "GENE_2": {
            "end": 2705,
            "seqid": "reference.gb",
            "start": 1807,
            "strand": "+",
            "type": "CDS"
        }
    },
    "nodes": {
        "NODE_1": {
            "aa_muts": {}
        },
        "NODE_2": {
            "aa_muts": {
                "GENE_1": [
                    "S139N",
                    "R213K",
                    "R439G",
                    "V440A",
                    "D474N",
                    "S479W",
                    "S481T",
                    "P485L",
                    "R521K"
                ],
                "GENE_2": [
                    "P43S",
                    "D46N",
                    "C64R",
                    "R98K",
                    "D136G",
                    "M175V"
                ]
            }
        }
    },
    "reference": {
        "GENE_1": "MATLLRSLAL...",
        "GENE_2": "MAEEQARHVK..."
    }
}
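
A downstream script might split each mutation string into its parent residue, one-based position, and derived residue. The following is a minimal sketch under stated assumptions: `aa_muts` is taken to be a mapping from gene name to a list of mutation strings, the helper names are hypothetical, and the pattern also accepts `*` and `-` on the assumption that stop codons and gaps can appear. The embedded data is a shortened copy of the example above.

```python
import json
import re

# One residue, a one-based position, one residue (e.g. "S139N").
# Accepting "*" and "-" is an assumption about stop codons and gaps.
MUTATION = re.compile(r"^([A-Z*-])(\d+)([A-Z*-])$")


def parse_mutation(mutation: str) -> tuple[str, int, str]:
    """Split 'S139N' into ('S', 139, 'N')."""
    match = MUTATION.match(mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation: {mutation!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def mutations_by_node(node_data: dict) -> dict:
    """Return {node: {gene: [(ref, pos, alt), ...]}} from node-data JSON."""
    return {
        name: {
            gene: [parse_mutation(m) for m in muts]
            for gene, muts in node.get("aa_muts", {}).items()
        }
        for name, node in node_data["nodes"].items()
    }


example = json.loads("""
{
    "nodes": {
        "NODE_1": {"aa_muts": {}},
        "NODE_2": {"aa_muts": {"GENE_1": ["S139N", "R213K"], "GENE_2": ["P43S"]}}
    }
}
""")

parsed = mutations_by_node(example)
print(parsed["NODE_2"]["GENE_1"])  # [('S', 139, 'N'), ('R', 213, 'K')]
```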
