augur ancestral

Infer ancestral sequences based on a tree.

The ancestral sequences are inferred using TreeTime. Each internal node is assigned the nucleotide sequence that maximizes the likelihood on the tree given its descendants and its parent node. Each node is then assigned a list of nucleotide mutations, one for every position where its own sequence differs from its parent’s sequence. The node sequences and mutations are written to a node-data JSON file.

Note

The mutation positions in the node-data JSON are one-based.
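
The mutation list and its one-based numbering can be illustrated with a minimal Python sketch. This is not augur's or TreeTime's implementation; the function name `list_mutations` is hypothetical, and the sketch assumes the parent and node sequences are already aligned to the same length.

```python
def list_mutations(parent_seq, node_seq):
    """Return mutations such as 'C3G' (parent state, one-based position, node state)."""
    if len(parent_seq) != len(node_seq):
        raise ValueError("sequences must be aligned to the same length")
    muts = []
    for index, (parent_state, node_state) in enumerate(zip(parent_seq, node_seq)):
        if parent_state != node_state:
            # Python indexes from zero, so add 1 to report a one-based position.
            muts.append(f"{parent_state}{index + 1}{node_state}")
    return muts


print(list_mutations("TCCAAACA", "TCGAAACT"))  # ['C3G', 'A8T']
```

A node whose sequence is identical to its parent's yields an empty list.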

usage: augur ancestral [-h] --tree TREE [--alignment ALIGNMENT]
                       [--output-node-data OUTPUT_NODE_DATA]
                       [--output-sequences OUTPUT_SEQUENCES]
                       [--inference {joint,marginal}]
                       [--vcf-reference VCF_REFERENCE]
                       [--output-vcf OUTPUT_VCF]
                       [--keep-ambiguous | --infer-ambiguous]
                       [--keep-overhangs]

Named Arguments

--tree, -t

prebuilt Newick tree

--alignment, -a

alignment in FASTA or VCF format

--output-node-data

name of JSON file to save mutations and ancestral sequences to

--output-sequences

name of FASTA file to save ancestral sequences to (FASTA alignments only)

--inference

Possible choices: joint, marginal

calculate joint or marginal maximum likelihood ancestral sequence states

Default: “joint”

--vcf-reference

FASTA file of the sequence the VCF was mapped to (only used if a VCF is provided as the alignment)

--output-vcf

name of output VCF file, which will include the ancestral sequences

--keep-ambiguous

do not infer nucleotides at ambiguous (N) sites on tip sequences (leave as N).

Default: False

--infer-ambiguous

infer nucleotides at ambiguous (N, W, R, ...) sites on tip sequences and replace them with the most likely state.

Default: True

--keep-overhangs

do not infer nucleotides for gaps (-) on either side of the alignment

Default: False
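
As a hedged illustration of how the arguments above combine, the following Python sketch assembles an `augur ancestral` argument list. The helper name `build_ancestral_command` and the file names are hypothetical; only flags listed in the usage above are used.

```python
def build_ancestral_command(tree, alignment, output_node_data,
                            inference="joint", keep_ambiguous=False,
                            keep_overhangs=False):
    """Assemble an argument list for `augur ancestral` (hypothetical helper)."""
    if inference not in ("joint", "marginal"):
        raise ValueError("inference must be 'joint' or 'marginal'")
    cmd = [
        "augur", "ancestral",
        "--tree", tree,
        "--alignment", alignment,
        "--output-node-data", output_node_data,
        "--inference", inference,
    ]
    # --keep-ambiguous and --infer-ambiguous are mutually exclusive;
    # inferring is the default, so only the opt-out flag is ever added.
    if keep_ambiguous:
        cmd.append("--keep-ambiguous")
    if keep_overhangs:
        cmd.append("--keep-overhangs")
    return cmd


# The file names below are placeholders, not files shipped with augur.
cmd = build_ancestral_command("tree.nwk", "aligned.fasta", "nt_muts.json",
                              inference="marginal", keep_ambiguous=True)
print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` on a system where augur is installed.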

Example Node Data JSON

Here’s an example of the output node-data JSON where NODE_1 has no mutations compared to its parent and NODE_2 has multiple mutations.

{
    "nodes": {
        "NODE_1": {
            "muts": [],
            "sequence": "TCCAAACAAAGT..."
        },
        "NODE_2": {
            "muts": [
                "A4461G",
                "A6591G",
                "A9184C",
                "A10385T",
                "T15098C"
            ],
            "sequence": "TCCAAACAAAGT..."
        }
    }
}
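
Because the positions in `muts` are one-based, code that indexes into a sequence string usually needs to subtract one. The sketch below shows one way to parse the mutation strings from a node-data structure like the example above. The helper `parse_mutation` and the regular expression are assumptions for illustration, not part of augur, and the JSON is embedded inline rather than read from a real output file.

```python
import json
import re

# Assumed format: parent state, one-based position, node state (e.g. "A4461G").
MUTATION = re.compile(r"^([A-Za-z-])(\d+)([A-Za-z-])$")


def parse_mutation(mut):
    """Split a mutation string into (parent_state, zero_based_index, node_state)."""
    match = MUTATION.match(mut)
    if match is None:
        raise ValueError(f"unrecognized mutation string: {mut!r}")
    parent_state, position, node_state = match.groups()
    # Convert the one-based position to a zero-based Python index.
    return parent_state, int(position) - 1, node_state


node_data = json.loads("""
{
    "nodes": {
        "NODE_1": {"muts": [], "sequence": "TCCAAACAAAGT..."},
        "NODE_2": {"muts": ["A4461G", "A6591G"], "sequence": "TCCAAACAAAGT..."}
    }
}
""")

parsed = {
    name: [parse_mutation(m) for m in node["muts"]]
    for name, node in node_data["nodes"].items()
}
print(parsed["NODE_2"][0])  # ('A', 4460, 'G')
```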