augur ancestral
Infer ancestral sequences based on a tree.
The ancestral sequences are inferred using TreeTime. Each internal node gets assigned a nucleotide sequence that maximizes a likelihood on the tree given its descendants and its parent node. Each node then gets assigned a list of nucleotide mutations for any position that has a mismatch between its own sequence and its parent's sequence. The node sequences and mutations are output to a node-data JSON file.
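The mutation-calling step described above can be sketched in a few lines of Python. This is an illustrative sketch only, not augur's or TreeTime's implementation: the function name `call_mutations` and the short sequences are hypothetical, and it assumes both sequences are already aligned to the same length. Mutations are written as parent state, one-based position, node state, matching the strings shown in the node-data JSON.

```python
def call_mutations(parent_seq: str, node_seq: str) -> list[str]:
    """List mismatches between aligned parent and node sequences, e.g. 'A4G'."""
    if len(parent_seq) != len(node_seq):
        raise ValueError("sequences must be aligned to the same length")
    mutations = []
    for index, (parent_base, node_base) in enumerate(zip(parent_seq, node_seq)):
        if parent_base != node_base:
            # Positions are reported one-based, so add 1 to the Python index.
            mutations.append(f"{parent_base}{index + 1}{node_base}")
    return mutations


# Hypothetical 12-base sequences that differ at positions 4 and 12.
print(call_mutations("TCCAAACAAAGT", "TCCGAACAAAGC"))
# → ['A4G', 'T12C']
```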
If amino acid options are provided, the ancestral amino acid sequences for each requested gene are inferred with the same method as the nucleotide sequences described above. The inferred amino acid mutations will be included in the output node-data JSON file, with the format equivalent to the output of augur translate.
The nucleotide and amino acid sequences are inferred separately in this command, which can potentially result in mismatches between the nucleotide and amino acid mutations. If you want amino acid mutations based on the inferred nucleotide sequences, please use augur translate.
Note
The mutation positions in the node-data JSON are one-based.
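Because the positions are one-based, they need to be shifted by one before indexing into a Python string. The sketch below splits a mutation string into its parts; the helper name `parse_mutation` and the regular expression are assumptions made for illustration, based on the mutation strings shown in the example node-data JSON.

```python
import re

# One state character, a run of digits, one state character (assumed format).
MUTATION_PATTERN = re.compile(r"^([A-Z\-*])(\d+)([A-Z\-*])$")


def parse_mutation(mutation: str) -> tuple[str, int, str]:
    """Split a string like 'A4461G' into (parent state, one-based position, node state)."""
    match = MUTATION_PATTERN.match(mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation string: {mutation!r}")
    parent_state, position, node_state = match.groups()
    return parent_state, int(position), node_state


parent_state, position, node_state = parse_mutation("A4461G")
zero_based_index = position - 1  # index to use with a Python string
print(parent_state, position, node_state, zero_based_index)
# → A 4461 G 4460
```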
usage: augur ancestral [-h] --tree TREE [--alignment ALIGNMENT]
[--vcf-reference FASTA | --root-sequence FASTA/GenBank]
[--inference {joint,marginal}]
[--keep-ambiguous | --infer-ambiguous]
[--keep-overhangs] [--annotation ANNOTATION]
[--genes GENES [GENES ...]]
[--translations TRANSLATIONS]
[--output-node-data OUTPUT_NODE_DATA]
[--output-sequences OUTPUT_SEQUENCES]
[--output-translations OUTPUT_TRANSLATIONS]
[--output-vcf OUTPUT_VCF]
[--validation-mode {error,warn,skip}]
[--skip-validation]
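As a hedged illustration of the usage above, the following Python sketch assembles a command line from the documented options. The helper `build_ancestral_command` and the file names are hypothetical; only the flags themselves come from the usage text.

```python
def build_ancestral_command(
    tree: str,
    alignment: str,
    output_node_data: str,
    inference: str = "joint",
    keep_ambiguous: bool = False,
) -> list[str]:
    """Assemble an argument list for `augur ancestral` from documented flags."""
    if inference not in ("joint", "marginal"):
        raise ValueError("inference must be 'joint' or 'marginal'")
    command = [
        "augur", "ancestral",
        "--tree", tree,
        "--alignment", alignment,
        "--output-node-data", output_node_data,
        "--inference", inference,
    ]
    if keep_ambiguous:
        # Mutually exclusive with --infer-ambiguous according to the usage text.
        command.append("--keep-ambiguous")
    return command


# Hypothetical file names, chosen only for illustration.
command = build_ancestral_command("tree.nwk", "aligned.fasta", "nt_muts.json")
print(" ".join(command))
```

If augur is installed, the resulting list could be passed to `subprocess.run(command, check=True)`.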
inputs
Tree and sequences to use for ancestral reconstruction
- --tree, -t
prebuilt Newick
- --alignment, -a
alignment in FASTA or VCF format
- --vcf-reference
[VCF alignment only] file of the sequence the VCF was mapped to. Differences between this sequence and the inferred root will be reported as mutations on the root branch.
- --root-sequence
[FASTA alignment only] file of the sequence that is used as root for mutation calling. Differences between this sequence and the inferred root will be reported as mutations on the root branch.
global options
Options to configure reconstruction of both nucleotide and amino acid sequences
- --inference
Possible choices: joint, marginal
calculate joint or marginal maximum likelihood ancestral sequence states
Default:
'joint'
nucleotide options
Options to configure reconstruction of ancestral nucleotide sequences
- --keep-ambiguous
do not infer nucleotides at ambiguous (N) sites on tip sequences (leave as N).
Default:
False
- --infer-ambiguous
infer nucleotides at ambiguous (N,W,R,..) sites on tip sequences and replace with most likely state.
Default:
True
- --keep-overhangs
do not infer nucleotides for gaps (-) on either side of the alignment
Default:
False
amino acid options
Options to configure reconstruction of ancestral amino acid sequences. All arguments are required for ancestral amino acid sequence reconstruction.
- --annotation
GenBank or GFF file containing the annotation
- --genes
genes to translate (list or file containing list)
- --translations
translated alignments for each CDS/Gene. Currently only supported for FASTA-input. Specify the file name via a template like "aa_sequences_%GENE.fasta" where %GENE will be replaced by the gene name.
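The %GENE template convention can be pictured as a simple string substitution. The sketch below is an assumption about the behavior described in the option text, not augur's own code; the helper `expand_gene_template` and the gene names `HA` and `NA` are hypothetical.

```python
def expand_gene_template(template: str, genes: list[str]) -> dict[str, str]:
    """Map each gene name to the file name produced by replacing %GENE."""
    if "%GENE" not in template:
        raise ValueError("template must contain the %GENE placeholder")
    return {gene: template.replace("%GENE", gene) for gene in genes}


# Hypothetical gene names used only to show the substitution.
paths = expand_gene_template("aa_sequences_%GENE.fasta", ["HA", "NA"])
print(paths)
# → {'HA': 'aa_sequences_HA.fasta', 'NA': 'aa_sequences_NA.fasta'}
```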
outputs
Outputs supported for reconstructed ancestral sequences
- --output-node-data
name of JSON file to save mutations and ancestral sequences to
- --output-sequences
name of FASTA file to save ancestral nucleotide sequences to (FASTA alignments only)
- --output-translations
name of the FASTA file(s) to save ancestral amino acid sequences to. Specify the file name via a template like "ancestral_aa_sequences_%GENE.fasta" where %GENE will be replaced by the gene name.
- --output-vcf
name of output VCF file which will include ancestral seqs
general
- --validation-mode
Possible choices: error, warn, skip
Control if optional validation checks are performed and what happens if they fail.
'error' and 'warn' modes perform validation and emit messages about failed validation checks. 'error' mode causes a non-zero exit status if any validation checks failed, while 'warn' does not.
'skip' mode performs no validation.
Note that some validation checks are non-optional and as such are not affected by this setting.
Default:
error
- --skip-validation
Skip validation of input/output files, equivalent to --validation-mode=skip. Use at your own risk!
Example Node Data JSON
Here's an example of the output node-data JSON where NODE_1 has no mutations compared to its parent and NODE_2 has multiple mutations.
{
"nodes": {
"NODE_1": {
"muts": [],
"sequence": "TCCAAACAAAGT..."
},
"NODE_2": {
"muts": [
"A4461G",
"A6591G",
"A9184C",
"A10385T",
"T15098C"
],
"sequence": "TCCAAACAAAGT..."
}
}
}
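A file with this structure can be read with Python's standard `json` module. The sketch below embeds the example above as a string (sequences truncated exactly as shown) and counts mutations per node; with a real file, `json.load` on an open file handle would be used instead. The variable names are illustrative.

```python
import json

# The example node-data JSON from above, embedded as text for illustration.
node_data_text = """
{
  "nodes": {
    "NODE_1": {"muts": [], "sequence": "TCCAAACAAAGT..."},
    "NODE_2": {
      "muts": ["A4461G", "A6591G", "A9184C", "A10385T", "T15098C"],
      "sequence": "TCCAAACAAAGT..."
    }
  }
}
"""

node_data = json.loads(node_data_text)

# Count how many nucleotide mutations each node carries relative to its parent.
mutation_counts = {
    name: len(node["muts"]) for name, node in node_data["nodes"].items()
}
print(mutation_counts)
# → {'NODE_1': 0, 'NODE_2': 5}
```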