augur.sequence_traits
Annotate sequences based on amino-acid or nucleotide signatures.
- augur.sequence_traits.annotate_strains(all_features, all_sequences)
Looks for drug-resistance mutations (DRMs) that match, in position and alt base, the mutations recorded in the translated protein dict
- Parameters
all_features (dict) – dict of all features in all genes; these are processed gene by gene
all_sequences (dict) – sequence dict of all genes
- Returns
annotations of the strains for each feature
- Return type
dict
- augur.sequence_traits.annotate_strains_by_gene(annotations, features, sequences, gene='nuc')
Sort through all potential features and link them up with mutations to produce an annotation
- Parameters
annotations (dict) – dictionary of sequence features as read in by read_in_features. This is modified in place
features (dict) – dictionary of features in one gene
sequences (dict) – sequences of that gene
gene (str, optional) – name of the gene
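The linking step described above can be pictured with a small sketch. It is not augur's implementation: the helper name `match_features`, the shape of `features` ({site: {alt: info}}), the shape of `sequences` ({strain: {site: alt}}), and the `gene:siteALT` label format are all assumptions made for illustration, and both inputs are assumed to use the same coordinate convention.

```python
from collections import defaultdict


def match_features(annotations, features, sequences, gene="nuc"):
    """Record, per strain, which listed mutations it carries in one gene.

    Hypothetical stand-in for illustration only; `annotations` is
    updated in place, mirroring the documented behaviour.
    """
    for strain, mutations in sequences.items():
        for site, alt in mutations.items():
            info = features.get(site, {}).get(alt)
            if info is None:
                continue  # this mutation is not a listed feature
            label = f"{gene}:{site}{alt}"  # assumed display format
            annotations[strain][info["feature"]].append(label)


# Assumed input shapes (not taken from augur):
features = {461: {"N": {"feature": "Fluoroquinolones"}}}
sequences = {"strainA": {461: "N", 10: "K"}, "strainB": {461: "D"}}

annotations = defaultdict(lambda: defaultdict(list))
match_features(annotations, features, sequences, gene="gyrB")
```

Only `strainA` ends up annotated here, because `strainB` differs from the reference at site 461 but not with the listed alt state.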
- augur.sequence_traits.attach_features(annotations, label, count)
‘Attaches’ features to nodes and lists the corresponding mutations as values, that is:
{nodename: {"Resistance 1": "mut1,mut2", "Resistance 2": "mut1"}}
- Parameters
annotations (dict) – annotations of strains as gathered by annotate_strains
label (str) – label of the feature set, as specified by a command-line argument
count (str) – if equal to ‘traits’, counts the number of distinct features that occur in the annotation; otherwise counts the total number of mutations
- Returns
json/dict to export
- Return type
dict
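The two counting modes are easiest to see on a toy input. The sketch below is illustrative only: the helper name `attach`, the input shape ({node: {feature: [mutations]}}), and storing the count under `label` in each node's entry are assumptions, not augur's code.

```python
def attach(annotations, label, count):
    """Hypothetical sketch: build {node: {feature: "mut1,mut2", label: n}}."""
    out = {}
    for node, feats in annotations.items():
        # List the mutations of each feature as a comma-separated string.
        entry = {feature: ",".join(muts) for feature, muts in feats.items()}
        if count == "traits":
            n = len(feats)  # number of distinct features
        else:
            # Total mutations; in this sketch a mutation listed under
            # two features is counted once per feature.
            n = sum(len(muts) for muts in feats.values())
        entry[label] = n  # assumption: the count is stored under `label`
        out[node] = entry
    return out


annotations = {"nodeA": {"Resistance 1": ["mut1", "mut2"], "Resistance 2": ["mut1"]}}
by_trait = attach(annotations, "drm", "traits")
by_mutation = attach(annotations, "drm", "mutations")
```

With this input, `by_trait["nodeA"]["drm"]` is 2 (two features) and `by_mutation["nodeA"]["drm"]` is 3 (three listed mutations).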
- augur.sequence_traits.read_in_features(drm_file)
Reads in and stores position, alt base/AA, feature, gene, and ‘display name’ (optional) of mutations such as drug-resistance mutations
Format to map by both nucleotide and AA sites:
GENE  SITE     ALT  DISPLAY_NAME  FEATURE
gyrB  461      N                  Fluoroquinolones
nuc   1472358  T    rrs: C513T    Streptomycin
nuc   1673425  T    fabG1: C-15T  Isoniazid Ethionamide
ethA  175      T                  Ethionamide
Format to map by AA site:
GENE  SITE  ALT  FEATURE
gyrB  461   N    Fluoroquinolones
gyrB  499   D    Fluoroquinolones
rpoB  170   F    Rifampicin
rpoB  359   A    Rifampicin
Format to map by nucleotide site:
SITE    ALT  DISPLAY_NAME  FEATURE
6505    T    D461N         Fluoroquinolones
6505    C    D461N         Fluoroquinolones
760314  T    V170F         Rifampicin
760882  C    V359A         Rifampicin
Or to map by nucleotide site and display mutations:
SITE    ALT  FEATURE
6505    T    Fluoroquinolones
6505    C    Fluoroquinolones
760314  T    Rifampicin
760882  C    Rifampicin
- Parameters
drm_file (str) – file defining sequence features to be used for annotations
- Returns
dict of dicts with sequence features indexed by gene name, position, and character state
- Return type
dict
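As a rough illustration of the formats listed above, the following sketch parses a feature table into a nested dict indexed by gene, position, and character state. It is not augur's parser: the helper name `parse_features`, the tab delimiter, the fallback to `nuc` when no GENE column is present, and the layout of the innermost dict are assumptions. Sites are kept exactly as written in the file.

```python
import csv
import io
from collections import defaultdict


def parse_features(handle):
    """Parse a tab-separated feature table into {gene: {site: {alt: info}}}.

    Hypothetical helper for illustration; not part of augur.
    """
    features = defaultdict(lambda: defaultdict(dict))
    for row in csv.DictReader(handle, delimiter="\t"):
        gene = row.get("GENE") or "nuc"  # assumption: no GENE column means nucleotide
        site = int(row["SITE"])          # no coordinate conversion in this sketch
        features[gene][site][row["ALT"]] = {
            "feature": row["FEATURE"],
            "display_name": row.get("DISPLAY_NAME") or None,  # optional column
        }
    # Convert the defaultdicts to plain dicts before returning.
    return {gene: dict(sites) for gene, sites in features.items()}


example = (
    "GENE\tSITE\tALT\tDISPLAY_NAME\tFEATURE\n"
    "gyrB\t461\tN\t\tFluoroquinolones\n"
    "nuc\t1472358\tT\trrs: C513T\tStreptomycin\n"
)
parsed = parse_features(io.StringIO(example))
```

Reading the header with `csv.DictReader` lets the same function accept all four column layouts, since missing columns simply come back as `None`.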
- augur.sequence_traits.read_in_translate_vcf(vcf_file, ref_file)
Reads in a VCF file in which translations have been stored, together with the associated reference sequence FASTA (to which the VCF file is mapped). This is the file output by “write_VCF_translation”.
Much simpler than reading a general VCF, as there will never be an insertion or deletion.
Returns a nested dict in the same format that “write_VCF_translation” takes as input, with one nested dict per gene containing ‘sequences’, ‘positions’, and ‘reference’.
- Parameters
vcf_file (str) – name of the vcf file to be read, can be gzipped
ref_file (str) – name of the fasta file with the reference sequence
- Returns
dictionary of dictionaries with mutations of each strain for each sequence relative to the reference
- Return type
dict
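To make the returned structure concrete, here is a small sketch of a per-gene entry and of rebuilding one strain's protein from it. Only the three keys ‘sequences’, ‘positions’, and ‘reference’ come from the description above; the inner layout ({strain: {position: alt}}, 0-based positions, reference as a plain string) and the helper name `apply_mutations` are assumptions for illustration.

```python
def apply_mutations(gene_entry, strain):
    """Rebuild one strain's sequence from the reference plus its substitutions.

    Hypothetical helper; assumes 0-based positions and substitutions only.
    """
    seq = list(gene_entry["reference"])
    for pos, alt in gene_entry["sequences"].get(strain, {}).items():
        seq[pos] = alt  # no insertions or deletions, so the length never changes
    return "".join(seq)


# Made-up toy data in the assumed layout.
translations = {
    "gyrB": {
        "reference": "MKDLV",
        "positions": [2],
        "sequences": {"strainA": {2: "N"}, "strainB": {}},
    }
}

strain_a = apply_mutations(translations["gyrB"], "strainA")
strain_b = apply_mutations(translations["gyrB"], "strainB")
```

Because every record is a substitution, each strain can be stored as a sparse dict of differences from the reference.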
- augur.sequence_traits.register_parser(parent_subparsers)
- augur.sequence_traits.run(args)
Note: this should be modified to work on FASTA input files.