augur.sequence_traits

Annotate sequences based on amino-acid or nucleotide signatures.

augur.sequence_traits.annotate_strains(all_features, all_sequences)

Looks for drug-resistance mutations (DRMs) whose position and alternate base or amino acid match an entry in the translated protein dict.

Parameters:

all_features (dict) – dict of all features in all genes, will be processed gene by gene
all_sequences (dict) – sequence dict of all genes

Returns:

annotations based on the strains for each feature

Return type:

dict

augur.sequence_traits.annotate_strains_by_gene(annotations, features, sequences, gene='nuc')

Sort through all potential features and link them up with mutations to produce an annotation

Parameters:

annotations (dict) – dictionary of sequence features as read in by read_in_features. This is modified in place
features (dict) – dictionary of features in one gene
sequences (dict) – sequences of that gene
gene (str, optional) – name of the gene
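
The linking step described above can be sketched as follows. This is a minimal illustration and not augur's implementation: the function name `link_features_to_mutations`, the dict layouts, and the mutation label format are all assumptions made for the example, and features and sequences are assumed to use the same site numbering.

```python
def link_features_to_mutations(annotations, features, sequences, gene="nuc"):
    """Record, per strain, which listed features its mutations match.

    Hypothetical stand-in for illustration only. Assumed layouts:
      features:    {site: {alt: feature_name}}
      sequences:   {strain: {site: observed_state}} (differences from the reference)
      annotations: {strain: {feature_name: [mutation_label, ...]}}, modified in place
    """
    for strain, mutations in sequences.items():
        for site, state in mutations.items():
            feature = features.get(site, {}).get(state)
            if feature is None:
                continue  # no listed feature at this site with this state
            label = f"{gene}:{site}{state}"  # assumed label format
            annotations.setdefault(strain, {}).setdefault(feature, []).append(label)


annotations = {}
features = {461: {"N": "Fluoroquinolones"}, 499: {"D": "Fluoroquinolones"}}
sequences = {
    "strainA": {461: "N", 499: "D"},
    "strainB": {461: "S"},  # same site, different state: not a match
}
link_features_to_mutations(annotations, features, sequences, gene="gyrB")
print(annotations)
```

A match requires both the site and the alternate state to agree, so `strainB` receives no annotation in this example.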

augur.sequence_traits.attach_features(annotations, label, count)

‘Attaches’ features to nodes and lists the corresponding mutations as values, that is:

{nodename: {"Resistance 1": "mut1,mut2", "Resistance 2": "mut1"}}

Parameters:

annotations (dict) – annotations of strains as gathered by annotate_strains
label (str) – label of the feature set, as specified by the command-line argument
count (str) – if equal to "traits", counts the number of distinct features that occur in the annotation; otherwise counts the total number of mutations

Returns:

json/dict to export

Return type:

dict
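
To make the two counting modes concrete, here is a rough sketch. It is not augur's code: the helper name `summarize_annotations`, the input layout, storing the count under `label`, and counting distinct mutations in the non-"traits" case are assumptions for illustration.

```python
def summarize_annotations(annotations, label, count):
    """Flatten {strain: {feature: [mutations]}} into a dict ready for export.

    Hypothetical illustration only; the input layout and the use of
    `label` as the key for the count are assumptions.
    """
    export = {}
    for strain, by_feature in annotations.items():
        # List the mutations behind each feature as a comma-separated string.
        node = {feature: ",".join(muts) for feature, muts in by_feature.items()}
        if count == "traits":
            # Number of distinct features seen in this strain.
            node[label] = len(by_feature)
        else:
            # Number of mutations; a mutation linked to several features
            # is counted once here (an assumption of this sketch).
            node[label] = len({m for muts in by_feature.values() for m in muts})
        export[strain] = node
    return export


annotations = {
    "strainA": {
        "Rifampicin": ["rpoB:170F", "rpoB:359A"],
        "Fluoroquinolones": ["gyrB:461N"],
    }
}
by_traits = summarize_annotations(annotations, "Drug_Resistance", "traits")
by_mutations = summarize_annotations(annotations, "Drug_Resistance", "mutations")
print(by_traits["strainA"]["Drug_Resistance"])
print(by_mutations["strainA"]["Drug_Resistance"])
```

In this example the strain carries two features supported by three mutations, so the two modes give different counts.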

augur.sequence_traits.read_in_features(drm_file)

Reads in and stores position, alt base/AA, feature, gene, and ‘display name’ (optional) of mutations such as drug-resistance mutations

Format to map by both nucleotide and AA sites:

GENE	SITE	ALT	DISPLAY_NAME	FEATURE
gyrB	461	N		Fluoroquinolones
nuc	1472358	T	rrs: C513T	Streptomycin
nuc	1673425	T	fabG1: C-15T	Isoniazid Ethionamide
ethA	175	T		Ethionamide

Format to map by AA site:

GENE	SITE	ALT	FEATURE
gyrB	461	N	Fluoroquinolones
gyrB	499	D	Fluoroquinolones
rpoB	170	F	Rifampicin
rpoB	359	A	Rifampicin

Format to map by nucleotide site:

SITE	ALT	DISPLAY_NAME	FEATURE
6505	T	D461N	Fluoroquinolones
6505	C	D461N	Fluoroquinolones
760314	T	V170F	Rifampicin
760882	C	V359A	Rifampicin

Or to map by nucleotide site and display mutations:

SITE	ALT	FEATURE
6505	T	Fluoroquinolones
6505	C	Fluoroquinolones
760314	T	Rifampicin
760882	C	Rifampicin

Parameters:

drm_file (str) – file defining sequence features to be used for annotations

Returns:

dict of dicts with sequence features indexed by gene name, position, and character state

Return type:

dict
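
As a rough illustration of how a tab-separated table like those above could be turned into a dict indexed by gene, position, and character state, consider the sketch below. It is not augur's parser: `parse_feature_table`, the leaf keys `feature` and `display_name`, and the choice to file rows without a GENE column under "nuc" are assumptions made for the example.

```python
import csv
import io


def parse_feature_table(handle):
    """Parse a tab-separated feature table into {gene: {site: {alt: {...}}}}.

    Hypothetical helper for illustration; the leaf layout is an assumption.
    """
    features = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        # Tables without a GENE column are assumed to list nucleotide sites.
        gene = row.get("GENE") or "nuc"
        site = int(row["SITE"])  # kept exactly as numbered in the file
        alt = row["ALT"]
        entry = {
            "feature": row["FEATURE"],
            # DISPLAY_NAME is optional: the column may be absent or empty.
            "display_name": row.get("DISPLAY_NAME") or "",
        }
        features.setdefault(gene, {}).setdefault(site, {})[alt] = entry
    return features


example = (
    "GENE\tSITE\tALT\tDISPLAY_NAME\tFEATURE\n"
    "gyrB\t461\tN\t\tFluoroquinolones\n"
    "nuc\t1472358\tT\trrs: C513T\tStreptomycin\n"
)
features = parse_feature_table(io.StringIO(example))
print(features["gyrB"][461]["N"]["feature"])
print(features["nuc"][1472358]["T"]["display_name"])
```

Reading by column name rather than by column index lets the same loop handle all four header layouts shown above.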

augur.sequence_traits.read_in_translate_vcf(vcf_file, ref_file)

Reads in a VCF file in which translations have been stored, together with the associated reference sequence FASTA (to which the VCF file is mapped). This is the file output by “write_VCF_translation”.

Parsing is simple because a translation VCF never contains insertions or deletions.

Returns a nested dict in the same format that “write_VCF_translation” takes as input, with a nested dict for each gene, which contains ‘sequences’, ‘positions’, and ‘reference’.

Parameters:

vcf_file (str) – name of the vcf file to be read, can be gzipped
ref_file (str) – name of the fasta file with the reference sequence

Returns:

dictionary of dictionaries with mutations of each strain for each sequence relative to the reference

Return type:

dict
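
The shape of the returned structure can be pictured with a small made-up example. The gene name, strains, and sequences below are invented, and the sketch assumes that each strain's entry maps 0-based positions to the substituted character; check both assumptions against real output before relying on them.

```python
def apply_mutations(reference, mutations):
    """Rebuild one strain's full sequence from the reference and its substitutions.

    Assumes `mutations` maps 0-based positions to replacement characters.
    Substitution alone is enough because translations carry no indels.
    """
    seq = list(reference)
    for pos, alt in mutations.items():
        seq[pos] = alt
    return "".join(seq)


# Invented data in the assumed layout: one nested dict per gene.
translations = {
    "geneX": {
        "reference": "MKVLA",
        "positions": [1, 3],
        "sequences": {
            "strainA": {1: "R"},
            "strainB": {1: "R", 3: "F"},
        },
    }
}

gene = translations["geneX"]
full = {
    strain: apply_mutations(gene["reference"], muts)
    for strain, muts in gene["sequences"].items()
}
print(full)
```

Storing only the differences from the reference keeps the structure compact when most strains match the reference at most sites.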

augur.sequence_traits.register_parser(parent_subparsers)

augur.sequence_traits.run(args)

This should be modified to work on FASTA input files.