augur.sequence_traits

Annotate sequences based on amino-acid or nucleotide signatures.

augur.sequence_traits.annotate_strains(all_features, all_sequences)

Looks for drug-resistance mutations (DRMs) whose position and alternate base match entries in the translated protein dict.

Parameters:
  • all_features (dict) – dict of all features in all genes; these are processed gene by gene

  • all_sequences (dict) – sequence dict of all genes

Returns:

annotations based on the strains for each feature

Return type:

dict
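The matching step described above can be pictured with a small standalone sketch. It does not call augur: the function name `match_features`, the mutation labels, and both dict layouts are assumptions inferred from the descriptions on this page (features indexed by position and alternate state, per-strain mutations stored as position-to-state dicts).

```python
def match_features(features, sequences):
    """Return {strain: {feature_name: [mutation labels]}} for one gene.

    features:  {position: {alt_state: feature_name}}   (assumed layout)
    sequences: {strain: {position: observed_state}}    (assumed layout)
    """
    annotations = {}
    for strain, mutations in sequences.items():
        for position, state in mutations.items():
            feature = features.get(position, {}).get(state)
            if feature is None:
                continue  # nothing listed for this position/state pair
            label = f"{position}{state}"  # hypothetical label format
            annotations.setdefault(strain, {}).setdefault(feature, []).append(label)
    return annotations


features = {461: {"N": "Fluoroquinolones"}, 499: {"D": "Fluoroquinolones"}}
sequences = {
    "strainA": {461: "N", 499: "D"},
    "strainB": {461: "K"},  # position matches, alternate state does not
}
result = match_features(features, sequences)
```

Here `strainA` picks up both listed mutations, while `strainB` is left out because a match requires both the position and the alternate state to agree.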

augur.sequence_traits.annotate_strains_by_gene(annotations, features, sequences, gene='nuc')

Sort through all potential features and link them up with mutations to produce an annotation

Parameters:
  • annotations (dict) – dictionary of sequence features as read in by read_in_features. This is modified in place

  • features (dict) – dictionary of features in one gene

  • sequences (dict) – sequences of that gene

  • gene (str, optional) – name of the gene

augur.sequence_traits.attach_features(annotations, label, count)

‘Attaches’ features to nodes and lists the corresponding mutations as values, that is:

{nodename: {"Resistance 1": "mut1,mut2", "Resistance 2": "mut1"}}

Parameters:
  • annotations (dict) – annotations of strains as assembled by annotate_strains

  • label (str) – label of the feature set, as specified by the command-line argument

  • count (str) – if equal to 'traits', counts the number of distinct features that occur in the annotation; otherwise counts the total number of mutations

Returns:

json/dict to export

Return type:

dict
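As a rough illustration of the two counting modes, the sketch below builds the node-level structure shown above from a small annotation dict. The function name `summarize_features`, the label value, and the choice to store the count under the `label` key are assumptions made for this example, not a description of augur's implementation.

```python
def summarize_features(annotations, label, count):
    """Hypothetical stand-in: {node: {feature: "mut1,mut2", label: n}}."""
    output = {}
    for node, features in annotations.items():
        # Join each feature's mutations into a comma-separated string.
        entry = {name: ",".join(muts) for name, muts in features.items()}
        if count == "traits":
            entry[label] = len(features)  # number of distinct features
        else:
            entry[label] = sum(len(muts) for muts in features.values())
        output[node] = entry
    return output


annotations = {
    "nodeA": {"Resistance 1": ["mut1", "mut2"], "Resistance 2": ["mut1"]},
}
by_trait = summarize_features(annotations, "Drug_Resistance", "traits")
by_mutation = summarize_features(annotations, "Drug_Resistance", "mutations")
```

With this input, counting by traits gives 2 for `nodeA` (two features), while counting mutations gives 3, because `mut1` is counted once under each feature that lists it.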

augur.sequence_traits.read_in_features(drm_file)

Reads in and stores position, alt base/AA, feature, gene, and ‘display name’ (optional) of mutations such as drug-resistance mutations

Format to map by both nucleotide and AA sites:

GENE   SITE      ALT   DISPLAY_NAME   FEATURE
gyrB   461       N                    Fluoroquinolones
nuc    1472358   T     rrs: C513T     Streptomycin
nuc    1673425   T     fabG1: C-15T   Isoniazid Ethionamide
ethA   175       T                    Ethionamide

Format to map by AA site:

GENE   SITE   ALT   FEATURE
gyrB   461    N     Fluoroquinolones
gyrB   499    D     Fluoroquinolones
rpoB   170    F     Rifampicin
rpoB   359    A     Rifampicin

Format to map by nucleotide site:

SITE     ALT   DISPLAY_NAME   FEATURE
6505     T     D461N          Fluoroquinolones
6505     C     D461N          Fluoroquinolones
760314   T     V170F          Rifampicin
760882   C     V359A          Rifampicin

Or to map by nucleotide site and display mutations:

SITE     ALT   FEATURE
6505     T     Fluoroquinolones
6505     C     Fluoroquinolones
760314   T     Rifampicin
760882   C     Rifampicin

Parameters:

drm_file (str) – file defining sequence features to be used for annotations

Returns:

dict of dicts with sequence features indexed by gene name, position, and character state

Return type:

dict
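To make the nested return structure more concrete, here is a minimal sketch that indexes rows of a feature table by gene name, position, and character state. It is not augur's parser: the function name `parse_features`, the tab delimiter, the inner keys `feature` and `display_name`, and the fallback to `nuc` when no GENE column is present are all assumptions made for illustration.

```python
import csv
import io


def parse_features(handle, delimiter="\t"):
    """Index rows as {gene: {site: {alt: {"feature": ..., "display_name": ...}}}}."""
    features = {}
    for row in csv.DictReader(handle, delimiter=delimiter):
        gene = row.get("GENE") or "nuc"  # assumption: no GENE column means nucleotide
        site = int(row["SITE"])
        display = row.get("DISPLAY_NAME") or ""  # the column is optional
        features.setdefault(gene, {}).setdefault(site, {})[row["ALT"]] = {
            "feature": row["FEATURE"],
            "display_name": display,
        }
    return features


# Two rows taken from the first example table above (tab-separated by assumption).
text = (
    "GENE\tSITE\tALT\tDISPLAY_NAME\tFEATURE\n"
    "gyrB\t461\tN\t\tFluoroquinolones\n"
    "nuc\t1472358\tT\trrs: C513T\tStreptomycin\n"
)
features = parse_features(io.StringIO(text))
```

A lookup such as `features["gyrB"][461]["N"]` then returns the feature record for that gene, site, and alternate state in one step.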

augur.sequence_traits.read_in_translate_vcf(vcf_file, ref_file)

Reads in a VCF file in which TRANSLATIONS have been stored, together with the associated reference sequence FASTA (to which the VCF file is mapped). This is the file output by “write_VCF_translation” below.

Very simple compared to the above, as there will never be an insertion or deletion.

Returns a nested dict in the same format as is input in “write_VCF_translation” below, with a nested dict for each gene, which contains ‘sequences’, ‘positions’, and ‘reference’

Parameters:
  • vcf_file (str) – name of the vcf file to be read, can be gzipped

  • ref_file (str) – name of the fasta file with the reference sequence

Returns:

dictionary of dictionaries with mutations of each strain for each sequence relative to the reference

Return type:

dict
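The nested structure described above might look like the following. The keys 'sequences', 'positions', and 'reference' come from the description; the gene, strains, values, the 0-based position convention, and the helper `full_sequence` are invented for this sketch.

```python
# Hypothetical contents for one gene; the values are made up.
translations = {
    "gyrB": {
        "reference": "MDKV",
        "positions": [1, 3],
        "sequences": {
            "strainA": {1: "N"},
            "strainB": {1: "N", 3: "A"},
        },
    }
}


def full_sequence(gene_entry, strain):
    """Apply a strain's substitutions to the reference (0-based positions assumed)."""
    seq = list(gene_entry["reference"])
    for pos, state in gene_entry["sequences"].get(strain, {}).items():
        seq[pos] = state  # substitutions only, so the length never changes
    return "".join(seq)


seq_a = full_sequence(translations["gyrB"], "strainA")
seq_b = full_sequence(translations["gyrB"], "strainB")
```

Because the translated VCF contains no insertions or deletions, a strain's full sequence can be recovered by overwriting single positions in the reference, which is what keeps this format simple to read.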

augur.sequence_traits.register_parser(parent_subparsers)
augur.sequence_traits.run(args)

Note: this should be modified to work on FASTA input files.