augur.sequence_traits module
Annotate sequences based on amino-acid or nucleotide signatures.
- augur.sequence_traits.annotate_strains(all_features, all_sequences)
Looks for drug-resistance mutations (DRMs) that match in both position and alt base in the translated protein dict.
- augur.sequence_traits.annotate_strains_by_gene(annotations, features, sequences, gene='nuc')
Sorts through all potential features and links them to mutations to produce an annotation.
- augur.sequence_traits.attach_features(annotations, label, count)
"Attaches" features to nodes and lists the corresponding mutations as values, that is:
{nodename:{"Resistance 1":"mut1,mut2", "Resistance 2":"mut1"}}
- Parameters:
annotations (dict) -- annotations of strains as gathered together by annotate_strains
label (str) -- label of the feature set, as specified by the command-line argument
count (str) -- if equal to "traits", counts the number of distinct features that occur in the annotation; otherwise counts the total number of mutations
- Returns:
json/dict to export
- Return type:
dict
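The counting rule described for `count` can be illustrated with a minimal sketch. This is not the augur implementation: `count_annotation` is a hypothetical helper, and the input is assumed to be one node's entry in the comma-joined form shown above. The example deliberately uses mutations that do not repeat across features, since the description does not say how repeats are counted.

```python
def count_annotation(features_to_mutations, count):
    """Hypothetical helper applying the documented counting rule to one node."""
    if count == "traits":
        # Number of distinct features present in the annotation.
        return len(features_to_mutations)
    # Otherwise: total number of mutations listed across all features.
    return sum(len(muts.split(",")) for muts in features_to_mutations.values())


# Made-up annotation for a single node, in the documented value format.
annotation = {"Resistance 1": "mut1,mut2", "Resistance 2": "mut3"}

by_traits = count_annotation(annotation, "traits")        # two features
by_mutations = count_annotation(annotation, "mutations")  # three mutations
```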
- augur.sequence_traits.read_in_features(drm_file)
Reads in and stores position, alt base/AA, feature, gene, and "display name" (optional) of mutations such as drug-resistance mutations
Format to map by both nucleotide and AA sites:
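| GENE | SITE    | ALT | DISPLAY_NAME | FEATURE               |
|------|---------|-----|--------------|-----------------------|
| gyrB | 461     | N   |              | Fluoroquinolones      |
| nuc  | 1472358 | T   | rrs: C513T   | Streptomycin          |
| nuc  | 1673425 | T   | fabG1: C-15T | Isoniazid Ethionamide |
| ethA | 175     | T   |              | Ethionamide           |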
Format to map by AA site:
| GENE | SITE | ALT | FEATURE          |
|------|------|-----|------------------|
| gyrB | 461  | N   | Fluoroquinolones |
| gyrB | 499  | D   | Fluoroquinolones |
| rpoB | 170  | F   | Rifampicin       |
| rpoB | 359  | A   | Rifampicin       |
Format to map by nucleotide site:
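| SITE   | ALT | DISPLAY_NAME | FEATURE          |
|--------|-----|--------------|------------------|
| 6505   | T   | D461N        | Fluoroquinolones |
| 6505   | C   | D461N        | Fluoroquinolones |
| 760314 | T   | V170F        | Rifampicin       |
| 760882 | C   | V359A        | Rifampicin       |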
Or to map by nucleotide site and display mutations:
| SITE   | ALT | FEATURE          |
|--------|-----|------------------|
| 6505   | T   | Fluoroquinolones |
| 6505   | C   | Fluoroquinolones |
| 760314 | T   | Rifampicin       |
| 760882 | C   | Rifampicin       |
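As a rough illustration of how the column layouts above could be read, here is a minimal sketch. It is not the augur implementation: `parse_features` is a hypothetical function, the file is assumed to be tab-delimited with a header row, and treating a missing GENE column as 'nuc' is an assumption based on the 'nuc' rows in the first layout.

```python
import csv
import io


def parse_features(handle):
    """Hypothetical parser for the tab-delimited feature layouts."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        rows.append({
            # Assumption: no GENE column means nucleotide coordinates.
            "gene": row.get("GENE") or "nuc",
            "site": int(row["SITE"]),
            "alt": row["ALT"],
            # DISPLAY_NAME is optional and may be absent or empty.
            "display_name": row.get("DISPLAY_NAME") or "",
            "feature": row["FEATURE"],
        })
    return rows


# In-memory stand-in for a features file, using rows from the first layout.
example = io.StringIO(
    "GENE\tSITE\tALT\tDISPLAY_NAME\tFEATURE\n"
    "gyrB\t461\tN\t\tFluoroquinolones\n"
    "nuc\t1472358\tT\trrs: C513T\tStreptomycin\n"
)
features = parse_features(example)
```

Because the reader keys on header names, the same sketch handles all four layouts without needing to know in advance which columns are present.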
- augur.sequence_traits.read_in_translate_vcf(vcf_file, ref_file)
Reads in a VCF file where TRANSLATIONS have been stored, together with the associated reference sequence FASTA (to which the VCF file is mapped). This is the file output by "write_VCF_translation".
Parsing is very simple compared to a general VCF, as there will never be an insertion or deletion.
Returns a nested dict in the same format as is input to "write_VCF_translation", with a nested dict for each gene, which contains "sequences", "positions", and "reference".
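To make the shape of the return value more concrete, here is a small sketch with made-up data. Only the per-gene keys "sequences", "positions", and "reference" come from the description above. The inner layout is an assumption for illustration: "sequences" maps each strain to a dict of 0-based position to alt residue, and "reference" is a plain string. `rebuild_protein` is a hypothetical helper, not part of augur.

```python
# Made-up example; the inner layout of each gene entry is an assumption.
translations = {
    "gyrB": {
        "reference": "MKDLN",
        "positions": [2],
        "sequences": {"strain_A": {2: "N"}},
    }
}


def rebuild_protein(gene_entry, strain):
    """Hypothetical helper: apply a strain's substitutions to the reference."""
    seq = list(gene_entry["reference"])
    # Substitutions only, so the sequence length never changes.
    for pos, alt in gene_entry["sequences"].get(strain, {}).items():
        seq[pos] = alt
    return "".join(seq)


protein_a = rebuild_protein(translations["gyrB"], "strain_A")
protein_b = rebuild_protein(translations["gyrB"], "strain_B")  # no variants stored
```

With substitutions only, a strain with no stored variants simply reproduces the reference, which is why no insertion or deletion handling is needed in this sketch.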
- augur.sequence_traits.register_parser(parent_subparsers)
- augur.sequence_traits.run(args)
Note: this should be modified to work on FASTA input files.