augur.align module

Align multiple sequences from FASTA.

exception augur.align.AlignmentError

Bases: Exception

augur.align.analyse_insertions(aln, ungapped, insertion_csv)
augur.align.check_arguments(args)
augur.align.check_duplicates(*values)
augur.align.ensure_reference_strain_present(ref_name, existing_alignment, seqs)
augur.align.generate_alignment_cmd(method, nthreads, existing_aln_fname, seqs_to_align_fname, aln_fname, log_fname)
augur.align.make_gaps_ambiguous(aln)

Replace all gaps with 'N' in every sequence of the alignment. TreeTime will treat them as fully ambiguous and replace them with the most likely state. This modifies the alignment in place.

Parameters

aln (MultipleSeqAlignment) – Biopython Alignment
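
The gap-filling behaviour can be illustrated without Biopython. The following is a minimal sketch, not the module's implementation: it assumes the alignment is a plain dict of name to sequence string, and `fill_gaps_with_n` is a hypothetical helper name.

```python
def fill_gaps_with_n(alignment):
    """Replace every gap character '-' with 'N', mutating the dict in place.

    `alignment` is a hypothetical stand-in for a Biopython alignment:
    a dict mapping sequence names to aligned sequence strings.
    """
    for name, seq in alignment.items():
        # Reassigning an existing key does not change the dict's size,
        # so this is safe while iterating.
        alignment[name] = seq.replace("-", "N")


aln = {"ref": "ACG-T", "sample": "A--GT"}
fill_gaps_with_n(aln)
print(aln)
```

As in the documented function, nothing is returned; the caller's object is changed directly.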

augur.align.postprocess(output_file, ref_name, keep_reference, fill_gaps)

Postprocessing of the combined alignment file.

Parameters
  • output_file (str) – The file the new alignment was written to

  • ref_name (str) – If provided, the name of the reference strain used in the alignment

  • keep_reference (bool) – If the reference was provided, whether it should be kept in the alignment

  • fill_gaps (bool) – Replace all gaps in the alignment with "N" to indicate ambiguous sites.

Return type

None - the modified alignment is written directly to output_file

augur.align.prepare(sequences, existing_aln_fname, output, ref_name, ref_seq_fname)

Prepare the sequences, existing alignment, and reference sequence for alignment.

This function:
  1. Combines all given input sequences into a single file.

  2. Checks that the input sequences don't overlap with the existing alignment, if one exists.

  3. If given a reference name, checks that the sequence exists in either the existing alignment, if given, or the input sequences.

  4. If given a reference sequence, either adds it to the existing alignment or prepends it to the input sequences.

  5. Writes the input sequences to a single file, and writes the alignment back out if the reference sequence was added to it.

Parameters
  • sequences (list[str]) – List of paths to FASTA-formatted sequences to align.

  • existing_aln_fname (str) – Path of an existing alignment to use, or None

  • output (str) – Path the aligned sequences will be written out to.

  • ref_name (str) – The name of the reference sequence, if provided

  • ref_seq_fname (str) – The path to the reference sequence file. If this is provided, it overrides ref_name.

Returns

The existing alignment filename, the new sequences filename, and the name of the reference sequence.

Return type

tuple
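
The ordering of the preparation steps can be sketched on in-memory data. This is an illustration under stated assumptions, not the module's code: `prepare_in_memory` is a hypothetical name, dicts of name to sequence stand in for the FASTA files, overlapping inputs are simply dropped, and a missing reference raises `ValueError` rather than the module's own exception.

```python
def prepare_in_memory(new_seqs, existing_aln=None, ref_name=None, ref_seq=None):
    """Mimic the preparation steps on dicts instead of files.

    new_seqs, existing_aln: dicts of name -> sequence (hypothetical stand-ins).
    ref_seq: optional (name, sequence) tuple standing in for a reference file.
    Returns (existing_aln, new_seqs, ref_name).
    """
    new_seqs = dict(new_seqs)
    if existing_aln is not None:
        existing_aln = dict(existing_aln)
        # Inputs already present in the existing alignment are dropped.
        for name in list(new_seqs):
            if name in existing_aln:
                del new_seqs[name]

    if ref_seq is not None:
        # A supplied reference sequence overrides ref_name.
        ref_name, sequence = ref_seq
        if existing_aln is not None:
            # Assumes the reference already has the alignment's length.
            existing_aln.setdefault(ref_name, sequence)
        else:
            # Prepend the reference to the sequences that will be aligned.
            rest = {k: v for k, v in new_seqs.items() if k != ref_name}
            new_seqs = {ref_name: sequence, **rest}
    elif ref_name is not None:
        # A named reference must exist somewhere in the inputs.
        known = set(new_seqs) | set(existing_aln or {})
        if ref_name not in known:
            raise ValueError(f"reference {ref_name} not found")

    return existing_aln, new_seqs, ref_name


existing, seqs, ref = prepare_in_memory(
    {"a": "ACGT", "b": "ACGA"},
    existing_aln={"b": "ACGA"},
    ref_seq=("ref", "ACGT"),
)
print(existing, seqs, ref)
```

The returned tuple mirrors the documented one, with in-memory objects in place of the two filenames.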

augur.align.prettify_alignment(aln)

Converts all bases to uppercase and removes the automatic reverse-complement prefix (_R_) from sequence names. This modifies the alignment in place.

Parameters

aln (MultipleSeqAlignment) – Biopython Alignment
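
The two clean-up operations can be shown on plain Python data. This is a minimal sketch, assuming each record is a dict with `name` and `seq` keys; the record layout and the `prettify` helper are hypothetical, and the real function operates on Biopython records.

```python
PREFIX = "_R_"  # prefix marking an automatically reverse-complemented sequence


def prettify(records):
    """Uppercase every sequence and strip the '_R_' name prefix, in place."""
    for rec in records:
        rec["seq"] = rec["seq"].upper()
        if rec["name"].startswith(PREFIX):
            rec["name"] = rec["name"][len(PREFIX):]


records = [
    {"name": "no_gaps", "seq": "acgt"},
    {"name": "_R_crick_strand", "seq": "ac-tn"},
]
prettify(records)
print(records)
```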

augur.align.prune_seqs_matching_alignment(seqs, aln)

Return the set of seqs excluding those already in the alignment, and print a warning message for each sequence that is excluded.
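
The pruning logic amounts to a name lookup against the alignment. The sketch below is illustrative only: `prune_matching` is a hypothetical name, sequences are `(name, sequence)` tuples, the result is a list so that input order is preserved, and the warning text is invented for the example.

```python
def prune_matching(seqs, aln_names):
    """Return the sequences whose names are not already in the alignment."""
    kept = []
    for name, seq in seqs:
        if name in aln_names:
            # Illustrative message; the module's wording may differ.
            print(f"Excluding {name}: already present in the alignment")
        else:
            kept.append((name, seq))
    return kept


new = [("a", "ACGT"), ("b", "ACGA"), ("c", "ACGG")]
kept = prune_matching(new, aln_names={"b"})
print(kept)
```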

augur.align.read_alignment(fname)
augur.align.read_reference(ref_fname)
augur.align.read_sequences(*fnames)

Return a list of sequences from all fnames.
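
Reading several files into one list can be sketched with the standard library alone. This is not how the module reads FASTA; `read_fasta_files` is a hypothetical, deliberately simple parser that returns `(name, sequence)` tuples, and the temporary files exist only for the demonstration.

```python
import os
import tempfile


def read_fasta_files(*fnames):
    """Return (name, sequence) tuples from every file, in the order given."""
    records = []
    for fname in fnames:
        name, chunks = None, []
        with open(fname) as handle:
            for line in handle:
                line = line.strip()
                if line.startswith(">"):
                    # A new header closes the previous record, if any.
                    if name is not None:
                        records.append((name, "".join(chunks)))
                    name, chunks = line[1:], []
                elif line and name is not None:
                    chunks.append(line)
        # Flush the last record of this file.
        if name is not None:
            records.append((name, "".join(chunks)))
    return records


with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i, text in enumerate([">a\nAC\nGT\n", ">b\nTTTT\n>c\nGG\n"]):
        path = os.path.join(tmp, f"part{i}.fasta")
        with open(path, "w") as out:
            out.write(text)
        paths.append(path)
    records = read_fasta_files(*paths)

print(records)
```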

augur.align.register_arguments(parser)

Add arguments to parser. Kept as a function separate from register_parser so that unit tests can continue to use it to create an argparser.
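
The reason for the split can be seen in a small argparse sketch. The helper names `add_align_arguments` and `add_align_subcommand` are hypothetical stand-ins for the two documented functions, and the flags and default shown are illustrative rather than the command's actual option set.

```python
import argparse


def add_align_arguments(parser):
    """Attach options to any parser (illustrative flags only)."""
    parser.add_argument("--sequences", nargs="+", required=True)
    parser.add_argument("--output", default="alignment.fasta")
    return parser


def add_align_subcommand(parent_subparsers):
    """Create the subcommand parser, then delegate to the shared helper."""
    parser = parent_subparsers.add_parser("align")
    return add_align_arguments(parser)


# A unit test can build a bare parser without any subcommand machinery.
test_parser = add_align_arguments(argparse.ArgumentParser())
test_args = test_parser.parse_args(["--sequences", "a.fasta", "b.fasta"])

# The command-line entry point goes through the subparser route instead.
root = argparse.ArgumentParser(prog="tool")
subparsers = root.add_subparsers(dest="command")
add_align_subcommand(subparsers)
cli_args = root.parse_args(["align", "--sequences", "a.fasta"])

print(test_args.sequences, cli_args.command, cli_args.output)
```

Both routes define the options in one place, so tests and the command line cannot drift apart.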

augur.align.register_parser(parent_subparsers)
augur.align.remove_reference_sequence(seqs, reference_name)
augur.align.run(args)

Parameters

args (namespace) – arguments passed in via the command-line from augur

Returns

returns 0 for success, 1 for general error

Return type

int

augur.align.strip_non_reference(aln, reference, insertion_csv=None)

Return sequences with all insertions relative to the reference removed. The alignment is returned as a list of sequences.

Parameters
  • aln (MultipleSeqAlignment) – Biopython Alignment

  • reference (str) – name of reference sequence, assumed to be part of the alignment

Returns

list of trimmed sequences, effectively a multiple alignment

Return type

list
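
Removing insertions relative to the reference means dropping every alignment column in which the reference has a gap. The following is a minimal sketch of that idea, assuming a dict of name to aligned string; `strip_insertions` is a hypothetical name, and it raises `ValueError` where the module raises its own `AlignmentError`.

```python
def strip_insertions(alignment, reference):
    """Keep only the columns where the reference sequence has no gap."""
    if reference not in alignment:
        raise ValueError(f"reference {reference} not found in alignment")
    ref_seq = alignment[reference]
    # Column indices at which the reference carries a real base.
    keep = [i for i, base in enumerate(ref_seq) if base != "-"]
    return {
        name: "".join(seq[i] for i in keep)
        for name, seq in alignment.items()
    }


aln = {"ref": "AC--GT", "s1": "ACTTGT", "s2": "A-T-GT"}
trimmed = strip_insertions(aln, "ref")
print(trimmed)
```

After trimming, every sequence has the length of the ungapped reference, so the result still behaves like a multiple alignment.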

Tests

>>> [s.name for s in strip_non_reference(read_alignment("tests/data/align/test_aligned_sequences.fasta"), "with_gaps")]
Trimmed gaps in with_gaps from the alignment
['with_gaps', 'no_gaps', 'some_other_seq', '_R_crick_strand']
>>> [s.name for s in strip_non_reference(read_alignment("tests/data/align/test_aligned_sequences.fasta"), "no_gaps")]
No gaps in alignment to trim (with respect to the reference, no_gaps)
['with_gaps', 'no_gaps', 'some_other_seq', '_R_crick_strand']
>>> [s.name for s in strip_non_reference(read_alignment("tests/data/align/test_aligned_sequences.fasta"), "missing")]
Traceback (most recent call last):
  ...
augur.align.AlignmentError: ERROR: reference missing not found in alignment

augur.align.write_seqs(seqs, fname)

A wrapper around SeqIO.write with error handling.