augur.io.sequences module

augur.io.sequences.get_biopython_format(augur_format)

Validate sequence file format and return the inferred Biopython format.

Return type:

str
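A minimal, hypothetical sketch of the validate-then-map idea. It assumes the supported formats are the two this module names elsewhere (“fasta” and “genbank”), that both pass through unchanged as Bio.SeqIO format strings, and that an unsupported name raises ValueError. The name get_biopython_format_sketch is illustrative only; augur's actual mapping and error type may differ.

```python
# Hypothetical sketch, not augur's implementation. The supported set is an
# assumption based on the formats named elsewhere in this module.
SUPPORTED_FORMATS = {"fasta", "genbank"}


def get_biopython_format_sketch(augur_format: str) -> str:
    """Validate a format name and return the Biopython format string to use."""
    if augur_format not in SUPPORTED_FORMATS:
        raise ValueError(
            f"Unsupported sequence format {augur_format!r}; "
            f"expected one of {sorted(SUPPORTED_FORMATS)}"
        )
    # Both names are valid Bio.SeqIO format strings, so they pass through as-is.
    return augur_format
```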

augur.io.sequences.read_sequences(*paths, format='fasta')

Read sequences from one or more paths.

Automatically infer the compression mode (e.g., gzip) and return a stream of sequence records parsed in the given file format.

Parameters:
  • paths (Iterable[Union[str, PathLike]]) – One or more paths to sequence files.

  • format (str) – Format of input sequences. Either “fasta” or “genbank”.

Return type:

Iterator[SeqRecord]

Returns:

Sequence records from the given path(s).
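The read path can be sketched with the standard library alone. The hypothetical read_fasta_sketch picks a decompressor from the filename suffix (the .gz, .bz2 and .xz mapping is an assumption; augur's own detection may work differently) and lazily yields (id, sequence) tuples instead of Bio.SeqRecord.SeqRecord objects, so it runs without Biopython.

```python
import bz2
import gzip
import lzma
import os
from pathlib import Path
from typing import Iterator, Tuple, Union

# Assumed suffix-to-opener mapping; augur's own compression detection may differ.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_text(path: Union[str, os.PathLike]):
    """Open a possibly compressed file in text mode, chosen by filename suffix."""
    opener = OPENERS.get(Path(path).suffix, open)
    return opener(path, "rt")


def read_fasta_sketch(*paths: Union[str, os.PathLike]) -> Iterator[Tuple[str, str]]:
    """Lazily yield (id, sequence) pairs from one or more FASTA files."""
    for path in paths:
        with open_text(path) as handle:
            seq_id, chunks = None, []
            for line in handle:
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                if line.startswith(">"):
                    if seq_id is not None:
                        yield seq_id, "".join(chunks)
                    # Like Biopython, use the first word of the header as the id.
                    words = line[1:].split()
                    seq_id, chunks = (words[0] if words else ""), []
                else:
                    chunks.append(line)
            if seq_id is not None:
                yield seq_id, "".join(chunks)  # last record in this file
```

Because it is a generator, files are opened one at a time as the caller iterates, which keeps memory use flat regardless of how many paths are passed.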

augur.io.sequences.read_single_sequence(path, format='fasta')

Read a single sequence from a path.

Automatically infer the compression mode (e.g., gzip).

Parameters:
  • path (Union[str, PathLike]) – Path to a sequence file.

  • format (str) – Format of input file. Either “fasta” or “genbank”.

Return type:

SeqRecord

Returns:

A single sequence record from the given path.
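Biopython's Bio.SeqIO.read returns one record and raises ValueError when a handle holds zero or more than one; read_single_sequence plausibly delegates to it, though that is an assumption. The stdlib-only, hypothetical read_single_fasta_sketch enforces the same exactly-one rule on an already-open FASTA text handle (compression handling is omitted).

```python
from typing import TextIO, Tuple


def read_single_fasta_sketch(handle: TextIO) -> Tuple[str, str]:
    """Return the only (id, sequence) record in a FASTA text handle.

    Raises ValueError unless the handle holds exactly one record, mirroring
    the rule Bio.SeqIO.read applies.
    """
    ids, chunks = [], []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            words = line[1:].split()
            ids.append(words[0] if words else "")
        elif line:
            if not ids:
                raise ValueError("Sequence data found before the first header")
            chunks.append(line)
    if len(ids) != 1:
        raise ValueError(f"Expected exactly one record, found {len(ids)}")
    return ids[0], "".join(chunks)
```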

augur.io.sequences.write_records_to_fasta(records, fasta, seq_id_field='strain', seq_field='sequence')

Write sequences from dict records to a FASTA file, yielding each record with seq_field dropped so that it can be consumed downstream. Because this is a generator, records are written only as the returned iterator is consumed.

Parameters:
  • records (iterable of dict) – Iterable of dicts, each containing a sequence and its identifier

  • fasta (str) – Path to the output FASTA file

  • seq_id_field (str, optional) – Field name for the sequence identifier

  • seq_field (str, optional) – Field name for the genomic sequence

Yields:

dict – A copy of the record with seq_field dropped

Raises:

AugurError – When the sequence id field or sequence field does not exist in a record
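The generator behaviour can be sketched as follows. write_records_to_fasta_sketch and AugurErrorStandIn are hypothetical names; the latter stands in for augur's AugurError so the example runs without augur installed. The sketch writes one unwrapped FASTA entry per record and yields a shallow copy without seq_field; how augur itself formats the entries is not specified here.

```python
from typing import Dict, Iterable, Iterator


class AugurErrorStandIn(Exception):
    """Hypothetical stand-in for augur's AugurError."""


def write_records_to_fasta_sketch(
    records: Iterable[Dict],
    fasta: str,
    seq_id_field: str = "strain",
    seq_field: str = "sequence",
) -> Iterator[Dict]:
    # The file is opened only once iteration starts, since this is a generator.
    with open(fasta, "w") as output:
        for record in records:
            if seq_id_field not in record:
                raise AugurErrorStandIn(f"Sequence id field {seq_id_field!r} not found in record")
            if seq_field not in record:
                raise AugurErrorStandIn(f"Sequence field {seq_field!r} not found in record")
            output.write(f">{record[seq_id_field]}\n{record[seq_field]}\n")
            # Yield a shallow copy without the sequence for downstream consumers.
            yield {key: value for key, value in record.items() if key != seq_field}
```

Since nothing runs until iteration starts, a caller has to consume the yielded records (for example, while writing them to a metadata file) for the FASTA file to be written at all.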

augur.io.sequences.write_sequences(sequences, path_or_buffer, format='fasta')

Write sequences to a given path in the given format.

Automatically infer the compression mode (e.g., gzip) from the path’s filename extension.

Parameters:
  • sequences (iterable of Bio.SeqRecord.SeqRecord) – A list-like collection of sequences to write

  • path_or_buffer (str or os.PathLike or io.StringIO) – A path to a file, or an in-memory buffer, to which the given sequences are written in the given format.

  • format (str) – Output format, matching any of those supported by Biopython (e.g., “fasta”, “genbank”)

Return type:

int

Returns:

Number of sequences written to the given path or buffer.
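A stdlib-only sketch of the path-or-buffer handling and the returned count. write_fasta_sketch is hypothetical: it takes (id, sequence) tuples rather than SeqRecord objects, writes FASTA only, and assumes a .gz/.bz2/.xz suffix mapping. augur's write_sequences presumably delegates formatting to Biopython and may detect compression differently.

```python
import bz2
import gzip
import io
import lzma
import os
from pathlib import Path
from typing import Iterable, TextIO, Tuple, Union

# Assumed suffix-to-opener mapping; augur's actual compression handling may differ.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def write_fasta_sketch(
    sequences: Iterable[Tuple[str, str]],
    path_or_buffer: Union[str, os.PathLike, io.StringIO],
) -> int:
    """Write (id, sequence) pairs as FASTA and return how many were written."""

    def write_all(handle: TextIO) -> int:
        count = 0
        for seq_id, seq in sequences:
            handle.write(f">{seq_id}\n{seq}\n")
            count += 1
        return count

    # Buffers are written to directly and left open for the caller.
    if hasattr(path_or_buffer, "write"):
        return write_all(path_or_buffer)
    # Paths are opened (and compressed) according to their suffix.
    opener = OPENERS.get(Path(path_or_buffer).suffix, open)
    with opener(path_or_buffer, "wt") as handle:
        return write_all(handle)
```

Counting while writing lets the function accept a one-shot iterator, such as a generator of records, without materializing it first.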