augur.mask

Mask specified sites from a VCF or FASTA file.

augur.mask.get_chrom_name(vcf_file)

Read the CHROM field from the first non-header line of a VCF file.

Returns: str or None: The CHROM field, or None if no non-header line could be found.
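The lookup can be pictured as a scan that skips every line beginning with "#" and returns the first tab-separated column of the first data line. The sketch below is a hypothetical re-implementation (`get_chrom_name_sketch` is not part of augur). It assumes that a file ending in ".gz" is gzip-compressed and that blank lines should be skipped.

```python
import gzip


def get_chrom_name_sketch(vcf_file):
    """Return the CHROM value of the first data line, or None if there is none."""
    # Assumption: a ".gz" suffix means the file is gzip-compressed.
    opener = gzip.open if vcf_file.endswith(".gz") else open
    with opener(vcf_file, "rt") as handle:
        for line in handle:
            # "##" meta lines and the "#CHROM" header both start with "#".
            if line.strip() and not line.startswith("#"):
                # CHROM is the first tab-separated column of a data line.
                return line.split("\t", 1)[0].strip()
    return None
```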

augur.mask.mask_fasta(mask_sites, in_file, out_file, mask_from_beginning=0, mask_from_end=0, mask_invalid=False)

Mask the provided site list from a FASTA file and write to a new file.

Masked sites are overwritten as “N”s.

Parameters:
  • mask_sites (list of int) – A list of site indexes to mask in the FASTA.

  • in_file (str) – The path to the FASTA file you wish to mask.

  • out_file (str) – The path to write the resulting FASTA to.

  • mask_from_beginning (int) – Number of sites to mask from the beginning of each sequence (default 0)

  • mask_from_end (int) – Number of sites to mask from the end of each sequence (default 0)

  • mask_invalid (bool) – Mask invalid nucleotides (default False)
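A minimal, pure-Python sketch of the file-level behaviour follows. It assumes 0-based site indexes, an IUPAC-plus-gap alphabet as the definition of "valid" nucleotides, and unwrapped single-line output. `mask_fasta_sketch`, `read_fasta` and `masked_positions` are hypothetical names; augur itself works with Biopython records, so its parsing and output formatting may differ.

```python
# Assumed definition of "valid": IUPAC nucleotide codes plus the gap "-".
VALID_NUCLEOTIDES = set("ACGTURYSWKMBDHVN-")


def read_fasta(path):
    """Yield (header, sequence) pairs from a simple FASTA file."""
    header, parts = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(parts)
                header, parts = line[1:], []
            elif line:
                parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def masked_positions(length, mask_sites, mask_from_beginning, mask_from_end):
    """Collect every 0-based index that should become "N"."""
    positions = {site for site in mask_sites if 0 <= site < length}
    positions.update(range(min(mask_from_beginning, length)))
    positions.update(range(max(length - mask_from_end, 0), length))
    return positions


def mask_fasta_sketch(mask_sites, in_file, out_file, mask_from_beginning=0,
                      mask_from_end=0, mask_invalid=False):
    """Write a copy of in_file with the requested sites replaced by "N"."""
    with open(out_file, "w") as out:
        for header, seq in read_fasta(in_file):
            to_mask = masked_positions(len(seq), mask_sites,
                                       mask_from_beginning, mask_from_end)
            masked = "".join(
                "N" if (i in to_mask or
                        (mask_invalid and base.upper() not in VALID_NUCLEOTIDES))
                else base
                for i, base in enumerate(seq)
            )
            out.write(f">{header}\n{masked}\n")
```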

augur.mask.mask_sequence(sequence, mask_sites, mask_from_beginning, mask_from_end, mask_invalid)

Mask characters at the given sites in a single sequence record, modifying the record in place.

Parameters:
  • sequence (Bio.SeqRecord.SeqRecord) – The sequence record to mask.

  • mask_sites (list of int) – A list of site indexes to mask in the sequence.

  • mask_from_beginning (int) – Number of sites to mask from the beginning of the sequence

  • mask_from_end (int) – Number of sites to mask from the end of the sequence

  • mask_invalid (bool) – Mask invalid nucleotides

Returns:

Masked sequence in its original record object

Return type:

Bio.SeqRecord.SeqRecord
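The in-place contract can be illustrated with a small stand-in record. This is a sketch, not augur's code: `Record` is a hypothetical substitute for `Bio.SeqRecord.SeqRecord` that holds a plain string, site indexes are assumed to be 0-based, out-of-range indexes are assumed to be ignored, and the set of valid characters is an assumption.

```python
from dataclasses import dataclass

# Assumed valid alphabet: IUPAC nucleotide codes plus the gap character.
VALID_NUCLEOTIDES = set("ACGTURYSWKMBDHVN-")


@dataclass
class Record:
    """Hypothetical stand-in for Bio.SeqRecord.SeqRecord (id + string seq)."""
    id: str
    seq: str


def mask_sequence_sketch(sequence, mask_sites, mask_from_beginning,
                         mask_from_end, mask_invalid):
    """Mask sites in sequence.seq, modify the record in place, and return it."""
    chars = list(sequence.seq)
    length = len(chars)
    # Explicit sites (assumed 0-based); out-of-range indexes are skipped.
    for site in mask_sites:
        if 0 <= site < length:
            chars[site] = "N"
    # Leading block.
    for i in range(min(mask_from_beginning, length)):
        chars[i] = "N"
    # Trailing block, computed from length so that 0 masks nothing.
    for i in range(max(length - mask_from_end, 0), length):
        chars[i] = "N"
    if mask_invalid:
        # Case-insensitive validity check (an assumption of this sketch).
        chars = [c if c.upper() in VALID_NUCLEOTIDES else "N" for c in chars]
    sequence.seq = "".join(chars)
    return sequence
```

Computing the trailing block from `length - mask_from_end`, rather than slicing with `[-mask_from_end:]`, matters when the count is 0: the slice `[-0:]` selects the entire sequence.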

augur.mask.mask_vcf(mask_sites, in_file, out_file, cleanup=True)

Mask the provided site list from a VCF file and write to a new file.

This function relies on ‘vcftools --exclude-positions’ to mask the requested sites.

Parameters:
  • mask_sites (list of int) – A list of site indexes to exclude from the VCF.

  • in_file (str) – The path to the VCF file you wish to mask.

  • out_file (str) – The path to write the resulting VCF to.

  • cleanup (bool) – Clean up the intermediate files, including the VCFtools log and mask sites file.
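The vcftools step can be sketched in two parts: write a tab-separated CHROM/POS file for `--exclude-positions`, then stream the recoded VCF to the output. The helper names are hypothetical. In this sketch the chromosome is passed in explicitly (augur presumably derives it with `get_chrom_name`), mask sites are assumed to be 0-based and shifted to VCF's 1-based POS, the output is written uncompressed, and only the mask-sites file is cleaned up; any vcftools log is left in place.

```python
import os
import subprocess
import tempfile


def write_exclude_positions(chrom, mask_sites, path):
    """Write one "CHROM<TAB>POS" line per site for --exclude-positions."""
    with open(path, "w") as handle:
        for site in sorted(set(mask_sites)):
            # Assumes 0-based mask sites; VCF POS values are 1-based.
            handle.write(f"{chrom}\t{site + 1}\n")


def build_vcftools_command(in_file, mask_file):
    """Return the vcftools argv; gzipped input needs --gzvcf instead of --vcf."""
    input_flag = "--gzvcf" if in_file.endswith(".gz") else "--vcf"
    return ["vcftools", input_flag, in_file,
            "--exclude-positions", mask_file,
            "--recode", "--stdout"]


def mask_vcf_sketch(mask_sites, in_file, out_file, chrom, cleanup=True):
    """Remove mask_sites from in_file; requires vcftools on PATH."""
    fd, mask_file = tempfile.mkstemp(suffix=".mask_sites")
    os.close(fd)
    try:
        write_exclude_positions(chrom, mask_sites, mask_file)
        with open(out_file, "w") as out:
            # --recode --stdout streams the filtered VCF to standard output.
            subprocess.run(build_vcftools_command(in_file, mask_file),
                           stdout=out, check=True)
    finally:
        if cleanup:
            os.remove(mask_file)
```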

augur.mask.register_arguments(parser)

Add arguments to parser. Kept as a separate function from register_parser so that unit tests can continue to use it to create an argument parser.
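The split between the two functions follows a common argparse pattern: one function attaches options to any parser, and the subcommand registration reuses it. The option names below are illustrative only and may not match augur's actual CLI; `register_arguments_sketch` and `register_parser_sketch` are hypothetical.

```python
import argparse


def register_arguments_sketch(parser):
    """Attach illustrative options to any argparse parser."""
    parser.add_argument("--sequences", required=True,
                        help="VCF or FASTA file to mask")
    parser.add_argument("--output",
                        help="output file; defaults to overwriting the input")
    parser.add_argument("--mask-from-beginning", type=int, default=0,
                        help="number of sites to mask at the start")
    parser.add_argument("--mask-from-end", type=int, default=0,
                        help="number of sites to mask at the end")
    parser.add_argument("--mask-invalid", action="store_true",
                        help="mask invalid nucleotides")


def register_parser_sketch(parent_subparsers):
    """Create a 'mask' subcommand that reuses register_arguments_sketch."""
    parser = parent_subparsers.add_parser("mask", help="Mask specified sites")
    register_arguments_sketch(parser)
    return parser


# A unit test can build a standalone parser without the full CLI:
test_parser = argparse.ArgumentParser()
register_arguments_sketch(test_parser)
args = test_parser.parse_args(["--sequences", "in.fasta", "--mask-from-end", "3"])
```

Because the options live in one place, the standalone test parser and the `mask` subcommand cannot drift apart.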

augur.mask.register_parser(parent_subparsers)

augur.mask.run(args)

Mask specified sites from the VCF or FASTA.

For VCF files, this is done by removing the sites from the VCF entirely, which effectively makes them identical to the reference at those positions.

For FASTA files, masked sites are replaced with “N”.

If no output file is specified, the input file is overwritten.
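One way to implement the dispatch and the overwrite-by-default behaviour is sketched below. It assumes VCF input is recognised by a ".vcf" or ".vcf.gz" extension, and it takes a hypothetical `mask_fn(mode, src, dst)` callback in place of the real masking functions. Masking into a temporary file in the same directory and then calling `os.replace` keeps the input intact if masking fails; this is a design choice of the sketch, not necessarily augur's behaviour.

```python
import os
import tempfile


def run_sketch(in_file, mask_fn, out_file=None):
    """Dispatch on file type and overwrite in_file when no output is given.

    mask_fn(mode, src, dst) is a hypothetical callback standing in for
    the real VCF/FASTA masking functions.
    """
    lowered = in_file.lower()
    # Assumption: VCF input is recognised purely by extension.
    mode = "vcf" if lowered.endswith((".vcf", ".vcf.gz")) else "fasta"
    target = out_file or in_file
    if os.path.abspath(target) != os.path.abspath(in_file):
        mask_fn(mode, in_file, target)
        return mode, target
    # Overwrite: mask into a temporary file next to the input, then swap it in.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(in_file)))
    os.close(fd)
    try:
        mask_fn(mode, in_file, tmp_path)
        os.replace(tmp_path, in_file)
    except BaseException:
        # Leave the original input untouched and drop the partial output.
        os.remove(tmp_path)
        raise
    return mode, target
```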