augur.mask module

Mask specified sites from a VCF or FASTA file.

augur.mask.get_chrom_name(vcf_file)

Read the CHROM field from the first non-header line of a VCF file.

Returns: str or None: Either the CHROM field or None if no non-comment line could be found.
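The lookup described above can be sketched as follows. This is a minimal illustration, not augur's implementation: the helper name `first_chrom` is hypothetical, and the gzip handling based on a `.gz` suffix is an assumption.

```python
import gzip


def first_chrom(vcf_path):
    """Return the CHROM field of the first non-header line, or None.

    Hypothetical helper for illustration; not part of augur.
    """
    # Assumption: a ".gz" suffix means the file is gzip-compressed.
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            # Meta-information and header lines in a VCF start with "#".
            if line.startswith("#") or not line.strip():
                continue
            # CHROM is the first tab-separated column of a record line.
            return line.rstrip("\n").split("\t", 1)[0]
    # No record line was found.
    return None
```

Returning None rather than raising lets the caller decide how to handle a VCF that contains only header lines.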

augur.mask.mask_fasta(mask_sites, in_file, out_file, mask_from_beginning=0, mask_from_end=0, mask_invalid=False)

Mask the provided site list from a FASTA file and write to a new file.

Masked sites are overwritten as “N”s.

mask_sites: list[int]

A list of site indexes to exclude from the FASTA.

in_file: str

The path to the FASTA file you wish to mask.

out_file: str

The path to write the resulting FASTA to.

mask_from_beginning: int

Number of sites to mask from the beginning of each sequence (default 0)

mask_from_end: int

Number of sites to mask from the end of each sequence (default 0)

mask_invalid: bool

Mask invalid nucleotides (default False)
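How these parameters might combine on a single sequence can be sketched as below. This is not augur's code: `mask_sequence` is a hypothetical helper, site indexes are assumed to be zero-based, and the set of characters treated as valid is an assumption chosen for the example.

```python
# Assumption: characters regarded as valid for this sketch only.
VALID_CHARS = set("ACGTN-")


def mask_sequence(sequence, mask_sites, mask_from_beginning=0,
                  mask_from_end=0, mask_invalid=False):
    """Return a copy of `sequence` with the requested sites set to "N".

    Hypothetical helper for illustration; assumes zero-based site indexes.
    """
    chars = list(sequence)
    length = len(chars)
    # Mask the leading sites.
    for i in range(min(mask_from_beginning, length)):
        chars[i] = "N"
    # Mask the trailing sites.
    for i in range(max(length - mask_from_end, 0), length):
        chars[i] = "N"
    # Mask individually listed sites, ignoring any outside the sequence.
    for site in mask_sites:
        if 0 <= site < length:
            chars[site] = "N"
    # Optionally replace characters outside the assumed valid set.
    if mask_invalid:
        chars = [c if c.upper() in VALID_CHARS else "N" for c in chars]
    return "".join(chars)
```

Under these assumptions, `mask_sequence("ACGTACGT", [3], mask_from_beginning=1, mask_from_end=2)` gives `"NCGNACNN"`. The sequence length is unchanged, so alignment coordinates stay valid after masking.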

augur.mask.mask_vcf(mask_sites, in_file, out_file, cleanup=True)

Mask the provided site list from a VCF file and write to a new file.

This function relies on ‘vcftools --exclude-positions’ to mask the requested sites.

mask_sites: list[int]

A list of site indexes to exclude from the VCF.

in_file: str

The path to the VCF file you wish to mask.

out_file: str

The path to write the resulting VCF to.

cleanup: bool

Clean up the intermediate files, including the VCFTools log and mask sites file (default True)
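A rough sketch of preparing a VCFtools exclusion run is shown below. It is not augur's implementation: both helper names are hypothetical, and the positions are assumed to be one-based VCF POS values already (whether `mask_sites` needs converting is not stated here). The `--vcf`, `--gzvcf`, `--exclude-positions`, `--recode` and `--out` options are standard VCFtools flags.

```python
def write_exclude_file(chrom, positions, path):
    """Write one tab-separated "CHROM<TAB>POS" line per site to exclude.

    Hypothetical helper; assumes `positions` are one-based VCF POS values.
    """
    with open(path, "w") as handle:
        for pos in sorted(set(positions)):
            handle.write(f"{chrom}\t{pos}\n")


def vcftools_command(in_file, exclude_file, out_prefix):
    """Build (but do not run) a vcftools argument list for the exclusion."""
    # vcftools takes gzipped input through --gzvcf instead of --vcf.
    input_flag = "--gzvcf" if in_file.endswith(".gz") else "--vcf"
    return [
        "vcftools", input_flag, in_file,
        "--exclude-positions", exclude_file,
        "--recode",           # write a new VCF with the sites removed
        "--out", out_prefix,  # prefix for the output and log files
    ]
```

The returned list could be passed to `subprocess.run(..., check=True)`. With `--recode --out PREFIX`, VCFtools writes `PREFIX.recode.vcf` and `PREFIX.log`, which, together with the exclusion file, are the kind of intermediate files the `cleanup` option refers to.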

augur.mask.register_arguments(parser)

augur.mask.run(args)

Mask specified sites from the VCF or FASTA.

For VCF files, this occurs by removing the sites entirely from the VCF, which effectively makes them identical to the reference at those locations.

For FASTA files, masked sites are replaced with “N”.

If no output file is specified, the input file is overwritten.