augur.curate.normalize_strings module

Normalize strings to a Unicode normalization form and strip leading and trailing whitespace.

Strings need to be normalized for predictable string comparisons, especially in cases where strings contain diacritics (see https://unicode.org/faq/normalization.html).
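As an illustration of the comparison problem, the sketch below uses Python's standard unicodedata module directly, not augur. The sample strings are made up for the example. It builds two strings that render identically but differ in their code points, and shows that they compare equal only after both are normalized to the same form.

```python
import unicodedata

# Same visible text, different code points (illustrative values).
composed = "C\u00f4te d'Ivoire"     # "ô" as one code point (U+00F4)
decomposed = "Co\u0302te d'Ivoire"  # "o" plus combining circumflex (U+0302)

# A plain comparison sees two different strings.
raw_equal = composed == decomposed

# After normalizing both sides to NFC, the comparison succeeds.
nfc_equal = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)

print(raw_equal)  # False
print(nfc_equal)  # True
```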

augur.curate.normalize_strings.normalize_strings(record, form='NFC')

Normalizes string values in record to a Unicode normalization form and removes leading and trailing whitespace from each string. Uses the NFC normalization form by default.

Parameters:
  • record (dict) – An input record to be normalized

  • form (str, optional) – The Unicode normalization form to apply, such as NFC or NFD; defaults to 'NFC'

Yields:

record (dict) – The modified record that is a shallow copy of the original record
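The documented behavior can be approximated with the standard library alone. The following is a minimal sketch, not augur's implementation: normalize_record_sketch is a hypothetical name, and leaving non-string values unchanged is an assumption made for the example. It shows the three properties described above: string values are normalized, surrounding whitespace is stripped, and the input record is not modified.

```python
import unicodedata


def normalize_record_sketch(record, form="NFC"):
    """Hypothetical stand-in for illustration; not augur's code."""
    # Build a new dict so the caller's record is left untouched.
    return {
        key: unicodedata.normalize(form, value).strip()
        if isinstance(value, str)
        else value  # assumption: non-string values pass through as-is
        for key, value in record.items()
    }


# Illustrative record with a decomposed "ô" and stray whitespace.
original = {"strain": "  Co\u0302te/2020 ", "country": "Peru"}
result = normalize_record_sketch(original)

print(result["strain"] == "C\u00f4te/2020")  # True
print(original["strain"] == "  Co\u0302te/2020 ")  # True: input unchanged
```

Passing a different form, for example form="NFD", would produce the decomposed representation instead, which is why both sides of a comparison should go through the same form.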

augur.curate.normalize_strings.register_parser(parent_subparsers)
augur.curate.normalize_strings.run(args, records)