format-dates

Format date fields to ISO 8601 dates (YYYY-MM-DD), where incomplete dates are masked with ‘XX’ (e.g. 2023 -> 2023-XX-XX).
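The masking rule can be illustrated with a minimal Python sketch. This is not augur's implementation. It assumes that a date component counts as present when its strftime code (for example %Y, %m or %d) appears in the format that parsed the string, and that every absent component is filled with X characters. The helper name to_masked_iso and the directive sets are hypothetical.

```python
from datetime import datetime

# Assumed mapping of strftime codes to date components (illustrative only).
YEAR_CODES = ("%Y", "%y")
MONTH_CODES = ("%m", "%b", "%B")
DAY_CODES = ("%d",)


def to_masked_iso(date_string, date_format):
    """Parse date_string with date_format and return YYYY-MM-DD,
    replacing any component the format does not contain with X's."""
    parsed = datetime.strptime(date_string, date_format)
    has_year = any(code in date_format for code in YEAR_CODES)
    has_month = any(code in date_format for code in MONTH_CODES)
    has_day = any(code in date_format for code in DAY_CODES)
    year = f"{parsed.year:04d}" if has_year else "XXXX"
    month = f"{parsed.month:02d}" if has_month else "XX"
    day = f"{parsed.day:02d}" if has_day else "XX"
    return f"{year}-{month}-{day}"


print(to_masked_iso("2023", "%Y"))              # 2023-XX-XX
print(to_masked_iso("2023-05", "%Y-%m"))        # 2023-05-XX
print(to_masked_iso("05/17/2023", "%m/%d/%Y"))  # 2023-05-17
```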

usage: augur curate format-dates [-h] [--metadata METADATA]
                                 [--id-column ID_COLUMN]
                                 [--metadata-delimiters METADATA_DELIMITERS [METADATA_DELIMITERS ...]]
                                 [--fasta FASTA]
                                 [--seq-id-column SEQ_ID_COLUMN]
                                 [--seq-field SEQ_FIELD]
                                 [--unmatched-reporting {error_first,error_all,warn,silent}]
                                 [--duplicate-reporting {error_first,error_all,warn,silent}]
                                 [--output-metadata OUTPUT_METADATA]
                                 [--output-fasta OUTPUT_FASTA]
                                 [--output-id-field OUTPUT_ID_FIELD]
                                 [--output-seq-field OUTPUT_SEQ_FIELD]
                                 [--date-fields DATE_FIELDS [DATE_FIELDS ...]]
                                 [--expected-date-formats EXPECTED_DATE_FORMATS [EXPECTED_DATE_FORMATS ...]]
                                 [--failure-reporting {error_first,error_all,warn,silent}]
                                 [--no-mask-failure]

INPUTS

Input options shared by all augur curate commands. If no input options are provided, commands will try to read NDJSON records from stdin.
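NDJSON input is one JSON object per line. The following Python sketch builds such a stream from two hypothetical records and parses it back. The field names strain and date are illustrative and are not required by augur.

```python
import json
import sys

# Hypothetical example records; field names are illustrative only.
records = [
    {"strain": "A/1", "date": "2023"},
    {"strain": "A/2", "date": "2023-05-17"},
]

# NDJSON: each record is serialized on its own line, terminated by a newline.
ndjson = "".join(json.dumps(record) + "\n" for record in records)
sys.stdout.write(ndjson)

# Reading NDJSON back: parse each non-empty line as a separate JSON object.
parsed = [json.loads(line) for line in ndjson.splitlines() if line.strip()]
```

The output of a script like this could be piped into augur curate format-dates on stdin.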

--metadata

Input metadata file. Accepts ‘-’ to read metadata from stdin.

--id-column

Name of the metadata column that contains the record identifier for reporting duplicate records. Uses the first column of the metadata file if not provided. Ignored if also providing a FASTA file input.

--metadata-delimiters

Delimiters to accept when reading a metadata file. Only one delimiter will be inferred.

Default: (‘,’, ‘\t’)
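Here is a sketch of single-delimiter inference. It assumes the inference works like Python's csv.Sniffer restricted to the accepted delimiters and applied to the header line. Whether augur inspects only the header is an assumption here, and infer_delimiter is a hypothetical name.

```python
import csv


def infer_delimiter(text, delimiters=(",", "\t")):
    """Infer exactly one delimiter for a metadata table from its header line.

    csv.Sniffer only considers the characters passed in `delimiters`
    and raises csv.Error if none of them can be inferred.
    """
    header = text.splitlines()[0]
    return csv.Sniffer().sniff(header, delimiters="".join(delimiters)).delimiter


tsv_sample = "strain\tdate\nA/1\t2023\n"
csv_sample = "strain,date\nA/1,2023\n"
print(repr(infer_delimiter(tsv_sample)))  # '\t'
print(repr(infer_delimiter(csv_sample)))  # ','
```

A header that uses neither accepted delimiter (for example semicolons) raises csv.Error in this sketch rather than falling back to a guess.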

--fasta

Plain or gzipped FASTA file. Headers can only contain the sequence id used to match a metadata record. Note that an index file will be generated for the FASTA file as <filename>.fasta.fxi.

--seq-id-column

Name of the metadata column that contains the sequence id used to match sequences in the FASTA file.

--seq-field

The name to use for the sequence field when joining sequences from a FASTA file.

--unmatched-reporting

Possible choices: error_first, error_all, warn, silent

How unmatched records from combined metadata/FASTA input should be reported.

Default: error_first
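The following sketch shows, under simplifying assumptions, how sequences from a FASTA file could be joined onto metadata records through a sequence id column, and which ids would be left unmatched on either side. It parses the FASTA text into an in-memory dictionary, which differs from the indexed reading described for --fasta. The helpers parse_fasta and join_records and the default names strain and sequence are hypothetical.

```python
def parse_fasta(text):
    """Map each header (the whole header is the sequence id) to its sequence."""
    sequences = {}
    current_id = None
    for line in text.splitlines():
        if line.startswith(">"):
            current_id = line[1:].strip()
            sequences[current_id] = ""
        elif current_id is not None:
            sequences[current_id] += line.strip()
    return sequences


def join_records(metadata, sequences, seq_id_column="strain", seq_field="sequence"):
    """Attach sequences to matching records and collect unmatched ids."""
    joined, unmatched_metadata = [], []
    for record in metadata:
        seq_id = record[seq_id_column]
        if seq_id in sequences:
            joined.append({**record, seq_field: sequences[seq_id]})
        else:
            unmatched_metadata.append(seq_id)  # record without a sequence
    metadata_ids = {record[seq_id_column] for record in metadata}
    # Sequences whose id never appears in the metadata.
    unmatched_fasta = [seq_id for seq_id in sequences if seq_id not in metadata_ids]
    return joined, unmatched_metadata, unmatched_fasta


metadata = [
    {"strain": "A/1", "date": "2023"},
    {"strain": "A/2", "date": "2023-05"},
    {"strain": "A/3", "date": "2022"},
]
fasta_text = ">A/1\nACGT\n>A/2\nGGCC\n>B/9\nTTAA\n"
joined, unmatched_metadata, unmatched_fasta = join_records(metadata, parse_fasta(fasta_text))
```

Here A/3 has no sequence and B/9 has no metadata; those two lists are what an unmatched-reporting mode would act on.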

--duplicate-reporting

Possible choices: error_first, error_all, warn, silent

How duplicate records should be reported.

Default: error_first
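The four reporting modes can be sketched as a small dispatcher. This is an illustration of the behavior suggested by the mode names, assuming problems are first collected into a list; a real implementation that processes records one at a time could stop at the first problem without collecting the rest. ReportingError, find_duplicate_ids and report_problems are hypothetical names.

```python
import warnings
from collections import Counter


class ReportingError(Exception):
    """Raised by the error_* modes in this sketch."""


def find_duplicate_ids(records, id_column):
    """Return one message per id that appears more than once, in first-seen order."""
    counts = Counter(record[id_column] for record in records)
    return [f"Duplicate record id: {record_id}" for record_id, n in counts.items() if n > 1]


def report_problems(problems, mode):
    """Handle a list of problem messages according to a reporting mode."""
    if mode not in {"error_first", "error_all", "warn", "silent"}:
        raise ValueError(f"Unknown reporting mode: {mode}")
    if not problems or mode == "silent":
        return  # silent: nothing is reported
    if mode == "error_first":
        raise ReportingError(problems[0])  # stop at the first problem
    if mode == "error_all":
        raise ReportingError("\n".join(problems))  # one error listing every problem
    for problem in problems:  # warn: report each problem and keep going
        warnings.warn(problem)


records = [{"strain": "A/1"}, {"strain": "A/2"}, {"strain": "A/1"}, {"strain": "A/2"}]
problems = find_duplicate_ids(records, "strain")
```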

OUTPUTS

Output options shared by all augur curate commands. If no output options are provided, commands will output NDJSON records to stdout.

--output-metadata

Output metadata TSV file. Accepts ‘-’ to output TSV to stdout.

--output-fasta

Output FASTA file.

--output-id-field

The record field to use as the sequence identifier in the FASTA output.

--output-seq-field

The record field that contains the sequence for the FASTA output. This field will be deleted from the metadata output.
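A minimal sketch of how a record stream could be split into the two outputs: a TSV without the sequence field, and a FASTA file whose headers come from the id field. It assumes every record has the same keys as the first one. The function split_outputs and the field names strain and sequence are hypothetical.

```python
import csv
import io


def split_outputs(records, output_id_field, output_seq_field):
    """Return (tsv_text, fasta_text); the sequence field is dropped from the TSV."""
    if not records:
        return "", ""
    metadata_rows = [
        {key: value for key, value in record.items() if key != output_seq_field}
        for record in records
    ]
    tsv_buffer = io.StringIO()
    writer = csv.DictWriter(
        tsv_buffer, fieldnames=list(metadata_rows[0]), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(metadata_rows)

    # One FASTA entry per record: header from the id field, body from the sequence field.
    fasta = "".join(
        f">{record[output_id_field]}\n{record[output_seq_field]}\n" for record in records
    )
    return tsv_buffer.getvalue(), fasta


records = [
    {"strain": "A/1", "date": "2023-XX-XX", "sequence": "ACGT"},
    {"strain": "A/2", "date": "2023-05-XX", "sequence": "GGCC"},
]
tsv, fasta = split_outputs(records, output_id_field="strain", output_seq_field="sequence")
```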

REQUIRED

--date-fields

List of date field names in the record that need to be standardized.

--expected-date-formats

Expected date formats that are currently in the provided date fields, defined by standard format codes as listed at https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes. If a date string matches multiple formats, it will be parsed as the first matched format in the provided order.
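The first-match rule matters for ambiguous strings. In this sketch, parse_first_match (a hypothetical helper built on datetime.strptime) tries the formats in the given order, so 01/02/2023 is read as January 2 or February 1 depending on which format is listed first.

```python
from datetime import datetime


def parse_first_match(date_string, expected_formats):
    """Return (format, datetime) for the first format that parses, else None."""
    for date_format in expected_formats:
        try:
            return date_format, datetime.strptime(date_string, date_format)
        except ValueError:
            continue  # this format does not match; try the next one
    return None


# "01/02/2023" is valid under both formats, so the order decides the result.
us_first = parse_first_match("01/02/2023", ["%m/%d/%Y", "%d/%m/%Y"])  # January 2, 2023
eu_first = parse_first_match("01/02/2023", ["%d/%m/%Y", "%m/%d/%Y"])  # February 1, 2023
```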

OPTIONAL

--failure-reporting

Possible choices: error_first, error_all, warn, silent

How failed date formatting should be reported.

Default: error_first

--no-mask-failure

Do not mask dates with ‘XXXX-XX-XX’; instead, return the original date string if date formatting failed.

Default: False (failed dates are masked with ‘XXXX-XX-XX’)
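Here is a sketch of the two failure behaviors. It assumes that a value matching none of the expected formats becomes XXXX-XX-XX by default and is left unchanged when masking is turned off. The function format_or_fallback and its mask_failure parameter are hypothetical, and only full-date formats are handled, for brevity.

```python
from datetime import datetime


def format_or_fallback(date_string, expected_formats, mask_failure=True):
    """Return YYYY-MM-DD for the first matching format, or a fallback on failure.

    On failure, return 'XXXX-XX-XX' when mask_failure is True, or the
    original string when it is False (the --no-mask-failure behavior).
    """
    for date_format in expected_formats:
        try:
            return datetime.strptime(date_string, date_format).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next expected format
    return "XXXX-XX-XX" if mask_failure else date_string


print(format_or_fallback("2023-05-17", ["%Y-%m-%d"]))                       # 2023-05-17
print(format_or_fallback("spring 2023", ["%Y-%m-%d"]))                      # XXXX-XX-XX
print(format_or_fallback("spring 2023", ["%Y-%m-%d"], mask_failure=False))  # spring 2023
```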