format-dates
Format date fields to ISO 8601 dates (YYYY-MM-DD). If the provided --expected-date-formats represent incomplete dates, the incomplete dates are masked with ‘XX’. For example, providing %Y will allow year-only dates to be formatted as 2023-XX-XX.
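The masking of incomplete dates can be illustrated with a minimal Python sketch. This is not augur's implementation: the function name `format_incomplete_date` is hypothetical, and the sketch assumes a date component counts as provided only when its format code appears in the format string.

```python
from datetime import datetime


def format_incomplete_date(date_string, date_format):
    """Parse date_string with date_format and mask missing parts with 'X'.

    Hypothetical helper for illustration only.
    """
    parsed = datetime.strptime(date_string, date_format)

    # Assumption: a component is "provided" only if its directive is in the format.
    has_year = "%Y" in date_format
    has_month = any(code in date_format for code in ("%m", "%b", "%B"))
    has_day = "%d" in date_format

    year = f"{parsed.year:04d}" if has_year else "XXXX"
    month = f"{parsed.month:02d}" if has_month else "XX"
    day = f"{parsed.day:02d}" if has_day else "XX"
    return f"{year}-{month}-{day}"


print(format_incomplete_date("2023", "%Y"))            # 2023-XX-XX
print(format_incomplete_date("2023-05", "%Y-%m"))      # 2023-05-XX
print(format_incomplete_date("2023-05-17", "%Y-%m-%d"))  # 2023-05-17
```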
usage: augur curate format-dates [-h] [--metadata METADATA]
[--id-column ID_COLUMN]
[--metadata-delimiters METADATA_DELIMITERS [METADATA_DELIMITERS ...]]
[--fasta FASTA]
[--seq-id-column SEQ_ID_COLUMN]
[--seq-field SEQ_FIELD]
[--unmatched-reporting {error_first,error_all,warn,silent}]
[--duplicate-reporting {error_first,error_all,warn,silent}]
[--output-metadata OUTPUT_METADATA]
[--output-fasta OUTPUT_FASTA]
[--output-id-field OUTPUT_ID_FIELD]
[--output-seq-field OUTPUT_SEQ_FIELD]
[--date-fields DATE_FIELDS [DATE_FIELDS ...]]
[--expected-date-formats EXPECTED_DATE_FORMATS [EXPECTED_DATE_FORMATS ...]]
[--failure-reporting {error_first,error_all,warn,silent}]
[--no-mask-failure]
INPUTS
Input options shared by all augur curate commands. If no input options are provided, commands will try to read NDJSON records from stdin.
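NDJSON input is one JSON object per line. The following sketch shows how such records could be read from a text stream in Python. It only illustrates the format, not how augur reads its input; `read_ndjson_records` is a hypothetical name, and an in-memory stream stands in for stdin so the example runs on its own.

```python
import io
import json


def read_ndjson_records(stream):
    """Yield one dict per non-empty line of an NDJSON text stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Stand-in for sys.stdin; the field names are made up for illustration.
example_stream = io.StringIO(
    '{"strain": "A", "date": "2023"}\n'
    '{"strain": "B", "date": "2023-05-01"}\n'
)
records = list(read_ndjson_records(example_stream))
print(records[0]["date"])  # 2023
```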
- --metadata
Input metadata file. May be plain text (TSV, CSV) or an Excel or OpenOffice spreadsheet workbook file. When an Excel or OpenOffice workbook, only the first visible worksheet will be read and initial empty rows/columns will be ignored. Accepts ‘-’ to read plain text from stdin.
- --id-column
Name of the metadata column that contains the record identifier for reporting duplicate records. Uses the first column of the metadata file if not provided. Ignored if also providing a FASTA file input.
- --metadata-delimiters
Delimiters to accept when reading a plain text metadata file. Only one delimiter will be inferred.
Default:
(',', '\t')
- --fasta
Plain or gzipped FASTA file. Headers can only contain the sequence id used to match a metadata record. Note that an index file will be generated for the FASTA file as <filename>.fasta.fxi.
- --seq-id-column
Name of metadata column that contains the sequence id to match sequences in the FASTA file.
- --seq-field
The name to use for the sequence field when joining sequences from a FASTA file.
- --unmatched-reporting
Possible choices: error_first, error_all, warn, silent
How unmatched records from combined metadata/FASTA input should be reported.
Default:
error_first
- --duplicate-reporting
Possible choices: error_first, error_all, warn, silent
How duplicate records should be reported.
Default:
error_first
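The four reporting choices (error_first, error_all, warn, silent) can be pictured with a small sketch applied to duplicate record ids. The behavior of each mode here is an assumption inferred from its name, not taken from augur's source, and `report_duplicates` is a hypothetical function.

```python
import sys

REPORTING_MODES = ("error_first", "error_all", "warn", "silent")


def report_duplicates(record_ids, mode="error_first"):
    """Return duplicated ids, reporting them according to mode (assumed semantics)."""
    if mode not in REPORTING_MODES:
        raise ValueError(f"Unknown reporting mode: {mode!r}")

    seen = set()
    duplicates = []
    for record_id in record_ids:
        if record_id in seen:
            # Assumed: error_first stops at the first problem encountered.
            if mode == "error_first":
                raise ValueError(f"Duplicate record id: {record_id!r}")
            if record_id not in duplicates:
                duplicates.append(record_id)
        else:
            seen.add(record_id)

    # Assumed: error_all collects every problem before failing.
    if duplicates and mode == "error_all":
        raise ValueError(f"Duplicate record ids: {duplicates}")
    # Assumed: warn reports on stderr and continues; silent reports nothing.
    if duplicates and mode == "warn":
        print(f"WARNING: duplicate record ids: {duplicates}", file=sys.stderr)
    return duplicates


print(report_duplicates(["a", "b", "a", "c", "b"], mode="silent"))  # ['a', 'b']
```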
OUTPUTS
Output options shared by all augur curate commands. If no output options are provided, commands will output NDJSON records to stdout.
- --output-metadata
Output metadata TSV file. Accepts ‘-’ to output TSV to stdout.
- --output-fasta
Output FASTA file.
- --output-id-field
The record field to use as the sequence identifier in the FASTA output.
- --output-seq-field
The record field that contains the sequence for the FASTA output. This field will be deleted from the metadata output.
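To make the relationship between --output-id-field and --output-seq-field concrete, here is a rough sketch of splitting one record into a FASTA entry and a metadata row. The function `split_record` and the field names are hypothetical, and the actual output formatting in augur may differ.

```python
def split_record(record, id_field, seq_field):
    """Split a record into (metadata without the sequence, FASTA entry text)."""
    metadata = dict(record)  # copy so the input record is left untouched
    sequence = metadata.pop(seq_field)  # the sequence field is dropped from metadata
    fasta_entry = f">{metadata[id_field]}\n{sequence}\n"
    return metadata, fasta_entry


# Field names below are made up for illustration.
record = {"strain": "A", "date": "2023-05-17", "sequence": "ACGT"}
metadata, fasta_entry = split_record(record, id_field="strain", seq_field="sequence")
print(metadata)     # {'strain': 'A', 'date': '2023-05-17'}
print(fasta_entry)  # >A, then ACGT on the next line
```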
REQUIRED
- --date-fields
List of date field names in the record that need to be standardized.
OPTIONAL
- --expected-date-formats
Expected date formats that are currently in the provided date fields, defined by standard format codes as listed at https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes. If a date string matches multiple formats, it will be parsed as the first matched format in the provided order.
Default:
['%Y-%m-%d', '%Y-%m-XX', '%Y-XX-XX', 'XXXX-XX-XX']
- --failure-reporting
Possible choices: error_first, error_all, warn, silent
How failed date formatting should be reported.
Default:
error_first
- --no-mask-failure
Do not mask dates with ‘XXXX-XX-XX’; return the original date string instead if date formatting failed. Masking is enabled by default, so failed dates are replaced with ‘XXXX-XX-XX’ unless this flag is given.
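The interaction between the order of --expected-date-formats and --no-mask-failure can be sketched as follows. This is a simplified illustration rather than augur's implementation: `format_first_match` is a hypothetical name, only complete dates are handled, and the `mask_failure` parameter stands in for the absence of --no-mask-failure.

```python
from datetime import datetime

MASKED_DATE = "XXXX-XX-XX"


def format_first_match(date_string, expected_formats, mask_failure=True):
    """Format date_string using the first format in expected_formats that parses it."""
    for date_format in expected_formats:
        try:
            parsed = datetime.strptime(date_string, date_format)
        except ValueError:
            continue  # this format did not match, try the next one
        return parsed.strftime("%Y-%m-%d")
    # No format matched: mask the date, or return the original string unchanged.
    return MASKED_DATE if mask_failure else date_string


# An ambiguous string is parsed by whichever matching format comes first.
print(format_first_match("03/04/2023", ["%d/%m/%Y", "%m/%d/%Y"]))  # 2023-04-03
print(format_first_match("03/04/2023", ["%m/%d/%Y", "%d/%m/%Y"]))  # 2023-03-04

# A string that matches no format is masked unless masking is turned off.
print(format_first_match("not a date", ["%Y-%m-%d"]))                      # XXXX-XX-XX
print(format_first_match("not a date", ["%Y-%m-%d"], mask_failure=False))  # not a date
```

Because the first matching format wins, listing the more specific or more likely format first avoids silently misreading ambiguous day/month strings.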