apply-geolocation-rules

Applies user-curated geolocation rules to the geolocation fields.

usage: augur curate apply-geolocation-rules [-h] [--metadata METADATA]
                                            [--id-column ID_COLUMN]
                                            [--metadata-delimiters METADATA_DELIMITERS [METADATA_DELIMITERS ...]]
                                            [--fasta FASTA]
                                            [--seq-id-column SEQ_ID_COLUMN]
                                            [--seq-field SEQ_FIELD]
                                            [--unmatched-reporting {error_first,error_all,warn,silent}]
                                            [--duplicate-reporting {error_first,error_all,warn,silent}]
                                            [--output-metadata OUTPUT_METADATA]
                                            [--output-fasta OUTPUT_FASTA]
                                            [--output-id-field OUTPUT_ID_FIELD]
                                            [--output-seq-field OUTPUT_SEQ_FIELD]
                                            [--region-field REGION_FIELD]
                                            [--country-field COUNTRY_FIELD]
                                            [--division-field DIVISION_FIELD]
                                            [--location-field LOCATION_FIELD]
                                            [--geolocation-rules TSV]
                                            [--case-sensitive]
                                            [--no-default-rules]

Named Arguments

--region-field

Field that contains regions in NDJSON records.

Default: 'region'

--country-field

Field that contains countries in NDJSON records.

Default: 'country'

--division-field

Field that contains divisions in NDJSON records.

Default: 'division'

--location-field

Field that contains locations in NDJSON records.

Default: 'location'

--geolocation-rules

TSV file of geolocation rules with the format ‘<raw_geolocation><tab><annotated_geolocation>’, where the raw and annotated geolocations are formatted as ‘<region>/<country>/<division>/<location>’. To create a general rule, substitute ‘*’ for a raw field value. Lines starting with ‘#’ are ignored as comments, and trailing ‘#’ comments are ignored as well. Raw geolocation matching is case-insensitive unless the --case-sensitive flag is provided. Rules defined in the provided file take precedence over the default rules in <https://github.com/nextstrain/augur/blob/33.4.1/augur/data/geolocation_rules.tsv>.
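
To make the rule format concrete, here is a minimal Python sketch of reading one rule line and checking a record against its raw side. It is not Augur's implementation: the helper names `parse_rule` and `matches` are hypothetical, the example rule is made up, and the sketch covers only what is described above (a tab-separated raw/annotated pair, ‘*’ wildcards in raw fields, ‘#’ comments, and case-insensitive matching by default).

```python
from typing import Optional, Tuple

# A geolocation is (region, country, division, location).
Geo = Tuple[str, ...]


def parse_rule(line: str) -> Optional[Tuple[Geo, Geo]]:
    """Parse one rules line; return None for blank or comment-only lines."""
    # Assumption: everything from the first '#' onward is a comment.
    content = line.split("#", 1)[0].strip("\n")
    if not content.strip():
        return None
    raw, sep, annotated = content.partition("\t")
    if not sep:
        raise ValueError(f"Expected a tab between raw and annotated: {line!r}")
    raw_fields = tuple(part.strip() for part in raw.split("/"))
    annotated_fields = tuple(part.strip() for part in annotated.split("/"))
    if len(raw_fields) != 4 or len(annotated_fields) != 4:
        raise ValueError(f"Expected 4 '/'-separated fields per side: {line!r}")
    return raw_fields, annotated_fields


def matches(raw_rule: Geo, record: Geo, case_sensitive: bool = False) -> bool:
    """Return True if every raw rule field is '*' or equals the record field."""
    for rule_value, record_value in zip(raw_rule, record):
        if rule_value == "*":
            continue  # wildcard: any raw value is accepted
        if not case_sensitive:
            rule_value, record_value = rule_value.lower(), record_value.lower()
        if rule_value != record_value:
            return False
    return True


# Made-up rule, for illustration only.
example = parse_rule(
    "*/usa/wa/seattle\tNorth America/USA/Washington/Seattle  # hypothetical\n"
)
```

With this sketch, a record whose fields are `("", "USA", "WA", "Seattle")` matches the raw side of the example rule by default but not when `case_sensitive=True`.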

--case-sensitive

Use case-sensitive matching of raw geolocation fields to geolocation rules.

Default: False

--no-default-rules

Do not use Augur’s default geolocation rules.

Default: False

INPUTS

Input options shared by all augur curate commands. If no input options are provided, commands will try to read NDJSON records from stdin.
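
As an illustration of the NDJSON input read from stdin, the sketch below writes one JSON object per line using the default field names listed under Named Arguments. The record values and the `strain` field are made up for the example, and `to_ndjson` is a hypothetical helper, not part of Augur.

```python
import json
import sys
from typing import Dict, Iterable

# Made-up records using the default geolocation field names.
records = [
    {"strain": "sample_1", "region": "", "country": "USA",
     "division": "WA", "location": "Seattle"},
    {"strain": "sample_2", "region": "Europe", "country": "France",
     "division": "", "location": ""},
]


def to_ndjson(rows: Iterable[Dict[str, str]]) -> str:
    """Serialize records as newline-delimited JSON, one object per line."""
    return "".join(json.dumps(row) + "\n" for row in rows)


if __name__ == "__main__":
    sys.stdout.write(to_ndjson(records))
```

If this were saved as a script (the file name `make_records.py` is hypothetical), its output could be piped into the command, for example `python make_records.py | augur curate apply-geolocation-rules`, relying on the stdin behavior described above.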

--metadata

Input metadata file. May be plain text (TSV, CSV) or an Excel or OpenOffice spreadsheet workbook file. When an Excel or OpenOffice workbook, only the first visible worksheet will be read and initial empty rows/columns will be ignored. Accepts ‘-’ to read plain text from stdin.

--id-column

Name of the metadata column that contains the record identifier for reporting duplicate records. Uses the first column of the metadata file if not provided. Ignored if also providing a FASTA file input.

--metadata-delimiters

Delimiters to accept when reading a plain text metadata file. Only one delimiter will be inferred.

Default: (',', '\t')
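
The idea of inferring a single delimiter from a list of candidates can be sketched as follows. This is a simple header-counting heuristic chosen for illustration, not a description of how Augur infers the delimiter; `infer_delimiter` is a hypothetical helper.

```python
from typing import Sequence


def infer_delimiter(header_line: str,
                    candidates: Sequence[str] = (",", "\t")) -> str:
    """Pick the candidate delimiter that occurs most often in the header line."""
    counts = {delimiter: header_line.count(delimiter) for delimiter in candidates}
    best = max(counts, key=counts.get)
    if counts[best] == 0:
        raise ValueError("None of the candidate delimiters appear in the header")
    return best
```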

--fasta

Plain or gzipped FASTA file. Headers can only contain the sequence id used to match a metadata record. Note that an index file named <filename>.fasta.fxi will be generated for the FASTA file.

--seq-id-column

Name of metadata column that contains the sequence id to match sequences in the FASTA file.

--seq-field

The name to use for the sequence field when joining sequences from a FASTA file.

--unmatched-reporting

Possible choices: error_first, error_all, warn, silent

How unmatched records from combined metadata/FASTA input should be reported.

Default: error_first

--duplicate-reporting

Possible choices: error_first, error_all, warn, silent

How duplicate records should be reported.

Default: error_first

OUTPUTS

Output options shared by all augur curate commands. If no output options are provided, commands will output NDJSON records to stdout.

--output-metadata

Output metadata TSV file. Accepts ‘-’ to output TSV to stdout.

--output-fasta

Output FASTA file.

--output-id-field

The record field to use as the sequence identifier in the FASTA output.

--output-seq-field

The record field that contains the sequence for the FASTA output. This field will be deleted from the metadata output.
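
The relationship between the FASTA output fields and the metadata output can be sketched in Python as below. This is only an illustration of the behavior described for --output-id-field and --output-seq-field, not Augur's code; `split_record`, the field names, and the record values are hypothetical.

```python
from typing import Dict, Tuple


def split_record(record: Dict[str, str], id_field: str,
                 seq_field: str) -> Tuple[Dict[str, str], str]:
    """Split one record into a metadata row and a FASTA entry."""
    # The sequence field is dropped from the metadata output.
    metadata = {key: value for key, value in record.items() if key != seq_field}
    # The id field becomes the FASTA header; the sequence field becomes the body.
    fasta_entry = f">{record[id_field]}\n{record[seq_field]}\n"
    return metadata, fasta_entry


# Made-up record for illustration.
metadata, fasta_entry = split_record(
    {"strain": "sample_1", "country": "USA", "sequence": "ACGT"},
    id_field="strain",
    seq_field="sequence",
)
```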