augur merge

Merge two or more metadata tables into one.

Tables must be given unique names to identify them in the output and are merged in the order given.

Rows are joined by id (e.g. "strain" or "name" or other --metadata-id-columns), and ids must be unique within an input table (i.e. tables cannot contain duplicate ids). All rows are output, even if they appear in only a single table (i.e. a full outer join in SQL terms).

Columns are combined by name, either extending the combined table with a new column or overwriting values in an existing column. For columns appearing in more than one table, non-empty values on the right-hand side (the table given later) overwrite values on the left-hand side (the table given earlier). The first table's id column name is used as the output id column name. Non-id columns in other input tables that would conflict with this output id column name are not allowed and, if present, will cause an error.
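The join and overwrite rules above can be illustrated with a small in-memory sketch. This is not augur's implementation (which works through SQLite and does not need the tables to fit in memory). The function name `merge_tables` and its arguments are hypothetical, and the row order it produces is an artifact of the sketch rather than documented behaviour.

```python
def merge_tables(tables, id_columns):
    """Full outer join of metadata tables, later tables overwriting earlier ones.

    tables:     list of (name, rows) pairs in merge order; rows are dicts of strings
    id_columns: dict mapping table name -> that table's id column (hypothetical input)
    """
    # The first table's id column name becomes the output id column name.
    output_id = id_columns[tables[0][0]]
    columns = [output_id]
    merged = {}  # id value -> merged row

    for name, rows in tables:
        id_col = id_columns[name]
        seen = set()
        for row in rows:
            # A non-id column must not collide with the output id column name.
            if id_col != output_id and output_id in row:
                raise ValueError(f"table {name!r}: column {output_id!r} conflicts with output id column")
            key = row[id_col]
            if key in seen:
                raise ValueError(f"table {name!r}: duplicate id {key!r}")
            seen.add(key)

            out = merged.setdefault(key, {output_id: key})
            for col, value in row.items():
                if col == id_col:
                    continue
                if col not in columns:
                    columns.append(col)  # extend the combined table
                if value != "":
                    out[col] = value     # only non-empty values overwrite

    # Fill columns missing from a row with empty strings.
    return columns, [{c: r.get(c, "") for c in columns} for r in merged.values()]
```

In this sketch, an id present only in the second table still yields an output row, and an empty value in the second table leaves the first table's value untouched.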

One generated column per input table may be optionally appended to the end of the output table to identify the source of each row's data. Column names are generated with the template given to --source-columns where "{NAME}" in the template is replaced by the table name given to --metadata. Values in each column are 1 or 0 for present or absent in that input table. By default no source columns are generated. You may make this behaviour explicit with --no-source-columns.
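As a rough illustration of how source columns are named and filled, the sketch below expands a template for each table and marks presence with 1 or 0. The function `source_columns` and the template `source_{NAME}` used in the usage note are hypothetical, chosen only for the example. This is not augur's code.

```python
def source_columns(template, ids_by_table):
    """Return {id: {generated column name: 1 or 0}} for every id in any table.

    template:     string containing the literal placeholder {NAME}
    ids_by_table: dict mapping table name -> list of ids in that table, in merge order
    """
    if "{NAME}" not in template:
        raise ValueError("template must contain the literal placeholder {NAME}")

    members = {name: set(ids) for name, ids in ids_by_table.items()}

    # Collect every id once, in first-seen order.
    all_ids, seen = [], set()
    for ids in ids_by_table.values():
        for id_ in ids:
            if id_ not in seen:
                seen.add(id_)
                all_ids.append(id_)

    return {
        id_: {
            # Plain substitution of the literal placeholder, one column per table.
            template.replace("{NAME}", name): int(id_ in members[name])
            for name in ids_by_table
        }
        for id_ in all_ids
    }
```

With tables named `a` and `b` and the template `source_{NAME}`, each id gets two extra columns, `source_a` and `source_b`.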

Metadata tables of arbitrary size can be handled, limited only by available disk space. Tables are not required to be entirely loadable into memory. The transient disk space required is approximately the sum of the uncompressed size of the inputs.

SQLite is used behind the scenes to implement the merge, but, at least for now, this should be considered an implementation detail that may change in the future. The SQLite 3 CLI, sqlite3, must be available. If it's not on PATH (or you want to use a version different from what's on PATH), set the SQLITE3 environment variable to the path of the desired sqlite3 executable.
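To check in advance which sqlite3 executable would be picked up, a lookup along the following lines may help. It assumes SQLITE3 takes precedence over PATH, as the paragraph above suggests. The helper `find_sqlite3` is hypothetical and is not part of augur.

```python
import os
import shutil


def find_sqlite3(environ=None):
    """Return the sqlite3 path implied by SQLITE3 or PATH, or None if not found."""
    environ = os.environ if environ is None else environ

    # An explicit SQLITE3 setting wins over whatever is on PATH (assumed precedence).
    configured = environ.get("SQLITE3")
    if configured:
        return configured

    # Otherwise fall back to a normal PATH search.
    return shutil.which("sqlite3", path=environ.get("PATH"))
```

Calling `find_sqlite3()` with no arguments inspects the current process environment. A return value of `None` means neither lookup found an executable.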

usage: augur merge [-h] --metadata NAME=FILE [NAME=FILE ...]
                   [--metadata-id-columns [TABLE=]COLUMN [[TABLE=]COLUMN ...]]
                   [--metadata-delimiters [TABLE=]CHARACTER [[TABLE=]CHARACTER ...]]
                   --output-metadata FILE [--source-columns TEMPLATE]
                   [--no-source-columns] [--quiet]

inputs

options related to input

--metadata

Required. Metadata table names and file paths. Names are arbitrary monikers used solely for referring to the associated input file in other arguments and in output column names. Paths must be to seekable files, not unseekable streams. Compressed files are supported.
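The NAME=FILE form can be pictured as a simple split on the first "=", with names required to be unique. The sketch below makes that assumption explicit. The function `parse_named_files` is hypothetical and augur's own argument handling may differ in its details.

```python
def parse_named_files(args):
    """Parse ["NAME=FILE", ...] into an ordered dict of table name -> file path."""
    tables = {}
    for arg in args:
        # Split on the first "=" only, so paths may themselves contain "=".
        name, sep, path = arg.partition("=")
        if not sep or not name or not path:
            raise ValueError(f"expected NAME=FILE, got {arg!r}")
        if name in tables:
            raise ValueError(f"table name {name!r} given more than once")
        tables[name] = path
    return tables
```

Because Python dicts preserve insertion order, the result also records the order in which the tables were given, which is the order they are merged in.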

--metadata-id-columns

Possible metadata column names containing identifiers, considered in the order given. Columns will be considered for all metadata tables by default. Table-specific column names may be given using the same names assigned in --metadata. Only one ID column will be inferred for each table. (default: strain name)

Default: ('strain', 'name')
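One way to read the [TABLE=]COLUMN rule is sketched below. Candidates are tried in order, a candidate prefixed with a known table name applies only to that table, and the first candidate found in the table's header wins. The function `infer_id_column` is hypothetical. Treating the prefix as a table name only when it matches a known table is an assumption of this sketch, not documented behaviour.

```python
def infer_id_column(table_name, header, candidates, table_names):
    """Pick the id column for one table from ordered [TABLE=]COLUMN candidates."""
    for candidate in candidates:
        table, sep, column = candidate.partition("=")
        if sep and table in table_names:
            # Table-specific candidate: skip it for every other table.
            if table != table_name:
                continue
        else:
            # No recognised table prefix: the candidate applies to all tables.
            column = candidate
        if column in header:
            return column  # only one id column is inferred per table
    raise ValueError(f"no id column found for table {table_name!r}")
```

With the default candidates `strain` and `name`, a table containing both columns would use `strain`, since it is listed first.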

--metadata-delimiters

Possible field delimiters to use for reading metadata tables, considered in the order given. Delimiters will be considered for all metadata tables by default. Table-specific delimiters may be given using the same names assigned in --metadata. Only one delimiter will be inferred for each table. (default: , $'\t')

Default: (',', '\t')
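A deliberately simplified picture of ordered delimiter inference is to take the first candidate that occurs in a table's header line. This is an assumption made for illustration only. The function `infer_delimiter` is hypothetical, and the detection augur actually performs may be more careful than this.

```python
def infer_delimiter(header_line, candidates=(",", "\t")):
    """Return the first candidate delimiter that occurs in the header line."""
    for delimiter in candidates:
        if delimiter in header_line:
            return delimiter  # only one delimiter is inferred per table
    raise ValueError("none of the candidate delimiters occur in the header line")
```

Under this simplification, a tab-separated header that happens to contain a comma would be read as comma-delimited with the default order, so such a file would need tab to be listed first.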

outputs

options related to output

--output-metadata

Required. Merged metadata as TSV. Compressed files are supported.

--source-columns

Template with which to generate names for the columns (described above) identifying the source of each row's data. Must contain a literal placeholder, {NAME}, which stands in for the metadata table names assigned in --metadata. (default: disabled)

--no-source-columns

Suppress generated columns (described above) identifying the source of each row's data. This is the default behaviour, but it may be made explicit or used to override a previous --source-columns.

--quiet

Suppress informational and warning messages normally written to stderr. (default: disabled)

Default: False