augur merge
Merge two or more metadata tables into one.
Tables must be given unique names to identify them in the output and are merged in the order given.
Rows are joined by id (e.g. "strain" or "name" or other --metadata-id-columns), and ids must be unique within an input table (i.e. tables cannot contain duplicate ids). All rows are output, even if they appear in only a single table (i.e. a full outer join in SQL terms).
Columns are combined by name, either extending the combined table with a new column or overwriting values in an existing column. For columns appearing in more than one table, non-empty values on the right hand side overwrite values on the left hand side. The first table's id column name is used as the output id column name. Non-id columns in other input tables that would conflict with this output id column name are not allowed and if present will cause an error.
One generated column per input table may be optionally appended to the end of the output table to identify the source of each row's data. Column names are generated with the template given to --source-columns, where "{NAME}" in the template is replaced by the table name given to --metadata. Values in each column are 1 or 0 for present or absent in that input table. By default no source columns are generated. You may make this behaviour explicit with --no-source-columns.
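The join and overwrite rules above can be sketched in a few lines of Python. This is an illustrative in-memory model only, not how augur implements the merge (which is SQLite-backed and not memory-bound); the function name merge_tables, the sample tables, and the output row order are assumptions made for the example.

```python
def merge_tables(tables, id_columns):
    """Full outer join of named tables; later non-empty values win.

    tables:     list of (name, rows) pairs in merge order; rows are dicts.
    id_columns: dict mapping table name -> that table's id column.
    """
    output_id = id_columns[tables[0][0]]  # first table's id column names the output id
    columns = [output_id]                 # output column order, id first
    merged = {}                           # id -> combined row (insertion-ordered)

    for name, rows in tables:
        id_column = id_columns[name]
        seen = set()
        for row in rows:
            # A non-id column must not collide with the output id column name.
            if id_column != output_id and output_id in row:
                raise ValueError(f"column {output_id!r} in table {name!r} conflicts with the output id column")
            row_id = row[id_column]
            if row_id in seen:
                raise ValueError(f"duplicate id {row_id!r} in table {name!r}")
            seen.add(row_id)
            target = merged.setdefault(row_id, {output_id: row_id})
            for column, value in row.items():
                if column == id_column:
                    continue
                if column not in columns:
                    columns.append(column)  # new column extends the table
                if value != "":
                    target[column] = value  # non-empty right-hand value overwrites
    return columns, [{c: row.get(c, "") for c in columns} for row in merged.values()]


# Hypothetical inputs: table "a" is keyed by "strain", table "b" by "name".
a = [{"strain": "s1", "country": "NZ", "date": "2020"},
     {"strain": "s2", "country": "US", "date": "2021"}]
b = [{"name": "s2", "country": "", "date": "2022", "clade": "X"},
     {"name": "s3", "country": "FR", "date": "", "clade": "Y"}]

columns, rows = merge_tables([("a", a), ("b", b)], {"a": "strain", "b": "name"})
```

In this sketch the row for s2 keeps country "US" from table a, because the value in table b is empty, while its date is overwritten with "2022". Row s3 appears even though it exists only in table b.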
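As a rough illustration of the source columns described above, the following Python sketch expands a template once per table and marks presence with 1 or 0. The function name source_columns, the template "in_{NAME}", and the table names are hypothetical; only the {NAME} placeholder and the 1/0 values come from the description.

```python
def source_columns(merged_ids, table_ids, template):
    """Build one presence column per input table for each merged id.

    merged_ids: ids in the merged output, in output order.
    table_ids:  dict mapping table name -> set of ids present in that table.
    template:   column name template containing the literal {NAME}.
    """
    if "{NAME}" not in template:
        raise ValueError("template must contain the literal placeholder {NAME}")
    # Literal replacement rather than str.format, so other braces are left alone.
    names = {name: template.replace("{NAME}", name) for name in table_ids}
    return [
        {names[name]: 1 if row_id in ids else 0 for name, ids in table_ids.items()}
        for row_id in merged_ids
    ]


# Hypothetical tables "a" and "b" sharing only the id "s2".
flags = source_columns(
    ["s1", "s2", "s3"],
    {"a": {"s1", "s2"}, "b": {"s2", "s3"}},
    "in_{NAME}",
)
```

Each entry of flags would be appended to the end of the corresponding output row, one column per input table in the order the tables were given.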
Metadata tables of arbitrary size can be handled, limited only by available disk space. Tables are not required to be entirely loadable into memory. The transient disk space required is approximately the sum of the uncompressed size of the inputs.
SQLite is used behind the scenes to implement the merge, but, at least for now, this should be considered an implementation detail that may change in the future. The SQLite 3 CLI, sqlite3, must be available. If it's not on PATH (or you want to use a version different from what's on PATH), set the SQLITE3 environment variable to the path of the desired sqlite3 executable.
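The lookup order described above (SQLITE3 first, then PATH) might be modelled as follows. This is a sketch under assumptions: the helper name find_sqlite3 is hypothetical, and augur's own resolution logic may differ in detail, for example in how it validates the configured path.

```python
import os
import shutil


def find_sqlite3(environ=None):
    """Return the sqlite3 executable to use, or None if none is found.

    Prefers the SQLITE3 environment variable; otherwise searches PATH.
    """
    environ = os.environ if environ is None else environ
    configured = environ.get("SQLITE3")
    if configured:
        return configured  # explicit override wins over whatever is on PATH
    return shutil.which("sqlite3", path=environ.get("PATH"))


# Example with an explicit, hypothetical override path.
chosen = find_sqlite3({"SQLITE3": "/opt/sqlite/bin/sqlite3"})
```

Passing the environment as a parameter keeps the sketch easy to exercise without modifying the real process environment.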
usage: augur merge [-h] --metadata NAME=FILE [NAME=FILE ...]
[--metadata-id-columns [TABLE=]COLUMN [[TABLE=]COLUMN ...]]
[--metadata-delimiters [TABLE=]CHARACTER [[TABLE=]CHARACTER ...]]
--output-metadata FILE [--source-columns TEMPLATE]
[--no-source-columns] [--quiet]
inputs
options related to input
- --metadata
Required. Metadata table names and file paths. Names are arbitrary monikers used solely for referring to the associated input file in other arguments and in output column names. Paths must be to seekable files, not unseekable streams. Compressed files are supported.
- --metadata-id-columns
Possible metadata column names containing identifiers, considered in the order given. Columns will be considered for all metadata tables by default. Table-specific column names may be given using the same names assigned in --metadata. Only one ID column will be inferred for each table. (default: strain name)
Default:
('strain', 'name')
- --metadata-delimiters
Possible field delimiters to use for reading metadata tables, considered in the order given. Delimiters will be considered for all metadata tables by default. Table-specific delimiters may be given using the same names assigned in --metadata. Only one delimiter will be inferred for each table. (default: , $'\t')
Default:
(',', '\t')
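The "considered in the order given" behaviour of --metadata-id-columns and --metadata-delimiters can be pictured with the simplified Python sketch below. Both helpers are hypothetical and deliberately naive: they pick the first candidate that appears in the header, which is an assumption for illustration and not necessarily the detection method augur uses.

```python
def infer_delimiter(header_line, candidates=(",", "\t")):
    """Return the first candidate delimiter that occurs in the header line."""
    for delimiter in candidates:
        if delimiter in header_line:
            return delimiter
    raise ValueError("none of the candidate delimiters were found")


def infer_id_column(columns, candidates=("strain", "name")):
    """Return the first candidate id column present among the table's columns."""
    for candidate in candidates:
        if candidate in columns:
            return candidate
    raise ValueError("none of the candidate id columns were found")


# Hypothetical TSV header with no "strain" column, so "name" is chosen.
header = "name\tdate\tregion"
delimiter = infer_delimiter(header)
id_column = infer_id_column(header.split(delimiter))
```

With the defaults shown, a table containing both "strain" and "name" columns would be keyed by "strain", since it comes first in the candidate list.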
outputs
options related to output
- --output-metadata
Required. Merged metadata as TSV. Compressed files are supported.
- --source-columns
Template with which to generate names for the columns (described above) identifying the source of each row's data. Must contain a literal placeholder, {NAME}, which stands in for the metadata table names assigned in --metadata. (default: disabled)
- --no-source-columns
Suppress generated columns (described above) identifying the source of each row's data. This is the default behaviour, but it may be made explicit or used to override a previous --source-columns.
- --quiet
Suppress informational and warning messages normally written to stderr. (default: disabled)
Default:
False