augur.merge module

Merge two or more metadata tables into one.

Tables must be given unique names to identify them in the output and are merged in the order given.

Rows are joined by id (e.g. “strain” or “name” or other --metadata-id-columns), and ids must be unique within an input table (i.e. tables cannot contain duplicate ids). All rows are output, even if they appear in only a single table (i.e. a full outer join in SQL terms).

Columns are combined by name, either extending the combined table with a new column or overwriting values in an existing column. For columns appearing in more than one table, non-empty values on the right-hand side overwrite values on the left-hand side. The first table’s id column name is used as the output id column name. Non-id columns in other input tables that would conflict with this output id column name are not allowed and, if present, cause an error.
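
The join and overwrite rules above can be illustrated with a small in-memory sketch. This is not how augur implements the merge (that is done through SQLite); the function name `merge_tables` and the dict-of-dicts table representation are assumptions made purely for illustration.

```python
def merge_tables(tables):
    """Merge {id: {column: value}} tables left to right (illustrative only)."""
    columns = []   # output column order: first appearance across the inputs
    merged = {}    # id -> combined row; keeps ids found in any table (full outer join)
    for table in tables:
        for row_id, row in table.items():
            out = merged.setdefault(row_id, {})
            for column, value in row.items():
                if column not in columns:
                    columns.append(column)
                # Only non-empty values overwrite what earlier tables supplied.
                if value != "" or column not in out:
                    out[column] = value
    # Columns a row never received are filled with empty strings.
    return {
        row_id: {c: row.get(c, "") for c in columns}
        for row_id, row in merged.items()
    }


left = {
    "A": {"country": "USA", "host": "human"},
    "B": {"country": "", "host": "human"},
}
right = {
    "B": {"country": "Peru", "host": ""},
    "C": {"country": "Chile", "clade": "20A"},
}
merged = merge_tables([left, right])
```

Here row "B" takes its country from the right-hand table, because that value is non-empty, but keeps its host from the left-hand table, because the right-hand value is empty. Rows "A" and "C" each appear in only one input and are still present in the result.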

One generated column per input table may be optionally appended to the end of the output table to identify the source of each row’s data. Column names are generated with the template given to --source-columns where “{NAME}” in the template is replaced by the table name given to --metadata. Values in each column are 1 or 0 for present or absent in that input table. By default no source columns are generated. You may make this behaviour explicit with --no-source-columns.
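
As a rough sketch of how the source columns relate to the template, the snippet below expands a template for each table name and fills in 1 or 0 per row. The helper name `source_column_values`, the table names "X" and "Y", and the template "in_{NAME}" are all hypothetical, chosen only for this example.

```python
def source_column_values(row_id, ids_by_table, template):
    """Return {generated column name: 1 or 0} for one output row (illustrative)."""
    return {
        # "{NAME}" in the template is replaced by the table's name.
        template.replace("{NAME}", name): 1 if row_id in ids else 0
        for name, ids in ids_by_table.items()
    }


# Hypothetical inputs: the ids present in two tables named "X" and "Y".
ids_by_table = {"X": {"A", "B"}, "Y": {"B", "C"}}

row_a = source_column_values("A", ids_by_table, "in_{NAME}")
row_b = source_column_values("B", ids_by_table, "in_{NAME}")
```

With these inputs, the row for id "A" gets `in_X` = 1 and `in_Y` = 0, while the row for id "B" gets 1 in both columns.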

Metadata tables of arbitrary size can be handled, limited only by available disk space. Tables are not required to be entirely loadable into memory. The transient disk space required is approximately the sum of the uncompressed size of the inputs.

SQLite is used behind the scenes to implement the merge, but, at least for now, this should be considered an implementation detail that may change in the future. The SQLite 3 CLI, sqlite3, must be available. If it’s not on PATH (or you want to use a version different from what’s on PATH), set the SQLITE3 environment variable to the path of the desired sqlite3 executable.
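
The lookup order described above (SQLITE3 first, then PATH) could be approximated as follows. This is a sketch of the documented behaviour rather than augur's actual code; the function name `find_sqlite3` and the example path are assumptions.

```python
import os
import shutil


def find_sqlite3(environ=None):
    """Return the sqlite3 executable to use, or None if none is found."""
    environ = os.environ if environ is None else environ
    # An explicit SQLITE3 setting takes priority over whatever is on PATH.
    return environ.get("SQLITE3") or shutil.which("sqlite3", path=environ.get("PATH"))


# Hypothetical override pointing at a specific build of the CLI.
chosen = find_sqlite3({"SQLITE3": "/opt/sqlite/bin/sqlite3"})
```

Called with no argument, the sketch consults the real process environment and returns None when no sqlite3 executable can be found.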

class augur.merge.NamedMetadata(name, *args, **kwargs)

Bases: Metadata

name: str

User-provided descriptive name for this metadata file.

table_name: str

Generated SQLite table name for this metadata file, based on name.

exception augur.merge.SQLiteError

Bases: Exception

augur.merge.count_unique(xs)
Return type:

Iterable[Tuple[TypeVar(T), int]]
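
The page documents only the return type of this helper. Assuming it yields each distinct item together with the number of times it occurs (an assumption based on the name and signature, not something stated here), a minimal equivalent could look like this; `count_unique_sketch` is a hypothetical name.

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple, TypeVar

T = TypeVar("T", bound=Hashable)


def count_unique_sketch(xs: Iterable[T]) -> Iterable[Tuple[T, int]]:
    """Yield (item, count) pairs in order of first appearance (assumed behaviour)."""
    # Counter keeps insertion order, so items come out as first seen.
    return Counter(xs).items()


counts = list(count_unique_sketch(["strain", "name", "strain"]))
```

A helper of this shape is handy for reporting duplicate ids, since any pair with a count above 1 marks a repeated value.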

augur.merge.indented_list(xs, prefix)
augur.merge.pairs(xs)

Split an iterable of k=v strings into an iterable of (k,v) tuples.

Return type:

Iterable[Tuple[str, str]]

>>> pairs(["abc=123", "eight nine ten=el em en"])
[('abc', '123'), ('eight nine ten', 'el em en')]

Strings missing a k and/or a v part get an empty string.

>>> pairs(["v", "=v", "k=", "=", ""])
[('', 'v'), ('', 'v'), ('k', ''), ('', ''), ('', '')]

k ends at the first =.

>>> pairs(["abc=123=xyz", "=v=v"])
[('abc', '123=xyz'), ('', 'v=v')]
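
A minimal sketch that reproduces the documented examples, assuming nothing beyond what the doctests above show. The name `pairs_sketch` is hypothetical and this is not necessarily how augur implements the function.

```python
def pairs_sketch(xs):
    """Split k=v strings into (k, v) tuples, mirroring the documented cases."""
    for x in xs:
        # partition() splits at the first "=", so later "=" stay in the value.
        k, sep, v = x.partition("=")
        # Without any "=", the whole string is treated as the value.
        yield (k, v) if sep else ("", k)


result = list(pairs_sketch(["abc=123=xyz", "v", "k=", ""]))
```

Unlike the documented function, whose examples display a list, this sketch is a generator, so it is wrapped in `list()` here.
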

augur.merge.register_parser(parent_subparsers)
augur.merge.run(args)
augur.merge.shquote_humanized(x)

shquote for humans.

Use C-style escapes supported by shells (specifically, Bash) for characters that humans would typically use C-style escapes for instead of quoted literals.

<https://www.gnu.org/software/bash/manual/bash.html#ANSI_002dC-Quoting>
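
One way the behaviour described above might be approximated: split the input on characters that have a familiar C-style escape, render those with Bash's `$'...'` syntax, and pass everything else through `shlex.quote`. The set of escaped characters (tab, newline, carriage return) and the name `shquote_humanized_sketch` are assumptions; augur's actual implementation may differ.

```python
import re
import shlex

# Assumed set of characters rendered as C-style escapes.
ESCAPES = {"\t": r"\t", "\n": r"\n", "\r": r"\r"}


def shquote_humanized_sketch(x):
    """Shell-quote x, showing tab/newline/CR as $'\\t'-style escapes."""
    if x == "":
        return shlex.quote(x)
    # The capturing group keeps the matched characters in the split result.
    parts = re.split(r"([\t\n\r])", x)
    return "".join(
        "$'" + ESCAPES[p] + "'" if p in ESCAPES else shlex.quote(p)
        for p in parts
        if p  # skip empty strings produced at the edges of the split
    )


quoted = shquote_humanized_sketch("abc\tdef")
```

The pieces can simply be concatenated because the shell joins adjacent quoted and unquoted fragments into a single word.
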

>>> shquote_humanized("abc")
'abc'
>>> shquote_humanized("\t")
"$'\\t'"
>>> shquote_humanized("abc def")
"'abc def'"
>>> shquote_humanized("abc\tdef")
"abc$'\\t'def"

augur.merge.sqlite3(*args, **kwargs)

Internal helper for invoking sqlite3, the SQLite CLI program.

augur.merge.sqlite3_table_columns(db_path, table)
Return type:

Iterable[str]

augur.merge.sqlite_quote_dot(x)

Quote a SQLite CLI dot-command argument.

<https://sqlite.org/cli.html#dot_command_arguments>

augur.merge.sqlite_quote_id(*xs)

Quote a SQLite identifier.

<https://sqlite.org/lang_keywords.html>

>>> sqlite_quote_id('foo bar')
'"foo bar"'
>>> sqlite_quote_id('table name', 'column name')
'"table name"."column name"'
>>> sqlite_quote_id('weird"name')
'"weird""name"'
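
A minimal sketch consistent with the examples above: each part is wrapped in double quotes, embedded double quotes are doubled, and parts are joined with a dot. The name `sqlite_quote_id_sketch` is hypothetical and the real implementation may differ in detail.

```python
def sqlite_quote_id_sketch(*xs):
    """Quote one or more identifier parts and join them with '.'."""
    # Inside a double-quoted identifier, a literal " is written as "".
    return ".".join('"' + x.replace('"', '""') + '"' for x in xs)


qualified = sqlite_quote_id_sketch("table name", "column name")
```
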

augur.merge.sqlite_quote_string(x)

Quote a SQLite string (i.e. produce a string literal).

<https://www.sqlite.org/lang_expr.html#literal_values_constants_>
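
The standard SQL way to produce a string literal is to wrap the text in single quotes and double any embedded single quote. The sketch below assumes that is what this function does; the name `sqlite_quote_string_sketch` is hypothetical. It round-trips the result through Python's built-in `sqlite3` module as a sanity check.

```python
import sqlite3


def sqlite_quote_string_sketch(x):
    """Return x as a SQL string literal (assumed behaviour)."""
    # Inside a single-quoted literal, a literal ' is written as ''.
    return "'" + x.replace("'", "''") + "'"


literal = sqlite_quote_string_sketch("it's a test")

# Round trip: SQLite should read the literal back as the original text.
with sqlite3.connect(":memory:") as conn:
    (value,) = conn.execute("SELECT " + literal).fetchone()
```

In ordinary application code, bound parameters are usually preferable to building literals by hand. Quoting of this kind matters mainly when SQL text has to be passed to the sqlite3 CLI as a script.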