augur.io.json module

This file, augur/io/json.py, started as a copy of lib/id3c/json.py and the functions shorten_left(), contextualize_char(), and mark_char() from lib/id3c/utils.py in the https://github.com/seattleflu/id3c repo as of commit 911e7d7bfccc4d050e63e6f73d7a7e59fa1a80e8, licensed under the MIT License.

Subsequent modifications (as recorded in Augur’s own version control history) are licensed under the same terms as the rest of Augur, the GNU AGPL 3.0.

The LICENSE file included in ID3C’s repo is copied below verbatim:

MIT License

Copyright (c) 2018 Brotman Baty Institute

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

class augur.io.json.AugurJSONEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)

Bases: JSONEncoder

A custom JSONEncoder subclass to serialize data types used for various data stored in dictionary format.

default(obj)

Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return super().default(o)

class augur.io.json.BytesWrittenCounterIO

Bases: RawIOBase

Binary stream to count the number of bytes sent via write().

write(b)

written: Number of bytes written.

exception augur.io.json.JSONDecodeError(exc)

Bases: JSONDecodeError

Subclass of json.JSONDecodeError which contextualizes the stringified error message by including a snippet of the JSON source input. Typically you won’t need to ever reference this class directly. It will be raised by load_json() and be caught by except blocks which catch the standard json.JSONDecodeError.

Examples

>>> load_json('{foo: "bar"}')
Traceback (most recent call last):
    ...
augur.io.json.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1): '{▸▸▸f◂◂◂oo: "bar"}'
>>> load_json('not json')
Traceback (most recent call last):
    ...
augur.io.json.JSONDecodeError: Expecting value: line 1 column 1 (char 0): 'not json'
>>> load_json("[0, 1, 2, 3, 4, 5")
Traceback (most recent call last):
    ...
augur.io.json.JSONDecodeError: Expecting ',' delimiter: line 1 column 18 (char 17): unexpected end of document: '…, 3, 4, 5'
>>> load_json("[\n")
Traceback (most recent call last):
    ...
augur.io.json.JSONDecodeError: Expecting value: line 2 column 1 (char 2): unexpected end of document: '[\n'
>>> load_json("\n")
Traceback (most recent call last):
    ...
augur.io.json.JSONDecodeError: Expecting value: line 2 column 1 (char 1): unexpected end of document: '\n'
>>> load_json('')
Traceback (most recent call last):
    ...
augur.io.json.JSONDecodeError: Expecting value: line 1 column 1 (char 0): (empty source document)

CONTEXT_LENGTH = 10

class augur.io.json.JsonEncoder(*args, **kwargs)

Bases: JSONEncoder

Encodes Python values into JSON for non-standard objects.

default(value): Returns value as JSON or raises a TypeError. Serializes: * date using isoformat() * datetime using isoformat() * time using isoformat() * timedelta using isodate.duration_isoformat() * UUID using str()

augur.io.json.as_json(value)

Converts value to a JSON string using our custom JsonEncoder.

The custom encoder supports serialization of date objects:

>>> as_json(date(year=2024, month=7, day=17))
'"2024-07-17"'

datetime objects:

>>> as_json(datetime(year=2024, month=7, day=17, hour=11, minute=38))
'"2024-07-17T11:38:00"'

time objects:

>>> as_json(time(hour=11, minute=38))
'"11:38:00"'

timedelta objects:

>>> as_json(timedelta(days=42))
'"P42D"'

and UUID objects:

>>> as_json(UUID(int=147952133113722764103424939352979237618))
'"6f4e8b5a-8500-4928-b7ae-dc098a256af2"'

augur.io.json.contextualize_char(text, idx, context=10)

Marks the idx char in text and snips out a surrounding amount of context.

Avoids making a copy of text before snipping, in case text is very large.

Examples

>>> contextualize_char('hello world', 0, context = 4)
'▸▸▸h◂◂◂ello…'
>>> contextualize_char('hello world', 5, context = 3)
'…llo▸▸▸ ◂◂◂wor…'
>>> contextualize_char('hello world', 5, context = 100)
'hello▸▸▸ ◂◂◂world'
>>> contextualize_char('hello world', 10)
'hello worl▸▸▸d◂◂◂'
>>> contextualize_char('hello world', 2, context = 0)
'…▸▸▸l◂◂◂…'

>>> contextualize_char('hello world', 11)
Traceback (most recent call last):
    ...
IndexError: string index out of range

augur.io.json.dump_ndjson(iterable)

print() iterable as a set of newline-delimited JSON records.

Return type:: None

augur.io.json.json_size(data): Return size in bytes of a Python object in JSON string form.

augur.io.json.load_json(value): Converts value from a JSON string with better error messages. Raises an augur.io.json.JSONDecodeError which provides improved error messaging, compared to json.JSONDecodeError, when stringified.

augur.io.json.load_ndjson(file, ignore_empty_lines=True)

Load newline-delimited JSON records from file. Ignore empty lines in the file by default.

Return type:: Iterable

augur.io.json.mark_char(text, idx)

Prominently marks the idx char in text.

Examples

>>> mark_char('hello world', 0)
'▸▸▸h◂◂◂ello world'
>>> mark_char('hello world', 2)
'he▸▸▸l◂◂◂lo world'
>>> mark_char('hello world', 10)
'hello worl▸▸▸d◂◂◂'

>>> mark_char('hello world', 11)
Traceback (most recent call last):
    ...
IndexError: string index out of range

>>> mark_char('', 0)
Traceback (most recent call last):
    ...
IndexError: string index out of range

augur.io.json.shorten_as_json(value, length, placeholder)

Converts value to JSON with as_json() and then truncates it to a maximum length (if necessary), indicating truncation with the given placeholder.

Return type:: str

>>> shorten_as_json({'hello': 'world', 'x': 42}, 100, '…')
'{"hello": "world", "x": 42}'
>>> shorten_as_json({'hello': 'world', 'x': 42}, 21, '…')
'{"hello": "world", …}'

For readability, the outermost JSON value delimiter at the right hand side, i.e. }, ], or ", is preserved, such that placeholder will put placed “inside” value’s JSON representation.

>>> shorten_as_json([1,2,3,'four',5,6], 20, '…')
'[1, 2, 3, "four", …]'
>>> shorten_as_json([1,2,3,'four',5,6], 15, '…')
'[1, 2, 3, "fo…]'

The maximum length must be at least two characters longer than the length of the placeholder.

>>> shorten_as_json({'foo': 'bar'}, 4, '...')
Traceback (most recent call last):
    ...
ValueError: maximum length (4) must be two greater than length of placeholder (3), i.e. at least 5
>>> shorten_as_json({'foo': 'bar'}, 5, '...')
'{...}'

augur.io.json.shorten_left(text, length, placeholder)

Truncate the left end of text to a maximum length (if necessary), indicating truncation with the given placeholder.

The maximum length must be longer than the length of the placeholder.

Behaviour is slightly different than textwrap.shorten() which is intended for shortening sentences and works at the word, not character, level.

Examples

>>> shorten_left("foobar", 6, "...")
'foobar'
>>> shorten_left("foobarbaz", 6, "...")
'...baz'
>>> shorten_left("foobar", 3, "...")
Traceback (most recent call last):
    ...
ValueError: maximum length (3) must be greater than length of placeholder (3)

augur.io.json.write_json(data, file, minify=None, minify_threshold_mb=5, indent=2)

Write data as JSON to the given file, creating parent directories if necessary.

Parameters:

data (dict) -- data to write out to JSON
file -- file path or handle to write to
minify (bool or None, optional) -- Control output minification. True forces minified output, False forces non-minified output, None (default) uses auto-detection based on minify_threshold_mb. A truthy value in the environment variable AUGUR_MINIFY_JSON also forces minified output.
minify_threshold_mb (int or float, optional) -- Threshold in megabytes above which output is automatically minified. Only applies when minify is None.
indent (int or None, optional) -- JSON indentation level when not minifying.

Raises:

OSError --