Share via Community on GitHub

One way researchers can share their analyses through Nextstrain is via GitHub. Dataset JSONs and/or narrative Markdown files are stored in your own GitHub repos and accessed through URLs, which gives you complete control, ownership, and discretion over your data. All that is required for this functionality is for files to conform to a specific naming scheme (see below). There is no need to get in touch with the Nextstrain team to allow access to the dataset, but if you would like your dataset featured on the front page or listed along with all available SARS-CoV-2 builds, then please let us know!

P.S. For help with running your analysis, see the bioinformatics introduction.

Technical details

Given a GitHub organisation <ORG> and repository <REPO>, dataset files should be stored in a folder named auspice. The filename must have the format <REPO>[_<NAME1>[_<NAME2>[...]]].json, where the underscore-separated dataset-specific names (e.g. <NAME1>) are optional. Such datasets will be available at <ORG>/<REPO>[/<NAME1>/<NAME2>/...]. Note that dataset names are separated by underscores in the filename but by slashes (/) in the URL. See the table below for examples.
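The filename-to-URL mapping described above can be sketched in a few lines of Python. This is only an illustration of the naming scheme, not code from Nextstrain; the function name dataset_url_path is hypothetical, and it returns just the <ORG>/<REPO>[/<NAME1>/...] portion of the URL as written in the text.

```python
def dataset_url_path(org: str, repo: str, filename: str) -> str:
    """Map a unified dataset filename in auspice/ to its <ORG>/<REPO>[/<NAME>...] path.

    Hypothetical helper for illustration; v1 *_tree.json / *_meta.json pairs
    are not handled here.
    """
    if not filename.endswith(".json"):
        raise ValueError(f"dataset files must end in .json: {filename!r}")
    stem = filename[: -len(".json")]

    if stem == repo:
        # No dataset-specific names: auspice/<REPO>.json
        names = []
    elif stem.startswith(repo + "_"):
        # Underscore-separated names become /-separated URL parts
        names = stem[len(repo) + 1 :].split("_")
        if any(name == "" for name in names):
            raise ValueError(f"empty dataset name in {filename!r}")
    else:
        raise ValueError(f"{filename!r} does not start with the repo name {repo!r}")

    return "/".join([org, repo, *names])


if __name__ == "__main__":
    print(dataset_url_path("emmahodcroft", "cov", "cov_229E_spike.json"))
    print(dataset_url_path("blab", "sars-like-cov", "sars-like-cov.json"))
```

Because the repository name is matched as a whole prefix, hyphens in names such as sars-like-cov are left alone; only underscores after the repository name act as separators.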

Git Branches In the above description, files are assumed to reside on the master branch. It is possible to access files on a different branch, <BRANCH>, by specifying the branch in the URL via <ORG>/<REPO>@<BRANCH>[/<NAME1>/...]. Note that if the default branch of your repo is main, then you must specify this in the URL, e.g. <ORG>/<REPO>@main. See the table below for examples.
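As a rough sketch of the branch rule, the snippet below builds the URL path with an optional @<BRANCH> suffix on the repository name. The function name community_path and its parameters are assumptions made for this example; the only behaviour taken from the text is that master is implied and every other branch, including main, must be named.

```python
from typing import Optional, Sequence


def community_path(
    org: str,
    repo: str,
    names: Sequence[str] = (),
    branch: Optional[str] = None,
) -> str:
    """Build <ORG>/<REPO>[@<BRANCH>][/<NAME1>/...] (hypothetical helper)."""
    # master is assumed when no branch is given, so it can be left out;
    # any other branch (including main) must appear after an "@".
    if branch is None or branch == "master":
        repo_part = repo
    else:
        repo_part = f"{repo}@{branch}"
    return "/".join([org, repo_part, *names])


if __name__ == "__main__":
    print(community_path("jameshadfield", "scratch", ["placentalia"], branch="test-branch"))
    print(community_path("blab", "sars-like-cov"))
```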

Listing of all datasets and narratives If a dataset file exists at auspice/<REPO>.json (i.e. there are no dataset-specific names in the filename), then visiting <ORG>/<REPO> will automatically load that dataset. If no such file exists (i.e. all the datasets have at least one <NAME> in their filenames), then visiting that URL will list the available datasets and narratives.
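The load-or-list decision can be illustrated with a small sketch. It is not how the Nextstrain server is implemented; resolve_repo_root is a hypothetical name, the input is assumed to be the list of filenames found in the auspice folder, and narratives are left out of the listing for brevity.

```python
from typing import List, Sequence, Tuple


def resolve_repo_root(repo: str, auspice_files: Sequence[str]) -> Tuple[str, List[str]]:
    """Decide what visiting <ORG>/<REPO> would show (illustrative only).

    Returns ("load", [file]) when auspice/<REPO>.json exists, otherwise
    ("list", files) with the dataset files that would be listed.
    """
    default = f"{repo}.json"
    if default in auspice_files:
        return ("load", [default])
    return ("list", sorted(auspice_files))


if __name__ == "__main__":
    print(resolve_repo_root("sars-like-cov", ["sars-like-cov.json"]))
    print(resolve_repo_root("cov", ["cov_OC43_spike.json", "cov_229E_spike.json"]))
```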

Narratives The above naming scheme is the same for narratives, with a few small changes. Files should be located in the narratives folder (not auspice), they should have a .md suffix (not .json), and they are accessed through URLs <ORG>/<REPO>[/<NAME1>/...]. See the table below for examples.

v1 (deprecated) datasets work the same way, except that two JSONs are required: auspice/<REPO>[_<NAME1>...]_tree.json and auspice/<REPO>[_<NAME1>...]_meta.json. Note that if a unified dataset is also available (auspice/<REPO>[_<NAME1>...].json), then it will be used in preference to the v1 pair. See “zika-colombia” in the table below for an example.
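The preference for a unified dataset over a v1 pair can be sketched as follows. The function pick_dataset_files is hypothetical, and the set of available paths is assumed to be known up front; the sketch only mirrors the file-naming rule described above.

```python
from typing import Collection, List, Sequence


def pick_dataset_files(repo: str, names: Sequence[str], available: Collection[str]) -> List[str]:
    """Choose which files to use for a dataset (illustrative only).

    Prefers the unified JSON; falls back to the v1 tree + meta pair,
    which is only usable when both files exist.
    """
    base = "_".join([repo, *names])
    unified = f"auspice/{base}.json"
    if unified in available:
        return [unified]

    v1_pair = [f"auspice/{base}_tree.json", f"auspice/{base}_meta.json"]
    if all(path in available for path in v1_pair):
        return v1_pair

    return []  # nothing usable found


if __name__ == "__main__":
    files = {"auspice/zika-colombia_tree.json", "auspice/zika-colombia_meta.json"}
    print(pick_dataset_files("zika-colombia", [], files))
```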



(GitHub) Org | Repository | Branch | File(s) in repository | Nextstrain URL
<ORG> | <REPO> | master | auspice/<REPO>.json | <ORG>/<REPO>
<ORG> | <REPO> | <BRANCH> | auspice/<REPO>.json | <ORG>/<REPO>@<BRANCH>
blab | sars-like-cov | master | auspice/sars-like-cov.json |
emmahodcroft | cov | master | N/A (lists available datasets) |
emmahodcroft | cov | master | auspice/cov_229E_spike.json |
emmahodcroft | cov | master | auspice/cov_OC43_spike.json |
jameshadfield | scratch | test-branch | auspice/scratch_placentalia.json |
blab | zika-colombia | master | auspice/zika-colombia_meta.json, |


(GitHub) Org | Repository | Branch | File(s) in repository | Nextstrain URL
<ORG> | <REPO> | master | narratives/<REPO>.md | <ORG>/<REPO>
ESR-NZ | GenomicsNarrativeSARSCoV2 | master | narratives/ |
blab | ebola-narrative-ms | master | narratives/ |

For more examples please see the Nextstrain front page and the listing of all SARS-CoV-2 builds.