Accessories

A standard pymzqc installation comes with a couple of accessories. They currently include the following subprojects:

File-handling tools

Fileinfo

mzqc-fileinfo [OPTIONS] INFILE

The fileinfo tool is a CLI tool built on click. Its purpose is simple, as are its call options. Given a single mzQC file, it produces a summary of the file’s contents: which runs and sets are included, and with which metrics. For example:

The selected mzQC file has 5 different metrics registered,
from 1 different runs and 0 defined 'sets',
and it was created @ 2020-12-01 11:56:34+00:00.

mzQC "run" #1 was created for the input of the files:
        💾 CPTAC_CompRef_00_iTRAQ_01_2Feb12_Cougar_11-10-09.trfr.t3
                @ ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2014/09/PXD000966/CPTAC_CompRef_00_iTRAQ_01_2Feb12_Cougar_11-10-09.raw/CPTAC_CompRef_00_iTRAQ_01_2Feb12_Cougar_11-10-09.trfr.t3.mzML
                of type mzML format
        🕑 The MS run object was completed at 2012-02-03T11:00:41
        Metrics:
         📈('number of MS2 spectra', 'MS:4000060')
         📈('number of chromatograms', 'MS:4000071')
         📈('m/z acquisition range', 'MS:4000069')
         📈('number of MS1 spectra', 'MS:4000059')
         📈('retention time acquisition range', 'MS:4000070')

Please get more info on usage with the --help option.
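The kind of summary shown above can be approximated with the standard library alone. The following is a minimal sketch, not the fileinfo implementation: it assumes the usual mzQC JSON layout (a top-level "mzQC" object holding "runQualities" and "setQualities", each with a "qualityMetrics" list), and the sample document and the function name summarise are hypothetical.

```python
import json

# Hypothetical, minimal mzQC content used only for illustration.
SAMPLE = """
{"mzQC": {
  "creationDate": "2020-12-01T11:56:34Z",
  "runQualities": [
    {"metadata": {"inputFiles": [{"name": "example_run",
                                  "location": "file:///tmp/example_run.mzML"}]},
     "qualityMetrics": [
       {"accession": "MS:4000059", "name": "number of MS1 spectra", "value": 100},
       {"accession": "MS:4000060", "name": "number of MS2 spectra", "value": 400}
     ]}
  ],
  "setQualities": []
}}
"""


def summarise(mzqc_text):
    """Count runs, sets, and distinct metrics in an mzQC document (assumed layout)."""
    doc = json.loads(mzqc_text)["mzQC"]
    runs = doc.get("runQualities", [])
    sets = doc.get("setQualities", [])
    # A metric is identified by its (name, accession) pair.
    metrics = {
        (metric["name"], metric["accession"])
        for quality in runs + sets
        for metric in quality.get("qualityMetrics", [])
    }
    return {"runs": len(runs), "sets": len(sets), "metrics": sorted(metrics)}


summary = summarise(SAMPLE)
print(f"{len(summary['metrics'])} metrics, "
      f"{summary['runs']} runs, {summary['sets']} sets")
for name, accession in summary["metrics"]:
    print(f"  {name} ({accession})")
```

For real files, the mzqc-fileinfo tool remains the intended route; the sketch only illustrates what information the summary draws on.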

Filemerger

mzqc-filemerger [OPTIONS] [MZQC_INPUT]... MZQC_OUTPUT

[🚧🏗🚧]

Tool in beta stage of development.

⚠ Merging operations are limited for beta. If no clear run or set correspondence can be established, the merge falls back to a conservative merge into lists of separate runs.

[🚧🏗🚧]

The filemerger tool is a CLI tool built on click. Its purpose is to merge one or more mzQC files. The tool accepts multiple files or CLI wildcards as input and takes the last filename as target destination for the merge product.

For example: mzqc-filemerger *.mzqc temp_test.mzqc
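To illustrate what a conservative merge into lists of separate runs could look like, here is a minimal sketch on already-parsed mzQC documents. It is not the filemerger's actual algorithm: the function name conservative_merge, the sample documents, and the example.org URI are hypothetical, and the field names assume the usual mzQC JSON layout.

```python
def conservative_merge(documents):
    """Concatenate runs and sets of several parsed mzQC documents.

    No attempt is made to match runs or sets across files; controlled
    vocabularies are de-duplicated by their URI.
    """
    merged = {"runQualities": [], "setQualities": [], "controlledVocabularies": []}
    seen_cv_uris = set()
    for document in documents:
        body = document["mzQC"]
        merged["runQualities"].extend(body.get("runQualities", []))
        merged["setQualities"].extend(body.get("setQualities", []))
        for cv in body.get("controlledVocabularies", []):
            if cv["uri"] not in seen_cv_uris:
                seen_cv_uris.add(cv["uri"])
                merged["controlledVocabularies"].append(cv)
    return {"mzQC": merged}


# Hypothetical inputs: two files, one run each, sharing one CV.
cv = {"name": "PSI-MS", "uri": "https://example.org/psi-ms.obo"}
first = {"mzQC": {"runQualities": [{"metadata": {"label": "run A"}}],
                  "controlledVocabularies": [cv]}}
second = {"mzQC": {"runQualities": [{"metadata": {"label": "run B"}}],
                   "controlledVocabularies": [cv]}}

result = conservative_merge([first, second])
print(len(result["mzQC"]["runQualities"]), "runs after merge")
```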

Fixdescriptions

mzqc-fixdescriptions [OPTIONS] INFILE OUTFILE

[🚧🏗🚧]

Tool in beta stage of development.

[🚧🏗🚧]

The description fixer tool is a CLI tool built on click. Its purpose is to ‘fix’ the descriptions of all applicable elements in an mzQC file. It loads an mzQC file, loads its CVs (online), and adds descriptions where they are missing and can be obtained from CV lookup.
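The idea of adding descriptions where missing and possible can be sketched as follows. This is an illustration under assumptions, not the tool's code: the function fill_descriptions is hypothetical, the CV is replaced by an in-memory dict instead of an online lookup, and the definition text is a placeholder rather than the real CV definition.

```python
def fill_descriptions(metrics, cv_definitions):
    """Add a description to every metric that lacks one, if the CV offers a definition.

    Returns the number of metrics that were changed.
    """
    fixed = 0
    for metric in metrics:
        if metric.get("description"):
            continue  # keep existing descriptions untouched
        definition = cv_definitions.get(metric["accession"])
        if definition:
            metric["description"] = definition
            fixed += 1
    return fixed


# Stand-in for a loaded CV: accession -> definition (placeholder text).
cv_definitions = {"MS:4000059": "placeholder definition text"}

metrics = [
    {"accession": "MS:4000059", "name": "number of MS1 spectra"},
    {"accession": "MS:4000060", "name": "number of MS2 spectra"},  # not in the stand-in CV
]
print(fill_descriptions(metrics, cv_definitions), "description(s) added")
```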

Validator

Local validation

mzqc-validator [OPTIONS] INFILE

The validator tool is a CLI tool built on click. It generates a joint validation of the syntax and semantics of a given mzQC input. The output is in JSON format. The validator segments the validation report into lists of the following categories:

  • “input files”: reports duplicate input files for sets or runs, or inconsistent file name and location,

  • “label uniqueness”: checks whether run and set labels are unique within the file,

  • “metric use”: reports duplicate metric use within a set or run and, if applicable, table consistency and unit use,

  • “ontology load errors”: lists all controlled vocabularies that could not be loaded,

  • “ontology term errors”: checks for ambiguous terms found in several of the used controlled vocabularies, terms used but not found in any given controlled vocabulary, and correct name, definition, and reference usage,

  • “schema validation”: reports all elements not corresponding to the mzQC schema,

  • “ontology validation”: in case any non-online controlled vocabularies were used.

The tool first reads the INFILE and produces a first error if the file cannot be read. This can be because the JSON is ill-formatted or because the structure contains elements that cannot be parsed by Python’s json library. If you encounter such an error, we suggest you use a JSON syntax checker, e.g. check-jsonschema --schemafile ./schema/mzqc_schema.json INFILE (see the PyPI package for the tool). The validator then retrieves all controlledVocabularies (CV) listed in INFILE. For successful validation, each CV used must therefore be accessible via a stable URL, and all terms used in the INFILE must be included in the CVs. The validator produces an error for each unknown term it encounters; terms are looked up by accession. The validator also checks whether the name of the term used corresponds to the CV entry. Finally, the validator checks the INFILE contents as per the previously described categories.
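The first two steps described above (checking that the file is readable JSON, then looking up each term by accession and comparing names) can be sketched with the standard library. This is an illustration only, not the validator's code: read_check, term_check, and the in-memory cv_terms dict are hypothetical stand-ins, and the real validator loads the CVs from their URLs.

```python
import json


def read_check(text):
    """Return (document, None) on success, or (None, message) if the JSON is unreadable."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as err:
        return None, f"line {err.lineno}, column {err.colno}: {err.msg}"


def term_check(metrics, cv_terms):
    """Look each term up by accession, then compare the name used with the CV name."""
    errors = []
    for metric in metrics:
        accession = metric["accession"]
        if accession not in cv_terms:
            errors.append(f"unknown term: {accession}")
        elif cv_terms[accession] != metric["name"]:
            errors.append(f"name mismatch for {accession}: {metric['name']!r}")
    return errors


# Stand-in for a loaded CV: accession -> term name.
cv_terms = {"MS:4000059": "number of MS1 spectra"}

document, problem = read_check('{"mzQC": {"runQualities": []')  # missing closing braces
print("readable" if problem is None else f"unreadable JSON ({problem})")

print(term_check(
    [{"accession": "MS:4000059", "name": "MS1 count"},         # wrong name
     {"accession": "XX:0000000", "name": "made-up metric"}],   # hypothetical unknown accession
    cv_terms,
))
```

Running a reader check like this before validation makes it easier to tell a syntax problem from a semantic one.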

Validator API (formerly heroku)

Simple mzqc-validator API

The simple API has three endpoints:

  1. ‘/status/’ (GET), to indicate the service status

  2. ‘/documentation/’ (GET), to provide the documentation

  3. ‘/validator/’ (POST), to post mzQC files for validation

The documentation endpoint provides a dict with details on each part of the validation (key) as text (value). The validator endpoint takes an mzQC file (JSON) and responds with an object as described in the documentation endpoint.
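A client for the three endpoints could look roughly like the sketch below, which uses only the standard library. Several points are assumptions rather than documented behaviour: that the API is reachable at http://localhost:5000, that the mzQC document is sent as the raw JSON request body with a JSON content type, and that responses are JSON. The helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # assumption: a local deployment on port 5000


def get_request(endpoint, base_url=BASE_URL):
    """Build a GET request for '/status/' or '/documentation/'."""
    return urllib.request.Request(f"{base_url}{endpoint}", method="GET")


def validation_request(mzqc_document, base_url=BASE_URL):
    """Build a POST request for '/validator/' (assumes a raw JSON body is accepted)."""
    body = json.dumps(mzqc_document).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/validator/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send a prepared request and decode the response (assumed to be JSON)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building requests needs no running server; sending them does.
request = validation_request({"mzQC": {}})
print(request.get_method(), request.full_url)
# report = send(request)  # uncomment once a validator is deployed locally
```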

Validator build

From the root of the pymzqc source folder (i.e. build context pymzqc/) build the mzqcaccessories/onlinevalidator/Dockerfile, e.g. with podman:

podman build -t mzqc-validator -f mzqcaccessories/onlinevalidator/Dockerfile .

(If you are testing a release without a PyPI package, uncomment the respective lines in the Dockerfile to override the pymzqc version used.)

Pre-built container images for selected (pre-)release versions can be found at the mzqc-validator container registry. If you want to deploy the onlinevalidator with your local pymzqc installation, please be aware of the extra dependencies of the online validator.

Deployment

To test a deployment, run the mzqc-validator flask app in gunicorn from the container (as described in wsgi.py).

podman run -p 5000:5000 -ti localhost/mzqc-validator python3 -m gunicorn wsgi:app -b 0.0.0.0:5000 --chdir mzqc-validator/

For local tests, calling the flask app directly (i.e. as a single-threaded app) is fine too: python mzqcaccessories/onlinevalidator/mzqc_online_validator.py; note that the ports might differ, depending on the flask and system defaults. Calling the mzqc_online_validator directly in gunicorn is fine too (podman run -p 8123:8123 -ti localhost/mzqc-validator python3 -m gunicorn mzqc_online_validator:app -b 0.0.0.0:8123 --chdir mzqc-validator/); the wsgi.py indirection is a legacy of Heroku’s Procfile use and their example app.

The validate function of SemanticCheck honours the environment variable MAX_ERR which, when set to an integer, limits the number of validation errors that can occur before validation is aborted. It can be adjusted in the call, for example: podman run --env 'MAX_ERR=5' -p 5000:5000 -ti localhost/mzqc-validator python3 -m gunicorn wsgi:app -b 0.0.0.0:5000 --chdir mzqc-validator/
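How an error limit taken from an environment variable can cut validation short is sketched below. This is a generic illustration, not the SemanticCheck implementation: the functions max_errors and collect_errors and the toy checks are hypothetical; only the variable name MAX_ERR comes from the text above.

```python
import os


def max_errors(default=None):
    """Read MAX_ERR from the environment; fall back to `default` if unset or invalid."""
    raw = os.environ.get("MAX_ERR")
    if raw is None:
        return default
    try:
        limit = int(raw)
    except ValueError:
        return default
    return limit if limit > 0 else default


def collect_errors(checks, limit):
    """Run checks in order and stop as soon as `limit` errors have accumulated.

    Returns (errors, aborted). A limit of None means no limit.
    """
    errors = []
    for check in checks:
        errors.extend(check())
        if limit is not None and len(errors) >= limit:
            return errors[:limit], True
    return errors, False


# Toy checks standing in for validation steps; each returns a list of messages.
checks = [lambda: ["e1", "e2"], lambda: ["e3", "e4"], lambda: ["e5"]]

os.environ["MAX_ERR"] = "3"
errors, aborted = collect_errors(checks, max_errors())
print(errors, "aborted" if aborted else "complete")
```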

A Docker compose deployment example can be found at mzqcaccessories/onlinevalidator/compose.yaml.

Port Mapping

The mzQC GitHub Pages integration and local_validator.html expect the API to run on port 5000.

Local Customisation, Development, Testing

First use dev-test-validation.py as testbed for new changes.

Then, you can build and deploy the container as described above and access the API, e.g. with mzqcaccessories/onlinevalidator/local_validator.html. You can find both necessary files within the repository under mzqcaccessories/onlinevalidator.

Also, either for testing a local deployment or as a convenience for local validation, the repository provides an HTML page to call a locally deployed mzqc-validator API: mzqcaccessories/onlinevalidator/local_validator.html.

Legacy Heroku Deployment

Alternatively, you can deploy your own Heroku dyno like so:

cd /tmp/
curl https://cli-assets.heroku.com/install-ubuntu.sh | sh
heroku login
heroku git:clone -a mzqc-validator
cd mzqc-validator
rsync -aP --delete /home/walzer/psi/pymzqc/acessories/onlinevalidator  /tmp/mzqc-validator
git push heroku master