Structure Checker Command Line Application

structurechecker command-line

Structure Checker is a chemical validation tool detecting and fixing common structural errors or special features that can be potential sources of problems. structurechecker is the command-line tool of Structure Checker.

Options

General options of structurechecker

  -h,  --help                        this help page
  -hc, --help-checker-action         help page of valid checker actions
  -hf, --help-fixer-action           help page of valid fixer actions
  -m,  --mode <operationmode>        [check|fix]
                                     mode of the operation (default: check)
                                       check  only check is executed,
                                              does not modify molecules
                                       fix    fix molecules containing structure errors
                                              whenever possible
  -x                                 fix mode (deprecated, use --mode fix)
Input options:
  -c,  --config <filepath|string>    action string configuration
                                     actions separated by "..",
Output options:
  -t,  --output-type <output type>   [single|separated|accepted|discarded]
                                     set output type (default: single)
                                       single     both accepted and discarded structures are
                                                  written to the <output path>
                                       separated  accepted structures are written to the
                                                  <output path>, discarded structures are
                                                  written to the <discarded path>
                                       accepted   only accepted structures are
                                                  written to the <output path>
                                       discarded  only discarded structures are
                                                  written to the <discarded path>
  -o,  --output <output path>        output file (default: standard output)
  -d,  --discarded <discarded path>  write molecules with structure error to
                                     a separate file (default: standard output)
  -f,  --format <format>             output file format (default: smiles)
  -rf, --report-file <filepath>      write report to a file
  -rp, --report-property <propname>  write report to the property of the
                                     output, with the specified property name
  -rt, --report-pattern <pattern>    generate pattern based report file
  -re, --report-format <format>      file format of the molecules in report
  -l,  --log <filepath>              write software-error log messages to file

Available checker actions: structurechecker -hc

Valid checker actions (strings) are:
  3d                                 detect atoms with 3D coordinates
  abbrevgroup                        detect all abbreviated groups
    :expanded=[true|false]           detect expanded abbreviated groups
    :contracted=[true|false]         detect contracted abbreviated groups
    :excluded=[...]                  exclude the following groups during check;
                                     set comma-separated list of group abbreviations,
                                     e.g., "abbrevgroup:excluded=[Ph,COOH,Val]"
  absentchiralflag                   detect absent chiral flag
  absolutestereoconfiguration        detect molecules in which all asymmetric
                                     centers have absolute stereo configuration
  alias                              detect atoms with alias
  aromaticity                        (deprecated) use aromaticityerror
  aromaticityerror                   detect aromaticity errors with the given
                                     aromatization type (default: general)
    :basic                           basic aromaticity errors
    :loose                           loose aromaticity errors
    :general                         general aromaticity errors
  atommap                            detect atoms with map number
  atomqueryproperty                  detect all or specified atom query properties
    :H=[true|false]                  hydrogen count
    :X=[true|false]                  connection count
    :D=[true|false]                  explicit connection count
    :R=[true|false]                  ring count
    :h=[true|false]                  implicit hydrogen count
    :r=[true|false]                  smallest ring count
    :a=[true|false]                  aromaticity
    :s=[true|false]                  substitution count
    :u=[true|false]                  unsaturation
    :rb=[true|false]                 ring bond count
  atomvalue                          detect atoms with atom value
  atropisomer                        detect atropisomers
  attacheddata                       detect atoms with attached data
  bondangle                          detect unpreferred bond angles in 2d
  bondlength                         detect bonds that are too long or too short
  chiralflagerror                    detect incorrectly set chiral flag
  circularrgroup                     (deprecated) use circularrgroupreference
  circularrgroupreference            detect circular R-group references
  coordsystem                        detect invalid coordination systems
  covalentcounterion                 detect covalent counterions
  crosseddoublebond                  detect crossed double bonds
  empty                              detect items without atoms
  explicith                          detect all or specified explicit hydrogens
    :lonely=[true|false]             lonely explicit hydrogens
    :mapped=[true|false]             mapped explicit hydrogens
    :charged=[true|false]            charged explicit hydrogens
    :isotopic=[true|false]           isotopic explicit hydrogens
    :radical=[true|false]            radical explicit hydrogens
    :wedged=[true|false]             wedged explicit hydrogens
    :hconnected=[true|false]         hydrogen connected to hydrogen atom
    :polymerendgroup=[true|false]    hydrogen connected to a SRU S-group
    :sgroup=[true|false]             hydrogen which is the only atom in an S-group
    :sgroupend=[true|false]          hydrogen connected to a Superatom S-group
    :valenceerror=[true|false]       hydrogen connected to an atom which has
                                     valence error
    :bridgehead=[true|false]         hydrogen connected to a bridgehead atom
  explicitlp                         detect explicit lone pairs
  ezdoublebond                       detect if a double bond can be cis or trans
  isotope                            detect isotopes
  metallocene                        detect incorrect metallocene representations
  missingatommap                     detect atoms without map numbers
  missingrgroup                      (deprecated) use missingrgroupreference
  missingrgroupreference             detect missing R-group definitions
  moleculecharge                     detect non-neutral molecules
  multicenter                        detect multicenters
  multicomponent                     detect molecules containing disconnected parts
  multiplestereocenter               detect molecules with multiple stereocenters
  ocr                                detect drawings that originates from
                                     incorrect optical structure recognition
  overlappingAtoms                   detect atoms that are too close to each other
  overlappingBonds                   detect bonds that are too close to each other
  pseudoatom                         detect pseudo atoms
  queryatom                          detect query atoms
  querybond                          detect query bonds
  racemate                           detect asymmetric tetrahedral atoms without
                                     specific stereo configuration
  radical                            detect radical atoms
  rare                               (deprecated) use rareelement
  rareelement                        detect rare elements
  ratom                              detect specified type of R-atoms
    :all=[true|false]                all type of R-atoms
    :disconnected=[true|false]       disconnected type of R-atoms
    :generic=[true|false]            generic type of R-atoms
    :linker=[true|false]             linker type of R-atoms
    :nested=[true|false]             nested type of R-atoms
  reactionmap                        (deprecated) use reactionmaperror
  reactionmaperror                   detect reactions with invalid atom mapping
  relativestereo                     detect multiple stereogenic center groups
  rgroupattachmenterror              detect all R-group attachment errors
  rgroupreferenceerror               detect errors in R-group definitions
                                     DEPRECATED checker, please use
                                     "missingrgroup", "unusedrgroup",
                                     "circularrgroup" instead.
    :missingratom=[true|false]       missing R-atom definition
    :missingrgroup=[true|false]      missing R-group definition
    :selfreference=[true|false]      self reference errors in R-group definitions
  ringstrainerror                    detect small rings with trans or cumulative
                                     double bonds, or triple bond
  solvent                            detect common solvents appearing
                                     by a main component
  staratom                           detect star atoms
  stereocarebox                      detect stereo search markers on double bonds
  straightdoublebond                 detect undefined double bond stereo layout
  substructure:[smarts]              detect the given SMARTS structure
                                     as a substructure in the original molecule
  unbalancedreaction                 detect reactions with orphan atoms
  unusedrgroup                       (deprecated) use unusedrgroupreference
  unusedrgroupreference              detect unused R-group definitions
  valence                            (deprecated) use valenceerror
  valenceerror                       detect valence errors
  valenceproperty                    detect atoms with all or specified
                                     valence properties
    :defaultvalence=[true|false]     default valence properties
    :nondefaultvalence=[true|false]  non-default valence properties
  wedge                              (deprecated) use wedgeerror
  wedgeerror                         detect incorrect wedge bonds
  wigglybond                         detect wiggly bonds on chiral centers
  wigglydoublebond                   detect non_stereo double bonds with wiggly
                                     representation connected to a double bond

Available fixer actions: structurechecker -hf

Valid fixer actions (strings) are:
  addchiralflag                      add chiral flag to the molecule
  aliastoatom                        remove aliases from atoms
  aliastocarbon                      (deprecated) use converttocarbon
  aliastogroup                       convert atoms with aliases to abbreviated groups
                                     if the alias is recognized
  clean                              calculate 2D coordinates
  clearabsstereo                     (deprecated) use removeinvalidchiralflag
  contractgroup                      contract all abbreviated groups
  converttoelementalform             convert isotopes into elemental atoms
  converttocarbon                    remove alias values from atoms and
                                     convert the atom to a carbon
  converttoionicform                 convert covalent counterions to ionic form
  converttometalloceneform           convert non-standard metallocene representations
                                     into coordinated multicenter representation
  converttosingle                    (deprecated) use converttosinglebond
  converttosinglebond                convert faulty bonds to single bonds
  converttowigglydoublebond          convert non-stereo double bond represented by
                                     crossed double bond to wiggly bond representation
  crosseddoublebond                  convert non-stereo double bond represented by
                                     wiggly bond to crossed double bond representation
  crossedtowiggly                    (deprecated) use converttowigglydoublebond
  dearomatize                        convert aromatic rings into Kekule form
  expandgroup                        expand all abbreviated groups if it is possible
  fixmetallocene                     converts metallocenes to coordinative multicenter layout
  fixrgroupattachment                add missing attachments points to members
                                     with single location
  fixunusedrgroups                   delete unreferenced R-group definitions
  fixvalence                         correct valence problem by removing hydrogens
                                     or setting charges
  mapmolecule                        add atom maps to each atom of the molecule
  mapreaction                        add atom maps to the reaction
  neutralize                         remove charges from the molecule
  partialclean                       recalculate parts of the atom coordinates for 2D layout
  pseudotogroup                      convert pseudo atoms to abbreviated groups
                                     if pseudo label is a known abbreviated group
  rearomatize                        dearomatize the molecule and aromatize it again
  removealias                        remove alias values from atoms
  removeatom                         remove the problematic atoms from the molecule
  removeatommap                      remove atom map numbers
  removeatomqueryproperty            remove atom query properties
  removeatomvalue                    remove atom values
  removeattacheddata                 remove data attached to atoms
  removebond                         remove problematic bonds from the molecule
  removeexplicith                    remove explicit hydrogens
  removeinvalidchiralflag            remove the chiral flag
  removeradical                      convert radicals to non_radical atoms
  removestereocarebox                remove stereo search markers from double bonds
  removevalenceproperty              remove valence properties from atoms
  removezcoordinate                  set the z-coordinates of atoms to zero
  ungroup                            ungroup all abbreviated groups
  wedgeclean                         recalculate the orientation of wedge bonds

Usage

structurechecker  -c <config file> -m [mode] [<options>] [input list]

The command line parameter -c or --config is mandatory. This parameter specifies the configuration file path or a simple action string.

structurechecker -c config.xml

or

structurechecker -c "atomqueryproperty"
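
When the action string is assembled by a script rather than typed by hand, the individual actions can be joined with the ".." separator named in the -c option help. The following is a minimal POSIX shell sketch; join_actions is a hypothetical helper and not part of structurechecker, and the final line only prints the command instead of running it.

```shell
#!/bin/sh
# join_actions: hypothetical helper (not part of structurechecker) that joins
# its arguments with "..", the separator named in the -c option help.
join_actions() {
    result=""
    for action in "$@"; do
        if [ -z "$result" ]; then
            result="$action"
        else
            result="$result..$action"
        fi
    done
    printf '%s\n' "$result"
}

# Build an action string from two checkers taken from the checker list.
config=$(join_actions aromaticityerror valenceerror)

# Print the resulting command line for inspection (nothing is executed).
echo "structurechecker -c \"$config\" -m fix"
```

Quoting the action string keeps the shell from interpreting characters such as "[", "|" or ">" that appear in some action parameters.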

Parameter -m or --mode specifies the operation mode. The following operation modes are available:

  • check (default): searches for errors;

    structurechecker -c config.xml -m check
  • fix: fixes automatically fixable errors.

    structurechecker -c config.xml -m fix

Note: When a molecule import/export error occurs, the program continues to run. The error is written to the console, and the molecule is discarded from the results (i.e., the resulting output file contains fewer molecules than the input file).
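
Because such molecules are dropped from the results, it can be useful to compare record counts after a run. The sketch below assumes that both the input and the output are SD files (for example, the output was written with "-f sdf -o out.sdf" and the default single output type) and counts the "$$$$" record terminators; count_sdf_records and dropped_count are hypothetical helpers, not part of structurechecker.

```shell
#!/bin/sh
# count_sdf_records FILE: hypothetical helper that counts the records in an
# SD file by counting its "$$$$" record-terminator lines.
count_sdf_records() {
    # grep -c prints 0 but exits non-zero when nothing matches; "|| true"
    # keeps that case from aborting scripts that run under "set -e".
    grep -c '^\$\$\$\$' "$1" || true
}

# dropped_count INPUT OUTPUT: print how many input records are missing from
# the output file. Both files are assumed to exist.
dropped_count() {
    in_count=$(count_sdf_records "$1")
    out_count=$(count_sdf_records "$2")
    echo $((in_count - out_count))
}

# Usage, assuming out.sdf was produced from in.sdf in a previous run.
if [ -f in.sdf ] && [ -f out.sdf ]; then
    echo "molecules dropped: $(dropped_count in.sdf out.sdf)"
fi
```

A non-zero count only shows that records are missing; the console output of the run still has to be consulted for the actual import/export errors.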

Note: The syntax of commands can be different under various command line shells (bash, tcsh, zsh, etc.).

Input

structurechecker accepts most molecular file formats as input (Marvin Documents (MRV), MDL molfile, Sdfile, RXNfile, Rdfile, SMILES, etc.). The input can be specified as:

  • input file(s),

  • input string(s), or

  • SMILES (default).

    structurechecker -c config.xml -m check input.mrv

Note: If neither the input file nor the input string is specified, the standard input (console) will be read.

structurechecker -c config.xml -m check "OCC(O)C1OC(=O)C(O)=C1O"
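
Since the standard input is read when no input is given, molecules can also be piped in from another command. The sketch below assumes that structurechecker is on the PATH and that one SMILES string per line is accepted on standard input; check_smiles is a hypothetical wrapper, and valenceerror is taken from the checker list.

```shell
#!/bin/sh
# check_smiles: hypothetical wrapper that feeds SMILES strings (one per
# argument, written one per line) to structurechecker via standard input.
check_smiles() {
    if ! command -v structurechecker >/dev/null 2>&1; then
        echo "structurechecker not found on PATH" >&2
        return 127
    fi
    # No input file or string is given, so the tool reads standard input.
    printf '%s\n' "$@" | structurechecker -c "valenceerror" -m check
}

check_smiles "OCC(O)C1OC(=O)C(O)=C1O" "c1ccccc1" || echo "check was not run" >&2
```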

Output

structurechecker's output contains the file(s) of the checked/fixed molecules and optionally a report of the results. The molecules are written to the output file(s). The format of the output file(s) can be specified by the -f or --format option (default format is: "smiles"). The type of output is defined by the -t or --output-type parameter. The possible values of the output type are the following:

  • single (default): all molecules are written to the file defined by the --output parameter. If the --output parameter is omitted, the result is written to the standard output (console). (The --discarded parameter is ignored in this case.)

  • separated: valid and invalid molecules are written to two different files. The --output parameter defines the output file of molecules with valid structures, and the --discarded parameter defines the output file of molecules with invalid structures (or in fix mode, those which cannot be fixed automatically).

    • If --discarded parameter is omitted, molecules with invalid structures are written to standard output;

    • If --output parameter is omitted, molecules with valid structures are written to standard output;

      Note: At least one of the --output and --discarded parameters must be specified. If neither is defined, the program stops.

  • accepted: only molecules with valid structures are written to the file defined by the --output parameter. If the --output parameter is omitted, molecules with valid structures are written to the standard output. (The --discarded parameter is ignored in this case.)

  • discarded: only molecules with invalid structures are written to the file defined by the --discarded parameter. If the --discarded parameter is omitted, molecules with invalid structures are written to the standard output. (The --output parameter is ignored in this case.)
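
The routing rules in the list above can be summarized as a small decision function. The following is an illustrative sketch only, not the tool's implementation: describe_outputs is a hypothetical helper, an empty argument stands for an omitted option, and for the discarded type it assumes that the invalid molecules go to the standard output when --discarded is omitted (consistent with the default given in the -d option help).

```shell
#!/bin/sh
# describe_outputs TYPE OUTPUT_PATH DISCARDED_PATH
# Hypothetical helper: prints where valid and invalid molecules would be
# written for a given output type. An empty path means "option omitted".
describe_outputs() {
    otype=$1
    out=${2:-stdout}     # destination used when --output is omitted
    disc=${3:-stdout}    # destination used when --discarded is omitted
    case $otype in
        single)
            echo "valid=$out invalid=$out" ;;
        separated)
            # At least one of --output / --discarded has to be given.
            if [ -z "${2:-}" ] && [ -z "${3:-}" ]; then
                echo "error: separated needs --output or --discarded" >&2
                return 1
            fi
            echo "valid=$out invalid=$disc" ;;
        accepted)
            echo "valid=$out invalid=none" ;;
        discarded)
            echo "valid=none invalid=$disc" ;;
        *)
            echo "error: unknown output type: $otype" >&2
            return 1 ;;
    esac
}

# Corresponds to "-t separated -d discarded.sdf" with --output omitted.
describe_outputs separated "" discarded.sdf
# prints: valid=stdout invalid=discarded.sdf
```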

The report of structure checking can be written either to a separate file, defined by the --report-file parameter, or to the output file(s) as additional molecule property. The name of the property can be defined by the --report-property parameter.

Note: Not all molecules with structure errors are discarded. When fix mode is selected, only molecules with errors that cannot be fixed automatically are discarded.

Usage examples

Below you can find the short descriptions of some examples.
If you want to check, fix, or filter structures in evaluate or JChem Cartridge, find examples here.

  1. structurechecker -c "metallocene"

    Executes a check with configuration metallocene on the molecule(s) defined in the standard input, and writes the result to the standard output (console);

  2. structurechecker -c "bondLength" in.sdf

    Executes a check with configuration bondLength on the molecule(s) defined in the in.sdf file, and writes the result to the standard output (console);

  3. structurechecker -c "isotope->converttoelementalform" in.sdf

    Executes a check with configuration isotope->converttoelementalform on the molecule(s) defined in the in.sdf file, and writes the result to the standard output (console);

  4. structurechecker -c "aromaticity..valence" -m fix -f sdf -o out.sdf in.sdf

    Executes a fix with configuration aromaticity and valence on the molecule(s) defined in the in.sdf file, and writes the molecules with valid structures (including automatically fixed molecules) in sdf format to the out.sdf output file;

  5. structurechecker -c config.xml -t separated -o out.sdf -d discarded.sdf

    Executes a check with configuration contained by the config.xml, and writes the molecules with valid structures to out.sdf, and writes the molecules with invalid structures to discarded.sdf.

    Note: The format of both outputs is SMILES(!) as --format (-f) is not defined;

  6. structurechecker -c config.xml -m fix -t separated -d discarded.sdf

    Executes a fix with configuration contained by the config.xml, and writes the molecules with invalid structures to discarded.sdf, and writes molecules with valid structures to the standard output (console);

  7. structurechecker -c config.xml -m fix -t discarded in.sdf

    Executes a fix with configuration contained by the config.xml on the molecule(s) defined in the in.sdf file, writes the molecules with invalid structures to the standard output (console) because --discarded is not specified, and omits molecules with valid structures.