Chemical Terms Evaluator

Introduction

The Chemical Terms Evaluator is a command line application designed to evaluate mathematical expressions on molecules. These expressions usually have a chemical meaning formulated in ChemAxon's Chemical Terms Language using built-in chemical and general purpose functions. It is also possible to extend this built-in set of calculations with a user-defined configuration.

Apart from the evaluate command line tool, the same evaluation mechanism is used in ChemAxon products wherever computational and/or search conditions come into the picture, such as pharmacophore feature identification (note that the Pmapper feature definitions use a specific syntax), reaction definitions, database filters and chemical calculations.

The heart of the evaluator mechanism is the JEP Java Expression Parser.

You may want to look at the complete language reference including a description of the expression syntax and some simple examples showing how some well-known chemical rules can be formulated in this language.

Evaluator uses the molecule context to set the input molecule; therefore, calculations refer to the input molecule by default. The language reference also includes a set of Evaluator examples. A set of working examples is available.

Installation

Download and launch the platform-specific installer, following the installation instructions.

Usage

The command line tool evaluate evaluates a single expression and either prints the result in human-readable text format or outputs the input molecule with the result stored in a specified SDF tag.

evaluate [options] [input files/strings]

Options

Options:
  -h, --help                        this help message
  -l, --list-functions              list Chemical Terms functions

Input Options:
  -c, --config <filepath>           configuration XML file (if omitted then default configuration is applied)
  -n, --no-input-mol                expression should be evaluated without input molecule
  -e, --expr-string <str|filepath>  expression string or file

Output Options:
  -o, --output <filepath>           output file path (default: stdout)
  -g, --ignore-error                continue with next molecule on error
  -v, --verbose                     verbose output
  -C, --clean <dim[:opts]>          clean output molecules (dim: 2 or 3) with options (default: t2000 - time limit: 2 sec) (see http://www.chemaxon.com/marvin/help/sci/cleanoptions.html)
  -f, --format <format>             output format if result is molecule (default: smiles or smarts) (ignores the output options below)
  -x, --extract <format>            extract mode: write exactly those molecules in the specified format that satisfy the input boolean expression (excludes other output options)
  -p, --precision <precision>       max. number of fractional digits in the output (default: 2)
  -S, --sdf-output                  SDF output (otherwise text output)
  -t, --tag                         name of the SDFile tag to store the evaluation result (default: CALC)
  -i, --include-expr                output expression string
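When evaluate is driven from a script, it can help to assemble the argument list programmatically. The sketch below is a hypothetical Python helper (build_evaluate_args is not part of any ChemAxon distribution) that uses only the short options listed above and places them before the input files, as in the usage line. It only builds the list; to actually run the tool, pass the list to subprocess.run(argv, check=True) on a machine where evaluate is on the PATH.

```python
import shlex
from typing import List, Optional, Sequence


def build_evaluate_args(expression: str, inputs: Sequence[str], *,
                        config: Optional[str] = None,
                        output: Optional[str] = None,
                        extract: Optional[str] = None,
                        sdf_output: bool = False,
                        tag: Optional[str] = None,
                        include_expr: bool = False,
                        precision: Optional[int] = None,
                        ignore_error: bool = False) -> List[str]:
    """Return an argv list for evaluate, using only the documented options."""
    args = ["evaluate"]
    if config is not None:
        args += ["-c", config]            # --config <filepath>
    args += ["-e", expression]            # --expr-string <str|filepath>
    if extract is not None:
        args += ["-x", extract]           # --extract <format>
    if sdf_output:
        args.append("-S")                 # --sdf-output
    if include_expr:
        args.append("-i")                 # --include-expr
    if tag is not None:
        args += ["-t", tag]               # --tag
    if precision is not None:
        args += ["-p", str(precision)]    # --precision
    if ignore_error:
        args.append("-g")                 # --ignore-error
    if output is not None:
        args += ["-o", output]            # --output <filepath>
    return args + list(inputs)            # options first, then input files/strings


argv = build_evaluate_args("mass() >= 200", ["target.sdf"],
                           extract="sdf", output="heavy.sdf")
print(shlex.join(argv))  # evaluate -e 'mass() >= 200' -x sdf -o heavy.sdf target.sdf
```

Returning a list rather than a single string avoids shell quoting problems with expressions such as "mass() >= 200"; shlex.join is used only to display the equivalent command line.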

The input molecule file can contain more than one molecule; in this case the expression is evaluated for each input molecule, one by one.

The command line parameter --config specifies the filename of the configuration file. If this parameter is not specified, then the default configuration is used.

If the command line parameter --no-input-mol is specified, the expression is evaluated without an input molecule.

The command line parameter --expr-string specifies either the expression string itself or the path of a file containing the expression string.

The command line parameter --format specifies the output molecule format when the result is a molecule or a molecule array. The default format is SMILES / SMARTS. If this option is used, all other output options except for --output, --ignore-error and --verbose are ignored.

If the command line parameter --clean is specified, result molecules as well as SDF output are cleaned in the given dimension (2 or 3).

If the command line parameter --extract is specified, the input expression is used as a molecule filter: it is evaluated as a boolean condition for each input molecule, and the program keeps only the molecules that satisfy this condition, that is, those for which the expression evaluates to true. These molecules are written to the output in the specified format. If this option is used, all other output options except for --output, --ignore-error and --verbose are ignored.
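The filtering semantics of --extract can be pictured with a few lines of Python. This is a conceptual sketch only: the extract function and the records are hypothetical, and the hard-coded molar masses merely stand in for what mass() would compute.

```python
from typing import Callable, Iterable, Iterator, TypeVar

Mol = TypeVar("Mol")


def extract(molecules: Iterable[Mol],
            condition: Callable[[Mol], bool]) -> Iterator[Mol]:
    """Yield, in input order, only the molecules for which the condition is true."""
    for molecule in molecules:
        if condition(molecule):
            yield molecule


# Illustrative records with precomputed masses (g/mol) standing in for mass().
records = [
    {"name": "benzene", "mass": 78.11},
    {"name": "caffeine", "mass": 194.19},
    {"name": "morphine", "mass": 285.34},
]
heavy = [r["name"] for r in extract(records, lambda r: r["mass"] >= 200)]
print(heavy)  # ['morphine']
```

This mirrors the second usage example, where "mass() >= 200" plays the role of the condition and the surviving molecules are written in the requested format.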

The command line parameter --precision specifies the maximum number of fractional digits to be displayed in the output.

If the command line parameter --sdf-output is specified then input molecules are written to the output in SDF format with evaluation result set as an SDF tag. The command line parameter --tag specifies this SDF tag.

If the command line parameter --include-expr is specified, the evaluation result is preceded by the expression string itself in the output.

If the command line parameter --ignore-error is specified, then import/export errors will not stop the processing but the error is written to the console and the molecule is skipped. By default, the program exits in case of molecule import/export errors.
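The difference between the default behaviour and --ignore-error can be sketched as a per-record loop. In this hypothetical Python model, ValueError stands in for a molecule import/export error and float stands in for the per-molecule evaluation; neither reflects how the real tool is implemented.

```python
import sys
from typing import Callable, Iterable, List, TypeVar

In = TypeVar("In")
Out = TypeVar("Out")


def evaluate_all(records: Iterable[In], evaluate_one: Callable[[In], Out],
                 ignore_error: bool = False) -> List[Out]:
    """Evaluate records one by one, mimicking the --ignore-error policy."""
    results = []
    for index, record in enumerate(records, start=1):
        try:
            results.append(evaluate_one(record))
        except ValueError as err:          # stand-in for an import/export error
            if not ignore_error:
                raise                      # default: stop processing
            # --ignore-error: report on the console and skip this record
            print(f"record {index} skipped: {err}", file=sys.stderr)
    return results


print(evaluate_all(["1.5", "not a number", "2.0"], float, ignore_error=True))  # [1.5, 2.0]
```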

The expression string

The expression string is a functional expression that the evaluator tries to evaluate. It can contain calculation functions that are available in the cxcalc command line tool. These calculations have many different calculation parameters. The syntax of the parameters is "key1:value1 key2:value2 ...".

For example, to calculate the partial charges of the atoms in a molecule while specifying the charge type and the microspecies, type:

evaluate -e "charge('type:pi pH:6.5')" molecule.mol

In this case the parameter string defines that the user wants to calculate pi charge for each atom of the microspecies at pH=6.5. Other parameters can be specified in the same fashion.
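When such parameter strings are produced or checked by a script, the "key1:value1 key2:value2 ..." form maps naturally onto a dictionary. The Python sketch below uses hypothetical helper names and assumes that values contain no whitespace; it splits each token at its first colon only, so values may themselves contain colons.

```python
from typing import Dict


def parse_params(param_string: str) -> Dict[str, str]:
    """Split 'key1:value1 key2:value2 ...' into a dict."""
    params = {}
    for token in param_string.split():
        key, sep, value = token.partition(":")   # split at the first colon only
        if not sep or not key:
            raise ValueError(f"expected key:value, got {token!r}")
        params[key] = value
    return params


def format_params(params: Dict[str, str]) -> str:
    """Join a dict back into the space-separated key:value form."""
    return " ".join(f"{key}:{value}" for key, value in params.items())


settings = parse_params("type:pi pH:6.5")
print(settings)                                  # {'type': 'pi', 'pH': '6.5'}
print(f"charge('{format_params(settings)}')")    # charge('type:pi pH:6.5')
```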

Input

The software may take molecules from a text file. Most molecular file formats are accepted (MDL molfile, Compressed molfile, SDfile, Compressed SDfile, SMILES, etc.).

If no input file name is given in the command line, the standard input is read.

Output

If no output file name is given, results are written to the standard output.

If the --sdf-output command line parameter is specified, the output format is SDF and the evaluation result is written to an SDF tag (default tag: CALC). Otherwise only the evaluation result is written to the output in simple text format.
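For readers unfamiliar with SDF, a result stored in a tag is an ordinary SDF data item: a "> <TAG>" header line, the value, a blank line, and finally the "$$$$" record terminator. The Python sketch below appends such an item to a minimal hand-written molfile; append_sdf_tag is a hypothetical helper and 16.04 is an illustrative value, not output captured from evaluate.

```python
def append_sdf_tag(molfile: str, value: str, tag: str = "CALC") -> str:
    """Append one SDF data item (> <TAG>, value, blank line) and the $$$$ terminator."""
    body = molfile.rstrip("\n")
    return f"{body}\n> <{tag}>\n{value}\n\n$$$$\n"


# Minimal hand-written molfile for methane (heavy atom only).
METHANE = """methane
  sketch

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
M  END
"""

print(append_sdf_tag(METHANE, "16.04"))
```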

Configuration

The configuration file is an XML file containing some/all of the following optional subsections:

  1. Evaluator parameters: this section specifies general evaluator parameters, currently cache-mode can be set here

  2. Plugin definitions: this section describes the plugins and their parameters that can be referenced from the expression (these override the default plugin definitions)

  3. Function definitions: this section describes the predefined and user-defined functions that can be referenced from the expression (these override the default function definitions)

  4. Matching conditions: this section specifies the reference ID of the substructure matching function and its search options if they differ from the default substructure search settings (these override the default matching condition)

Evaluator Parameters

The evaluator parameter section currently sets only the cache mode (the Cached attribute of the Params element): if set to "true", matching condition and plugin calculation results are cached in the molecule object and reused instead of repeating the same structure search or chemical calculation. The default is "false", since a Chemical Terms expression typically does not reference the same matching condition or calculation more than once, and caching itself has some overhead.

Example:

<Params Cached="true"/> 
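The trade-off described above can be illustrated with a toy Python model. CachedMolecule and fake_logp are hypothetical names, not ChemAxon classes; the sketch only shows why caching pays off when an expression references the same calculation more than once, and why it is wasted bookkeeping otherwise.

```python
from typing import Any, Callable, Dict


class CachedMolecule:
    """Toy molecule that can memoize calculation results, like Cached="true"."""

    def __init__(self, name: str, cache_mode: bool = False) -> None:
        self.name = name
        self.cache_mode = cache_mode
        self._cache: Dict[str, Any] = {}
        self.runs = 0  # how many times a calculation was actually executed

    def calculate(self, key: str, compute: Callable[["CachedMolecule"], Any]) -> Any:
        if self.cache_mode and key in self._cache:
            return self._cache[key]        # reuse the stored result
        self.runs += 1
        result = compute(self)
        if self.cache_mode:
            self._cache[key] = result      # storing costs memory and bookkeeping
        return result


def fake_logp(mol: "CachedMolecule") -> float:
    return 2.5  # placeholder for an expensive plugin calculation


for cached in (False, True):
    mol = CachedMolecule("example", cache_mode=cached)
    # an expression that references the logp calculation twice
    in_range = mol.calculate("logp", fake_logp) > 1 and mol.calculate("logp", fake_logp) < 5
    print(cached, in_range, mol.runs)
# False True 2
# True True 1
```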

Plugin Definitions

The plugin declarations enable different structure-based chemical calculations (e.g. pKa, logP, logD) to be referenced in the expression strings.

Declaration

The plugin definition section contains the following data for each plugin reference that is to be used in the expressions:

  1. the plugin name by which the plugin is referenced in the expression;

  2. the plugin JAR, relative to the marvin/plugins directory (where marvin refers to the Marvin installation directory), from which the plugin class should be loaded (optional; if omitted, the class is loaded from the usual CLASSPATH);

  3. the plugin java class which wraps the plugin calculation into a prescribed frame;

  4. the plugin parameters as parameter name-value pairs - this section is optional: if omitted, the default plugin parameters are used.

The set of possible plugin parameters and a short description for each plugin can be seen with the help of the cxcalc program:

cxcalc <plugin> -h

where plugin is the plugin ID in the cxcalc configuration file. The parameter names used by the Evaluator are the long command line parameter names without the leading '--' double dash. For example, for pKa type:

cxcalc pka -h

which prints out the following help text:

Calculator plugin: pka.
pKa calculation.

Usage:
cxcalc [general options] [input files] pka
[pka options] [input files]

pka options:
-h, --help this help message
-p, --precision <floating point precision as number of
fractional digits: 0-8 or inf> default: 2
-t, --type [pKa|acidic|basic] (default: pKa)
-m, --mode [macro|micro] (default: macro)
-n, --ions max number of ionizable atoms to be considered (default: 8)
-i, --min min basic pKa (default: -10)
-x, --max max acidic pKa (default: 20)
-a, --na number of acidic pKa values displayed (default: 2)
-b, --nb number of basic pKa values displayed (default: 2)

The help, precision, na and nb parameters are display options and are therefore not used by the Evaluator. Thus the parameter set for the pKa calculation is:

type, mode, ions, min, max.
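The rule above (take the long option names, drop the leading '--', skip display-only options) can be applied mechanically to cxcalc help text. The Python sketch below runs it on the pKa help shown earlier; evaluator_param_names is a hypothetical helper, and the DISPLAY_ONLY set is specific to the pKa plugin as stated in the text, so other plugins would need their own list.

```python
import re
from typing import List

PKA_HELP = """\
-h, --help this help message
-p, --precision <floating point precision as number of
fractional digits: 0-8 or inf> default: 2
-t, --type [pKa|acidic|basic] (default: pKa)
-m, --mode [macro|micro] (default: macro)
-n, --ions max number of ionizable atoms to be considered (default: 8)
-i, --min min basic pKa (default: -10)
-x, --max max acidic pKa (default: 20)
-a, --na number of acidic pKa values displayed (default: 2)
-b, --nb number of basic pKa values displayed (default: 2)
"""

# Display-only options of the pKa plugin, as listed in the text.
DISPLAY_ONLY = {"help", "precision", "na", "nb"}
OPTION = re.compile(r"^\s*-\w,\s+--([\w-]+)")


def evaluator_param_names(help_text: str) -> List[str]:
    names = []
    for line in help_text.splitlines():
        match = OPTION.match(line)
        if match and match.group(1) not in DISPLAY_ONLY:
            names.append(match.group(1))   # long name without the leading '--'
    return names


print(evaluator_param_names(PKA_HELP))  # ['type', 'mode', 'ions', 'min', 'max']
```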

The same plugin can be used with different parameter settings if the XML configuration has more than one <Plugin> section with the same Java class but different plugin names, each name referencing a different parameter section. In the following example, the name pKa1 references a pKa calculation with minimal basic pKa -3 and maximal acidic pKa 10, while pKa2 references a pKa calculation with minimal basic pKa -20 and maximal acidic pKa 30. Different functions of a calculator plugin can also be referenced by different IDs: in the example below, the "mass" result type of the ElementalAnalyser plugin is referenced by the name mass, while the "exactmass" result type of the same plugin is referenced by the name exactmass.

Example:

<Plugins>
  <Plugin ID="charge" Class="chemaxon.marvin.calculations.ChargePlugin" JAR="ChargePlugin.jar"/>
  <Plugin ID="ioncharge" class="chemaxon.marvin.calculations.IonChargePlugin">
    <Param Name="pH" Value="3.6"/>
    <Param Name="max-ions" Value="6"/>
    <Param Name="min-percent" Value="5"/>
    <Param Name="charge-type" Value="accumulated"/>
  </Plugin>
  <Plugin ID="microspecies" class="chemaxon.marvin.calculations.MajorMicrospeciesPlugin"/>
  <Plugin ID="pka" class="chemaxon.marvin.calculations.pKaPlugin"/>
  <Plugin ID="pKa1" class="chemaxon.marvin.calculations.pKaPlugin">
    <Param Name="min" Value="-3"/>
    <Param Name="max" Value="10"/>
  </Plugin>
  <Plugin ID="pKa2" class="chemaxon.marvin.calculations.pKaPlugin">
    <Param Name="min" Value="-20"/>
    <Param Name="max" Value="30"/>
  </Plugin>
  <Plugin ID="logp" class="chemaxon.marvin.calculations.logPPlugin">
    <Param Name="type" Value="logPMicro"/>
  </Plugin>
  <Plugin ID="mass" class="chemaxon.marvin.calculations.ElementalAnalyserPlugin">
    <Param Name="type" Value="mass"/>
  </Plugin>
  <Plugin ID="exactmass" class="chemaxon.marvin.calculations.ElementalAnalyserPlugin">
    <Param Name="type" Value="exactmass"/>
  </Plugin>
  <Plugin ID="logp" class="chemaxon.marvin.calculations.logPPlugin"/>
  <Plugin ID="logd" class="chemaxon.marvin.calculations.logDPlugin"/>
  <Plugin ID="acc" class="chemaxon.marvin.calculations.HBDAPlugin">
    <Param Name="type" Value="acc"/>
  </Plugin>
  <Plugin ID="don" class="chemaxon.marvin.calculations.HBDAPlugin">
    <Param Name="type" Value="don"/>
  </Plugin>
  <Plugin ID="acceptorcount" class="chemaxon.marvin.calculations.HBDAPlugin">
    <Param Name="type" Value="acceptorcount"/>
  </Plugin>
  <Plugin ID="donorcount" class="chemaxon.marvin.calculations.HBDAPlugin">
    <Param Name="type" Value="donorcount"/>
  </Plugin>
</Plugins>
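Before deploying a configuration, it can be useful to list what it declares. The Python sketch below (read_plugins and duplicate_ids are hypothetical helpers built on the standard xml.etree.ElementTree module) collects each plugin's ID, class, JAR and preset parameters. Because the example above spells the class attribute both "Class" and "class", the sketch accepts either. The example also declares the ID logp twice; which declaration takes effect is not stated here, so the sketch only flags such duplicates.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict, List, NamedTuple, Optional


class PluginDecl(NamedTuple):
    plugin_id: str
    java_class: Optional[str]
    jar: Optional[str]
    params: Dict[str, str]


def read_plugins(xml_text: str) -> List[PluginDecl]:
    root = ET.fromstring(xml_text)
    decls = []
    for el in root.iter("Plugin"):
        # the example spells the attribute both "Class" and "class"
        java_class = el.get("Class") or el.get("class")
        params = {p.get("Name"): p.get("Value") for p in el.findall("Param")}
        decls.append(PluginDecl(el.get("ID"), java_class, el.get("JAR"), params))
    return decls


def duplicate_ids(decls: List[PluginDecl]) -> List[str]:
    counts = Counter(d.plugin_id for d in decls)
    return sorted(i for i, n in counts.items() if n > 1)


SNIPPET = """<Plugins>
  <Plugin ID="pKa1" class="chemaxon.marvin.calculations.pKaPlugin">
    <Param Name="min" Value="-3"/> <Param Name="max" Value="10"/>
  </Plugin>
  <Plugin ID="pKa2" class="chemaxon.marvin.calculations.pKaPlugin">
    <Param Name="min" Value="-20"/> <Param Name="max" Value="30"/>
  </Plugin>
  <Plugin ID="logp" class="chemaxon.marvin.calculations.logPPlugin"/>
  <Plugin ID="logp" class="chemaxon.marvin.calculations.logPPlugin"/>
</Plugins>"""

decls = read_plugins(SNIPPET)
for d in decls:
    print(d.plugin_id, d.java_class, d.params)
print(duplicate_ids(decls))  # ['logp']
```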

Function Definitions

The expression strings can also include references to predefined functions. These functions are implemented by Java classes that have to implement the org.nfunk.jep.function.PostfixMathCommandI interface. See the JEP API documentation for details.
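In JEP 2.x a function receives its arguments on a stack: its run method pops the arguments and pushes a single result, and a parameter count of -1 declares that any number of arguments is accepted. The Python sketch below mimics only that calling convention to make it concrete; it is not the Java interface, and PostfixCommand, Sum and call are hypothetical names (unrelated to the chemaxon.jep.function.Sum class declared in the configuration).

```python
from typing import List, Sequence


class PostfixCommand:
    """Python stand-in for a JEP function: arguments arrive on a stack."""
    number_of_parameters = -1  # -1 stands for "any number of arguments"

    def __init__(self) -> None:
        self.cur_number_of_parameters = 0

    def run(self, stack: List[float]) -> None:
        raise NotImplementedError


class Sum(PostfixCommand):
    def run(self, stack: List[float]) -> None:
        total = 0.0
        for _ in range(self.cur_number_of_parameters):
            total += stack.pop()      # pop every argument
        stack.append(total)           # push the single result


def call(command: PostfixCommand, args: Sequence[float]) -> float:
    stack = list(args)                            # arguments pushed left to right
    command.cur_number_of_parameters = len(args)  # the caller reports the arity
    command.run(stack)
    return stack.pop()


print(call(Sum(), [1, 2, 3.5]))  # 6.5
```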

Declaration

The function definition section contains the user-defined function implementation Java classes accessible from the expressions. Each class is given an ID: this is the name by which the function is referenced from the expression. The Class attribute specifies the Java class that implements the function. A predefined function may have preset parameters in the same way as in the Plugin declaration section. Currently only the atomic property query function uses this, to preset the name of the atomic property to be queried.

Example:

    <Functions>
      <Function ID="array" class="chemaxon.jep.function.IntArray"/>
      <Function ID="min" Class="chemaxon.jep.function.Min"/>
      <Function ID="max" class="chemaxon.jep.function.Max"/>
      <Function ID="count" class="chemaxon.jep.function.Count"/>
      <Function ID="sum" class="chemaxon.jep.function.Sum"/>
      <Function ID="sortasc" class="chemaxon.jep.function.SortAsc"/>
      <Function ID="sortdesc" class="chemaxon.jep.function.SortDesc"/>
      <Function ID="in" class="chemaxon.jep.function.In"/>
      <Function ID="eval" class="chemaxon.jep.function.AtomEvaluatorFunction"/>
      <Function ID="filter" class="chemaxon.jep.function.Filter"/>
      <Function ID="minatom" class="chemaxon.jep.function.MinAtom"/>
      <Function ID="maxatom" class="chemaxon.jep.function.MaxAtom"/>
      <Function ID="minvalue" class="chemaxon.jep.function.MinValue"/>
      <Function ID="maxvalue" class="chemaxon.jep.function.MaxValue"/>
      <Function ID="atomprop" class="chemaxon.jep.function.AtomProperties"/>
      <Function ID="hcount" class="chemaxon.jep.function.AtomProperties">
        <Param Name="property" Value="hcount"/>
      </Function>
      <Function ID="connections" class="chemaxon.jep.function.AtomProperties">
        <Param Name="property" Value="connections"/>
      </Function>
      <Function ID="valence" class="chemaxon.jep.function.AtomProperties">
        <Param Name="property" Value="valence"/>
      </Function>
      <Function ID="atno" class="chemaxon.jep.function.AtomProperties">
        <Param Name="property" Value="atno"/>
      </Function>
      <Function ID="map" class="chemaxon.jep.function.AtomProperties">
        <Param Name="property" Value="map"/>
      </Function>
      <Function ID="arom" class="chemaxon.jep.function.AtomProperties">
        <Param Name="property" Value="arom"/>
      </Function>
    </Functions>
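Function declarations with preset parameters can also be generated rather than typed by hand. The Python sketch below (function_element is a hypothetical helper using the standard xml.etree.ElementTree module) rebuilds the atomprop and hcount entries from the example: both use the same class, and hcount differs only by its preset property parameter. It reproduces the lowercase class attribute used by those entries; whether the tool also accepts other spellings is not covered here.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def function_element(func_id: str, java_class: str,
                     preset: Optional[Dict[str, str]] = None) -> ET.Element:
    """Build one <Function> declaration, optionally with preset <Param> children."""
    el = ET.Element("Function", {"ID": func_id, "class": java_class})
    for name, value in (preset or {}).items():
        ET.SubElement(el, "Param", {"Name": name, "Value": value})
    return el


functions = ET.Element("Functions")
functions.append(function_element("atomprop", "chemaxon.jep.function.AtomProperties"))
functions.append(function_element("hcount", "chemaxon.jep.function.AtomProperties",
                                  {"property": "hcount"}))
print(ET.tostring(functions, encoding="unicode"))
```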

Matching Conditions

The matching condition declaration enables the Match function to be used in expression strings. This function performs substructure search and optionally checks for atom matching.

Declaration

The declaration gives a reference ID to the function, should contain a Class attribute that specifies the Java class implementing the function, and can specify search attributes if they differ from the default settings. Specifying search attributes is optional; if they are omitted, the default values are used. For a detailed description of the search options see the JChem Query Guide.

Search attributes that can be set in the Search section:

Attribute                       Range              Default Value
StereoSearch                    true/false         true
DoubleBondStereoMatchingMode    none/marked/all    marked
SubgraphSearch                  true/false         true
ExactAtomMatching               true/false         false
ExactStereoMatching             true/false         false
OrderSensitiveSearch            true/false         false

Example:

<Matching ID="match" Class="chemaxon.jep.function.Match">
  <Search DoubleBondStereoMatchingMode="all" OrderSensitiveSearch="true"/>
</Matching>
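Since only non-default search attributes need to be written, a small generator can validate the values against the table and omit everything else. The Python sketch below (matching_element is a hypothetical helper) copies the ranges and defaults from the table; given the options of the example above plus a redundant StereoSearch="true", it emits only the two attributes that differ from the defaults. Omitting the Search element entirely when nothing differs relies on the statement that search attributes are optional.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Defaults and allowed values copied from the table above.
SEARCH_DEFAULTS = {
    "StereoSearch": "true",
    "DoubleBondStereoMatchingMode": "marked",
    "SubgraphSearch": "true",
    "ExactAtomMatching": "false",
    "ExactStereoMatching": "false",
    "OrderSensitiveSearch": "false",
}
ALLOWED = {name: {"true", "false"} for name in SEARCH_DEFAULTS}
ALLOWED["DoubleBondStereoMatchingMode"] = {"none", "marked", "all"}


def matching_element(match_id: str, options: Dict[str, str]) -> ET.Element:
    """Build <Matching>, writing only search attributes that differ from the defaults."""
    for name, value in options.items():
        if name not in SEARCH_DEFAULTS:
            raise ValueError(f"unknown search attribute: {name}")
        if value not in ALLOWED[name]:
            raise ValueError(f"{name} must be one of {sorted(ALLOWED[name])}")
    changed = {k: v for k, v in options.items() if v != SEARCH_DEFAULTS[k]}
    el = ET.Element("Matching", {"ID": match_id, "Class": "chemaxon.jep.function.Match"})
    if changed:
        ET.SubElement(el, "Search", changed)
    return el


match = matching_element("match", {"StereoSearch": "true",
                                   "DoubleBondStereoMatchingMode": "all",
                                   "OrderSensitiveSearch": "true"})
print(ET.tostring(match, encoding="unicode"))
```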

A detailed description of the usage of the match function in expression strings is given below.

Default function definitions, plugin definitions and matching conditions

Default plugin and function definitions, as well as the default matching condition, are read from the built-in evaluator.xml file located under the chemaxon/jep directory in marvinbeans.jar / jchem.jar provided by ChemAxon. Plugins, functions and matching conditions defined by the user are read from the marvin/config/evaluator.xml file (where marvin is the Marvin installation directory; in case of JChem it is the JChem installation directory) and from the MARVIN_MAJOR_VERSION/evaluator.xml file (where MARVIN_MAJOR_VERSION is the major version of Marvin/JChem, e.g. "5.1") located under the .chemaxon (UNIX / Linux) or chemaxon (Windows) subdirectory of the user's home directory. User-defined configuration elements are added to the default configuration; where an element is defined in both, the user-defined one overrides the built-in setting.
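The lookup and override rules above can be sketched in Python. user_config_paths and merge_by_id are hypothetical helpers; the sketch assumes that entries are matched by their ID and, since the relative precedence of the two user-level files is not stated here, it simply lets later layers win. The com.example class names are placeholders.

```python
from pathlib import Path
from typing import Dict, List


def user_config_paths(marvin_home: str, major_version: str, home: str,
                      windows: bool = False) -> List[Path]:
    """The two user-level evaluator.xml locations described above."""
    settings_dir = "chemaxon" if windows else ".chemaxon"
    return [
        Path(marvin_home) / "config" / "evaluator.xml",
        Path(home) / settings_dir / major_version / "evaluator.xml",
    ]


def merge_by_id(built_in: Dict[str, str], *user_layers: Dict[str, str]) -> Dict[str, str]:
    """Start from the built-in entries; a user entry with the same ID replaces it."""
    merged = dict(built_in)
    for layer in user_layers:          # assumption: later layers win over earlier ones
        merged.update(layer)
    return merged


built_in = {"logp": "chemaxon.marvin.calculations.logPPlugin",
            "mass": "chemaxon.marvin.calculations.ElementalAnalyserPlugin"}
user = {"logp": "com.example.MyLogPPlugin", "mylogd": "com.example.MyLogDPlugin"}
print(user_config_paths("/opt/marvin", "5.1", "/home/alice"))
print(merge_by_id(built_in, user))
```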

Usage Examples

  1. Calculates the molecule mass for the molecules in the target.sdf file where the mass calculator plugin is defined in the config.xml configuration file:

    evaluate -c config.xml -e "mass()" target.sdf
  2. Filters molecules with a molecule mass of at least 200; the molecule mass is computed according to the default configuration:

    evaluate -e "mass() >= 200" -x sdf -o heavy.sdf target.sdf
  3. Evaluates the expression in file calc.txt for molecules in target.sdf, uses the default configuration:

    evaluate -e calc.txt target.sdf
  4. The same with SDF output into a file, with results written to the SDF tag RESULT, preceded by the expression string:

    evaluate -e calc.txt -S -i -t RESULT -o result.sdf target.sdf
  5. The same but the expression string is given in the expr.txt file:

    evaluate -e expr.txt -m query.sdf target.sdf
  6. Calculates partial charges for each atom with precision of 3 fractional digits, uses the charge calculation defined in config.xml:

    evaluate -c config.xml -e "charge()" -p 3 target.sdf
  7. The same with SDF file output, with charge values written to the CHARGE SDF tag:

    evaluate -c config.xml -e "charge()" -p 3 -S -t CHARGE -o result.sdf target.sdf
  8. Enumerates atoms 1 and 2 of the Markush structure m.mrv, writes the resulting structures in MRV format:

    evaluate -e "markushEnumerations('1,2')" m.mrv -f mrv
  9. Returns 3 random enumerations of the Markush structure m.mrv, writes the resulting structures in MRV format, aligns scaffold and stores scaffold/R-group coloring data:

    evaluate -e "randomMarkushEnumerationsDisplay(3)" m.mrv -f mrv

    Note that the display options (coordinates and attached coloring data) cannot be stored in the default SMILES output format; therefore the MRV output format must be specified in this case.

  10. Searches for structural issues, given as action strings; evaluator returns found explicit hydrogens or SMARTS defined substructures:

    evaluate -e "check('explicith..substructure:name=nitro check:reactionSmarts=[O:2]=[N:1]=O')" "[H]C1=CC(C)=CC(=C1)N(=O)=O" "[H]C1=CC(C)=CC=C1" "CC1=CC(=CC=C1)N(=O)=O" "CC1=CC=CC=C1"