cxtrain command line tool

Introduction

Some property calculations can be improved when experimental data are available for molecules that are similar to the target. Such user-specific information can be incorporated into so-called training libraries, which can be generated with ChemAxon's command-line tool cxtrain. The tool is part of the JChem and Marvin Beans packages.

The generated training library is stored on the user's computer and is used by the calculator plugins to improve property prediction.

Usage

Invoking cxtrain

Invoking cxtrain -h gives the following output:

cxtrain <prediction> [options] [input file (training set)]

Prediction:
  pka                                train pKa prediction
  logp                               train logP prediction
  prediction                         train custom prediction
General options:
  cxtrain -h, --help                 this help message
  -i, --training-id<training>        sets the training ID
  -l, --list                         list available training ID's
  -g, --ignore-error                 continue with next molecule on error
pKa options:
  -V, --validation <filepath>        validation results file path
logP options:
  -t, --tag <tag name>               name of the SDFile tag that stores the experimental logP values
  -a, --add-built-in-training-set    add built-in logP training set
Custom prediction options:
  -t, --tag <tag name>               name of the SDFile tag that stores the experimental property values

So you can train a plugin by calling cxtrain:

cxtrain <prediction> [options] [input file (training set)] 

where <prediction> must be one of pka, logp or prediction (the last one is used for a custom property).

cxtrain is only able to train the three plugins mentioned. If another plugin name is given as a command line parameter, the following message appears:

Prediction has to be one of the following: pka, logp, prediction.
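The check described above can also be done in a script before cxtrain is started, for example when the prediction type comes from user input. The following is only a minimal sketch: the function names is_valid_prediction and safe_cxtrain are hypothetical, and it assumes that the prediction names must be written in lower case exactly as they appear in the help output.

```shell
# Return 0 if the argument is one of the prediction types listed in the help output.
# Assumption: names are matched in lower case, as printed by cxtrain -h.
is_valid_prediction() {
    case "$1" in
        pka|logp|prediction) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper: validate the first argument, then pass everything to cxtrain.
safe_cxtrain() {
    if ! is_valid_prediction "$1"; then
        echo "Prediction has to be one of the following: pka, logp, prediction." >&2
        return 2
    fi
    cxtrain "$@"
}

# Example call (not executed here):
#   safe_cxtrain pka -i mypka pKa_trainingset.sdf
```

The wrapper only mirrors the message printed by cxtrain; it does not replace the tool's own argument checking.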

Input of cxtrain

cxtrain can handle any molecular file format that is supported by ChemAxon (e.g. MDL Molfile, SDF).

Placing the training library

The generated training library is stored on your computer, and it can be used via Marvin, Chemical Terms, Instant JChem or cxcalc.

On Windows, the training file is placed under $HOME\chemaxon\calculations\training, where $HOME is commonly C:\Users\username.

On UNIX-based operating systems (Unix, Linux, OS X), the training file is placed under $HOME/.chemaxon/calculations/training, where $HOME is typically /home/useraccount on Linux and /Users/useraccount on OS X.
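To check which training files are present, the directory can be located from a script. The sketch below is an illustration only: the function name training_dir is hypothetical, and the Windows branch assumes a POSIX-like shell such as Git Bash, MSYS or Cygwin, where uname -s reports a name starting with MINGW, MSYS or CYGWIN.

```shell
# Print the directory in which training libraries are stored.
# Optional argument: a system name as reported by "uname -s" (defaults to the current system).
training_dir() {
    sys="${1:-$(uname -s)}"
    case "$sys" in
        MINGW*|MSYS*|CYGWIN*)
            # Windows layout: the chemaxon folder has no leading dot
            printf '%s\n' "$HOME/chemaxon/calculations/training"
            ;;
        *)
            # Unix, Linux, OS X layout
            printf '%s\n' "$HOME/.chemaxon/calculations/training"
            ;;
    esac
}

# List the stored training files if the directory exists
dir="$(training_dir)"
if [ -d "$dir" ]; then
    ls -l "$dir"
else
    echo "No training directory found at $dir"
fi
```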

Options

General options

The following general options are available:

  1. The option --training-id (-i) sets the ID of your training. Afterwards, this ID refers to the given training during the calculation.

  2. The available training IDs can be listed using the option --list (-l).

  3. The option --ignore-error (-g) skips a molecule that causes an error and continues with the next one.

Plugin-specific options

The following plugin-specific options are available:

pKa plugin:

  • --validation <filepath> (-V) creates validation data; the file path of the pKa training validation chart can be defined optionally.

logP plugin:

  • --add-built-in-training-set (-a) merges your data with the data from the built-in logP training set.

  • Option --tag (-t) defines the name of the SDFile tag that stores the experimental logP values.

Custom prediction option:

  • Option --tag (-t) defines the name of the SDFile tag that stores the experimental values of the custom property.
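Because the --tag option names the SDFile tag that holds the experimental values, it can be useful to check a training set for records that lack this tag before training. The following sketch relies only on the SD file layout (data header lines such as "> <LOGP>" and records closed by a "$$$$" line); the function name count_missing_tag is hypothetical, and how cxtrain itself treats records without the tag is not covered here.

```shell
# Count the records of an SD file that do not contain the given data tag.
# Usage: count_missing_tag <tag name> <SD file>
count_missing_tag() {
    awk -v tag="$1" '
        BEGIN { header = "<" tag ">"; missing = 0; found = 0 }
        # A data header line starts with ">" and contains the tag name in angle brackets
        substr($0, 1, 1) == ">" && index($0, header) > 0 { found = 1 }
        # A line starting with "$$$$" closes the current record
        substr($0, 1, 4) == "$$$$" { if (!found) missing++; found = 0 }
        END { print missing }
    ' "$2"
}

# Example call (not executed here), using the file and tag names of the logP example:
#   count_missing_tag LOGP logP_trainingset.sdf
```

A result of 0 means that every record carries the tag.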

Examples

Training pKa calculations

Step #1 Creating the training library from a given data file pKa_trainingset.sdf with a training ID mypka:

cxtrain pka -i mypka pKa_trainingset.sdf

Step #2 Using the generated training set in pKa calculations with cxcalc:

cxcalc pKa --correctionlibrary mypka "CSC1=NC2=C(N1)C=NC(O)=N2"

The result of the calculation is:

              id apKa1 apKa2 bpKa1 bpKa2 atoms
              1 11.19 16.01 2.34 -2.59 7,11,9,4

Training logP calculations

Step #1 Creating the training library from the given data file logP_trainingset.sdf (with experimental logP values stored in the SDF tag named LOGP), setting training ID to mylogp and including data from the built-in training set:

cxtrain logp -t LOGP -i mylogp -a logP_trainingset.sdf

Step #2 To apply your generated logP training library in calculations, use the parameter --trainingid in combination with the parameter --method in cxcalc:

cxcalc logp --method user --trainingid mylogp "CC(C)CCO"

The result of the calculation is:

              id logP
              1 1,13

The following command lists available training IDs for logP calculation:

cxtrain logp --list

The following command trains a custom property calculation using the data file pampa_trainingset.sdf (with the experimental values stored in the SDF tag named PAMPA) and sets the training ID to mypampa:

cxtrain prediction -t PAMPA -i mypampa pampa_trainingset.sdf