About Chemical Calculations and Predictions

What are Chemical Calculations and Predictions?

Calculations and predictions generate values for properties of a particular chemical structure. A calculation is something that generates an exact value for that structure (e.g. number of atoms, molecular weight), whereas a prediction generates an estimated value for a property that cannot be precisely determined except by experimental methods (e.g. logP, pKa, solubility), though this distinction is often somewhat blurred. There are usually multiple ways to generate a prediction (e.g. different computer algorithms and/or different parameters), and different ways will generate different values.

When using predictions it is desirable to test the prediction against actual (experimentally determined) values to establish how "good" the prediction is. Typically no one prediction method is "right" and all others "wrong", but some may be generally better than others, and it is important to select the best algorithm and/or parameters for your particular need. Also bear in mind that the "best" approach may well be different for different chemical series, and that some approaches can be improved by training using particular verified (experimentally determined) values.
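One simple way to judge how "good" a prediction is, sketched below under stated assumptions, is to compute the mean absolute error of each method against measured values. This is plain Python and not part of Instant JChem; the method names and all numbers are invented purely for illustration.

```python
from statistics import mean


def mean_absolute_error(predicted, measured):
    """Average absolute difference between predicted and measured values."""
    if len(predicted) != len(measured):
        raise ValueError("predicted and measured must have the same length")
    return mean(abs(p - m) for p, m in zip(predicted, measured))


# Hypothetical measured logP values for four compounds (illustrative only).
measured_logp = [1.0, 2.0, 3.0, 4.0]

# Hypothetical outputs of two prediction methods for the same compounds.
predictions = {
    "method_a": [1.5, 2.5, 2.5, 4.5],
    "method_b": [1.0, 2.0, 3.0, 5.0],
}

# Score every method, then pick the one with the smallest error.
errors = {
    name: mean_absolute_error(values, measured_logp)
    for name, values in predictions.items()
}
best_method = min(errors, key=errors.get)
```

A lower error on one set of compounds says nothing certain about another chemical series, so it is worth repeating this kind of check on structures that resemble the ones you actually work with.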

Whilst the distinction between the accuracy of calculations and predictions is important when interpreting the results, from Instant JChem's point of view both are just "properties" that can be calculated from a chemical structure. For this reason the term "property" will be used unless there is a need to distinguish between a calculation and a prediction.

ChemAxon's Chemical Terms and Calculator Plugins

ChemAxon's approach to calculations and predictions is to make a wide range of properties available to you, and to provide an extensible architecture that allows new calculations to be added. This architecture is called "calculator plugins": chemical properties can be "plugged in" as they are needed.

Each calculator plugin provides one or more chemical properties. A particular calculator plugin typically contains a related set of properties (e.g. the Protonation plugin includes properties for pKa, Major Microspecies and Isoelectric Point). Most plugins require additional licenses to run, though some basic properties are available without licenses. If you need licenses, email [email protected]

The calculator plugin architecture allows new calculations or predictions to be added to JChem. This topic is outside the scope of Instant JChem. See here for a guide.

Chemical properties are available in Instant JChem in 2 different ways:

  1. When editing an individual structure using MarvinSketch. The properties are available from the Marvin Tool menu. Please consult the Marvin documentation for details on using the calculator plugins in Marvin.

  2. Through Chemical Terms expressions, which allow simple or complex chemical property expressions to be defined. This way is probably much more useful in Instant JChem, as you can either add chemical properties to your database as a new field or use an expression as an additional filter when running queries.

Chemical Terms Expressions

Simple Expressions

A Chemical Terms expression is much like a mathematical formula, in that it contains one or more terms that are evaluated to generate a result. This is best illustrated by means of an example. One of the simplest forms of Chemical Terms expressions is:

atomCount()

which as you might expect returns the number of atoms in the current molecule. Similarly the expression

logP()

specifies the predicted logP of the structure, and the expression

PSA()

specifies the topological polar surface area of the structure.

Some expressions can take parameters, such as

PSA('7.4')

which predicts the topological polar surface area at pH 7.4. For a full description of the many available Chemical Terms functions, please consult the Chemical Terms Reference Guide.

Combining Functions into Complex Expressions

Multiple Chemical Terms functions can be combined into more complex expressions using arithmetical (+, -, *, /), logical (&&, ||, !) or comparison (<, <=, >, >=, =) operators. Although conceptually simple, this provides a very useful mechanism to build powerful Chemical Terms expressions. This is best illustrated by means of some examples:

| Expression | Description |
| --- | --- |
| logD("7.4") - logD("3.8") | logD at pH 3.8 subtracted from logD at pH 7.4 |
| logP() < 5 | logP must be less than 5 |
| (mass() <= 500) && (logP() <= 5) && (donorCount() <= 5) && (acceptorCount() <= 10) | Lipinski Rule of 5 filter. The && symbol means AND, so this rule passes when all of the terms evaluate to TRUE |
| (mass() <= 500) + (logP() <= 5) + (donorCount() <= 5) + (acceptorCount() <= 10) + (rotatableBondCount() <= 10) + (PSA() <= 200) + (fusedAromaticRingCount() <= 5) >= 6 | A bioavailability filter. At least 6 of the 7 individual rules must pass for the whole rule to evaluate to TRUE. In the context of the + operator the value of each individual rule is 0 (fail) or 1 (pass), so this expression adds up the 1's to find out how many individual rules have passed, and then checks whether this total is greater than or equal to 6 |

The important thing to emphasize here is that the comparison operators (<, <=, >, >=, =) can be used to convert a text or numeric value into a boolean (true or false) value, and that the logical operators (&&, ||) are used to combine boolean values into composite terms where all (or a defined combination) must match for the outcome to be true.
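The difference between the all-must-pass Lipinski rule and the 6-out-of-7 bioavailability rule can be sketched in plain Python, where booleans also count as 0 or 1 when summed. This is only an illustration of the arithmetic, not how Instant JChem evaluates Chemical Terms; the dictionary keys mirror the function names used above and the property values are invented.

```python
# Hypothetical property values for one structure (illustrative numbers only).
properties = {
    "mass": 420.5,
    "logP": 5.6,  # deliberately above the limit of 5
    "donorCount": 2,
    "acceptorCount": 7,
    "rotatableBondCount": 6,
    "PSA": 95.0,
    "fusedAromaticRingCount": 2,
}


def lipinski(props):
    """All four terms must be true, like chaining them with &&."""
    return (
        props["mass"] <= 500
        and props["logP"] <= 5
        and props["donorCount"] <= 5
        and props["acceptorCount"] <= 10
    )


def passed_rules(props):
    """Number of individual rules that pass; each comparison counts as 0 or 1."""
    checks = [
        props["mass"] <= 500,
        props["logP"] <= 5,
        props["donorCount"] <= 5,
        props["acceptorCount"] <= 10,
        props["rotatableBondCount"] <= 10,
        props["PSA"] <= 200,
        props["fusedAromaticRingCount"] <= 5,
    ]
    return sum(checks)


def bioavailability(props):
    """At least 6 of the 7 rules must pass."""
    return passed_rules(props) >= 6
```

With these made-up values the structure fails the strict Lipinski rule because of its logP, yet still passes the bioavailability rule, since a single failure is tolerated there.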

As you can see, the types of expressions you can build up can be very powerful, and it is easy to build your own expressions or adapt existing expressions to your needs. More complete documentation is available in the Chemical Terms Language Reference. Please consult this if you want to use Chemical Terms expressions for anything but the simplest of purposes.

Complex Return Types

The expressions we have discussed so far have simple return types. These are numbers, text or boolean. Whilst these are currently the most useful types in Instant JChem, there are many other types of chemical terms expressions that return complex types. A complex type is a value that is not a simple value. Examples are:

| Expression | Return type |
| --- | --- |
| apKa() | Returns the acidic pKa values of all protonisable atoms. The return type is an array, in descending order (strongest pKa first). |
| majorMs("7.4") | Returns a structure for each of the major microspecies at pH 7.4. The return type is one or more Molecules. |
| tautomers() | Returns a molecule for each of the tautomers of the structure. The return type is one or more Molecules. |

Whilst these types of Chemical Terms expressions are not currently directly usable in Instant JChem (largely because there is no sensible way to display the results within the restrictions of the Grid View), they are worth mentioning for 2 reasons:

  1. They can be "coerced" into simple values by use of basic functions (e.g. min(), max(), sum(), count()). For instance, you could display the number of tautomers using:

    count(tautomers())

  2. These Chemical Terms types will be increasingly useful with the Form view that was introduced in Instant JChem 2.0. For instance, you will be able to display all the tautomers for the current structure as a component on the form.
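The idea of coercing a complex value into a simple one can be sketched in plain Python as reducing a list to a single number. The helper name `coerce` and the pKa-like values are hypothetical and only stand in for an array-returning expression; this is not the Chemical Terms implementation.

```python
# Hypothetical array result for one structure (illustrative values only).
pka_values = [3.2, 9.8, 12.1]


def coerce(values, how):
    """Reduce a list of numbers to one simple value that fits a single cell."""
    reducers = {"min": min, "max": max, "sum": sum, "count": len}
    if how not in reducers:
        raise ValueError(f"unknown reducer: {how}")
    # min() and max() are undefined for an empty list, so report "no value".
    if not values and how in ("min", "max"):
        return None
    return reducers[how](values)


lowest = coerce(pka_values, "min")
how_many = coerce(pka_values, "count")
```

Whichever reducer is chosen, the result is a single number, which is what makes it displayable in a grid cell.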

Using Chemical Terms Expressions in Instant JChem

Adding a Chemical Terms Field

The most direct way to use chemical terms expressions is by adding a Chemical Terms field to a JChem Entity. This is just like adding a standard field, but instead of manually entering the values they are automatically generated for each structure using the specified chemical terms expression. For example, to add values for the predicted logP value you would add a Chemical Terms field and specify the expression logP() for the chemical terms expression. Once the field is added, all the logP values for the structures in the table will be generated (this may take some time - minutes or even hours, depending on how many structures the database table contains). Once complete, you not only have logP values for all the structures, but they will also be automatically updated if the structures are edited or any new structures added. The values are therefore "live" and will remain accurate after you make any changes. Also, because these values are automatically generated based on the structure, you cannot edit them directly.
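The "live" behaviour described above can be pictured with a small Python sketch: a derived value is regenerated whenever a row's structure is added or edited, and it is exposed read-only. The class name `ComputedColumn` and the toy expression are hypothetical and do not reflect how Instant JChem stores or computes fields.

```python
class ComputedColumn:
    """Keeps one derived value per row in step with that row's structure."""

    def __init__(self, expression):
        self._expression = expression  # callable: structure -> value
        self._structures = {}
        self._values = {}

    def set_structure(self, row_id, structure):
        """Add or edit a row; its derived value is regenerated immediately."""
        self._structures[row_id] = structure
        self._values[row_id] = self._expression(structure)

    def value(self, row_id):
        """Read-only access: there is deliberately no way to set a value."""
        return self._values[row_id]


def crude_atom_count(structure):
    """Toy stand-in for a real property: counts upper-case letters."""
    return sum(1 for ch in structure if ch.isupper())


column = ComputedColumn(crude_atom_count)
column.set_structure(1, "CCO")
before_edit = column.value(1)

# Editing the structure regenerates the value without any manual step.
column.set_structure(1, "CCCO")
after_edit = column.value(1)
```

The cost of the expression is paid once per added or edited structure rather than on every read, which is why the initial population of a large table can take a while.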

There is a little bit of magic going on as calculations are performed. The current molecule is set as an "invisible" parameter to the calculation, but whilst this is completely transparent to you as the user of a chemical terms expression, it may be useful to point this out as you can use this molecule as part of the input parameters to the calculation. For example, you can use the atom numbers of the input structure as part of the chemical terms expression. For a full description of this advanced functionality see the Chemical Terms Language Reference.

Any Chemical Terms column can be used as part of a query or sort directive in Instant JChem. For example, once you add a Chemical Terms field to your JChem table you can use those values as part of a query or use those values to sort the data, just as if it were a normal field.

Using a Chemical Terms Expression as an Additional Filter to a Query

Sometimes you may just have a particular question you want to ask of your structures. You just want to see the structures that match your particular requirements and don't want to go through the process of adding all the Chemical Terms fields to generate the values that you need. This is where a Chemical Terms filter becomes useful.

A Chemical Terms filter lets you apply the results of a Chemical Terms expression to a query. This filter gets applied as the last step of running the query, removing structures from the results that don't pass the Chemical Terms filter. The Chemical Terms filter gets applied to EVERY structure that comes out of the execution of the query. This can be very useful, but it also runs the risk of specifying a query that can take an extremely long time to complete.

So a Chemical Terms filter can be very useful, but needs to be applied with some thought. If used as a way of refining a well defined query then it should work very well, but if you specify a complex Chemical Terms filter without a sensible query to pre-filter the results, your query will take a long time to complete. For instance, if you don't apply any other query criteria and try to search a database containing 1 million structures with a complex filter, for example a Lipinski rule of 5, then you should go and make yourself a coffee whilst the query executes (fortunately the query can be cancelled if you get fed up waiting once your coffee has gone cold!).
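Why a pre-filtering query matters can be sketched in plain Python by counting how often the filter has to be evaluated. The function `run_query`, the row layout and the stand-in filter are all hypothetical; the only point is that the filter runs once for every structure the query lets through.

```python
def run_query(rows, query, ct_filter):
    """Apply the query first, then evaluate the filter on every hit."""
    hits = [row for row in rows if query(row)]
    evaluations = 0
    results = []
    for row in hits:
        evaluations += 1  # one (potentially slow) evaluation per hit
        if ct_filter(row):
            results.append(row)
    return results, evaluations


# Hypothetical table of 1000 rows with a stored "mass" field.
rows = [{"id": i, "mass": 100 + i} for i in range(1000)]


def slow_filter(row):
    """Stand-in for an expensive expression evaluated per structure."""
    return row["mass"] % 2 == 0


# No other criteria: the filter is evaluated for every row in the table.
all_results, all_evaluations = run_query(rows, lambda row: True, slow_filter)

# A selective query first: the filter only sees the rows that survive it.
few_results, few_evaluations = run_query(
    rows, lambda row: row["mass"] <= 149, slow_filter
)
```

In this toy table the selective query cuts the number of filter evaluations from 1000 to 50, and the saving grows in proportion to the table size.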

Which is Better? Chemical Terms Field or Chemical Terms Filter?

While this choice generally depends on your needs, a Chemical Terms field is usually better because the values only need to be generated once. However, that doesn't mean that Chemical Terms filters don't have their place. If you do decide to use a Chemical Terms filter, make sure you spend a moment considering whether you are proceeding in an optimal manner. Chemical Terms filters can be a useful way of refining queries, but if you specify a filter badly you can convert a fast query into something that takes several minutes (or hours!) to complete.

Conclusion

Chemical Terms provides a very powerful extension to Instant JChem. We hope this summary has whetted your appetite for what Chemical Terms can do. If you think Chemical Terms will be useful to you, please consult the full documentation in the Chemical Terms Language Reference. This overview just skims the surface of what Chemical Terms can do for you!