Reaction fingerprint (RF)

Representing and comparing chemical reactions

The need to compare chemical reactions with computational tools is as fundamental as the need to compare chemical structures. Techniques developed for estimating the chemical (or other) similarity of molecules can be adapted to estimate the similarity of chemical reactions.

To define reaction similarity, a few basic concepts are introduced first. Simply put, a chemical reaction transforms one or more reactants into one or more products. Traditionally, reactants are drawn to the left of the reaction arrow and products to the right of it, so the reactants are often called the left side of the reaction and the products the right side. One way to characterize the transformation (and thus the reaction itself) is to identify the atoms and bonds that change between the reactant and product structures.

We can say that an atom is changing if

  1. one or more of its bonds change (i.e., the bond on the left side differs from the bond on the right side), OR

  2. it is present on only one side of the reaction and has a non-changing neighbor atom.

A bond is changing if it is present on only one side of the reaction. Changing atoms and changing bonds together define the reacting center of the reaction, which is characteristic of a particular type of reaction. The example below shows how atom maps are used to track the atoms that take part in a reaction.

images/www.chemaxon.com/jchem/doc/user/RFp_files/center.png

Fig. 1 Atoms and bonds taking part in the chemical reaction are coloured blue. The numbers near the reacting centers are atom maps that associate each atom in the reactants with the corresponding atom in the products.
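The bond rule and the first atom rule above can be illustrated with a minimal Python sketch. It assumes each side of the reaction is given as a table that maps the unordered pair of atom-map numbers of a bond to its bond order, and it treats a bond whose order differs between the sides as changing. The names `changing_bonds` and `changing_atoms` are hypothetical, and the second atom rule (atoms present on only one side) is not covered here.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical representation: a bond is keyed by the unordered pair of
# atom-map numbers it joins; the value is the bond order (1 = single, 2 = double).
BondTable = Dict[FrozenSet[int], int]


def changing_bonds(left: BondTable, right: BondTable) -> Set[FrozenSet[int]]:
    """Bonds that occur on only one side, or whose order differs (assumed reading)."""
    return {pair for pair in left.keys() | right.keys()
            if left.get(pair) != right.get(pair)}


def changing_atoms(left: BondTable, right: BondTable) -> Set[int]:
    """Atoms with at least one changing bond (rule 1 of the definition only)."""
    atoms: Set[int] = set()
    for pair in changing_bonds(left, right):
        atoms |= pair
    return atoms


# Toy example: propene + HBr -> 2-bromopropane.
# Map numbers: 1 = CH2 carbon, 2 = CH carbon, 5 = methyl carbon, 3 = H, 4 = Br.
left = {frozenset({1, 2}): 2, frozenset({2, 5}): 1, frozenset({3, 4}): 1}
right = {frozenset({1, 2}): 1, frozenset({2, 5}): 1,
         frozenset({1, 3}): 1, frozenset({2, 4}): 1}
print(changing_atoms(left, right))  # the methyl carbon (5) is not in the center
```

Here the C=C bond, the H-Br bond and the two newly formed bonds are changing, while the C2-C5 bond is identical on both sides, so atom 5 stays outside the reacting center.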

Reaction similarity

Structural properties of the reaction offer a natural way to introduce the concept of reaction similarity. Two reactions can be considered similar if their reactant sides and/or product sides are similar. From this perspective, reaction similarity reduces to molecular (structural) similarity.

However, another type of reaction similarity can be introduced by focusing on the reacting center of the reaction. This transformational similarity depends less on the particular reactants and products of a reaction and is dominated by the reaction mechanism. Both types of reaction similarity are useful for comparing and matching reactions.

The degree of structural similarity is determined by the structural similarity of the reactants (or products) of the two reactions being compared. The degree of transformational similarity can be examined at three distinct levels: strict, medium and coarse. Strict similarity compares the reactions including the broadest topological environment of their reacting centers. In contrast, coarse similarity restricts the comparison to the bare reacting centers, completely ignoring their neighborhoods. In the present implementation, the topological distances used for strict, medium and coarse transformational similarity are 2, 1 and 0, respectively. These distances are measured in bonds from the atoms of the reacting center.

images/www.chemaxon.com/jchem/doc/user/RFp_files/coarse.png

Fig. 2 Coarse transformational similarity: only the changing atoms and bonds are taken into account.

images/www.chemaxon.com/jchem/doc/user/RFp_files/medium.png

Fig. 3 Medium-scale transformational similarity: atoms adjacent to any atom of the reacting center, along with their bonds, are also taken into account.

images/www.chemaxon.com/jchem/doc/user/RFp_files/strict.png

Fig. 4 Strict similarity: atoms in the 2-bond neighborhood of the reacting centers as well as the corresponding bonds are taken into account.
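Selecting the atoms that belong to each level amounts to a breadth-first search that starts from all reacting-center atoms at once and stops at the given bond distance. The Python sketch below assumes the molecule is given as an adjacency dictionary keyed by atom number; `center_neighborhood` and `LEVEL_RADIUS` are hypothetical names, and only atoms (not bonds) are returned.

```python
from collections import deque
from typing import Dict, Iterable, Set

# Bond distance from the reacting center for each level, as stated in the text.
LEVEL_RADIUS = {"coarse": 0, "medium": 1, "strict": 2}


def center_neighborhood(adjacency: Dict[int, Set[int]],
                        center: Iterable[int],
                        radius: int) -> Set[int]:
    """Atoms within `radius` bonds of any reacting-center atom (multi-source BFS)."""
    dist = {atom: 0 for atom in center}
    queue = deque(dist)
    while queue:
        atom = queue.popleft()
        if dist[atom] == radius:
            continue  # do not expand beyond the requested bond distance
        for nbr in adjacency.get(atom, set()):
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return set(dist)


# Hypothetical 5-atom chain 1-2-3-4-5 with atom 1 as the reacting center.
chain = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
for level, radius in LEVEL_RADIUS.items():
    print(level, sorted(center_neighborhood(chain, [1], radius)))
```

Because every center atom is enqueued at distance 0, an atom close to any part of the reacting center is included, which matches the "from atoms in the reacting center" wording above.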

Reaction fingerprints

The successful use of various types of fingerprints in similarity calculations for chemical structures encourages the introduction of an analogous reaction fingerprint. Such a fingerprint would directly support the structural similarity assessment of reactions, because the topological chemical fingerprints of the reactants (products) can be compared directly. However, it should also address transformational similarity, at all three levels of coarseness.

Since transformational similarity is defined by topological properties of the reacting centers, topological fingerprints appear to be a viable choice for it as well. The reacting center is a substructure of the reactant (and of the product), so its topological fingerprint can be constructed and used to represent the reacting center. The same concept applies to the extended neighborhood of the reacting center, because that broader neighborhood is simply another, larger substructure.
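To illustrate how a topological fingerprint can be built for a substructure such as a reacting center, the Python sketch below hashes all linear atom/bond paths of the substructure into a fixed-width bit set. It is a simplified, generic path-based fingerprint written for illustration, not ChemAxon's CFp; the function name `path_fingerprint` and its parameters (`n_bits`, `max_bonds`, `bits_per_pattern`) are hypothetical.

```python
import hashlib
from typing import Dict, FrozenSet, List, Set


def path_fingerprint(elements: Dict[int, str],
                     bonds: Dict[FrozenSet[int], int],
                     atoms: Set[int],
                     n_bits: int = 128,
                     max_bonds: int = 3,
                     bits_per_pattern: int = 2) -> int:
    """Hashed linear-path fingerprint of the substructure induced by `atoms`.

    The result is a Python int used as a bit set of width `n_bits`.
    """
    # Keep only bonds whose both ends lie inside the substructure.
    adjacency: Dict[int, Set[int]] = {a: set() for a in atoms}
    for pair in bonds:
        a, b = tuple(pair)
        if a in atoms and b in atoms:
            adjacency[a].add(b)
            adjacency[b].add(a)

    patterns: Set[str] = set()

    def walk(path: List[int]) -> None:
        # Encode the path as alternating element symbols and bond orders.
        tokens = [elements[path[0]]]
        for prev, cur in zip(path, path[1:]):
            tokens += [str(bonds[frozenset((prev, cur))]), elements[cur]]
        forward, backward = "-".join(tokens), "-".join(reversed(tokens))
        patterns.add(min(forward, backward))  # direction-independent pattern
        if len(path) - 1 < max_bonds:
            for nbr in adjacency[path[-1]]:
                if nbr not in path:  # linear paths only
                    walk(path + [nbr])

    for start in atoms:
        walk([start])

    # Deterministic hashing (Python's built-in hash() is salted for strings).
    fp = 0
    for pattern in patterns:
        for k in range(bits_per_pattern):
            digest = hashlib.sha256(f"{k}:{pattern}".encode()).digest()
            fp |= 1 << (int.from_bytes(digest[:4], "big") % n_bits)
    return fp
```

Because every path of a smaller substructure is also a path of any larger substructure containing it, the bits set for the bare reacting center are a subset of those set for its 1- or 2-bond neighborhood (at equal fingerprint width).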

Based on the above considerations, the reaction fingerprint is structured as follows:

  1. chemical fingerprint (CFp) of the reactant(s) and agent(s)

  2. CFp of the product(s)

  3. CFp of the reactant side of the reaction center

  4. CFp of the product side of the reaction center

  5. CFp of the reactant side of the reaction center including its 1-bond neighborhood

  6. CFp of the product side of the reaction center including its 1-bond neighborhood

  7. CFp of the reactant side of the reaction center including its 2-bond neighborhood

  8. CFp of the product side of the reaction center including its 2-bond neighborhood

The total length of the reaction fingerprint in the present implementation is 2048 bits. The eight segments defined above are laid out as shown in the schema below (segment sizes in bits):

Segment     1    2    3    4    5    6    7    8
Bits      512  512  128  128  128  128  256  256

This reaction fingerprint supports both types of reaction similarity calculation and, at the expense of some extra storage space, makes the transformational similarity calculation efficient at all three predefined levels of coarseness.
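The segment layout can be sketched in Python by packing eight per-segment bit sets into a single 2048-bit integer. This assumes the segments are stored in the listed order starting at bit 0; the names `SEGMENT_BITS`, `segment_offsets` and `assemble` are hypothetical and do not describe the storage format of the actual implementation.

```python
from typing import List, Sequence

# Segment sizes in bits, in the order listed above (total 2048).
SEGMENT_BITS = [512, 512, 128, 128, 128, 128, 256, 256]


def segment_offsets(sizes: Sequence[int] = SEGMENT_BITS) -> List[int]:
    """Start bit of each segment, assuming segments are packed in order from bit 0."""
    offsets, pos = [], 0
    for size in sizes:
        offsets.append(pos)
        pos += size
    return offsets


def assemble(segments: Sequence[int], sizes: Sequence[int] = SEGMENT_BITS) -> int:
    """Concatenate per-segment bit sets (Python ints) into one reaction fingerprint."""
    if len(segments) != len(sizes):
        raise ValueError("expected one bit set per segment")
    fp = 0
    for bits, size, offset in zip(segments, sizes, segment_offsets(sizes)):
        if bits >> size:
            raise ValueError("segment has bits outside its allotted width")
        fp |= bits << offset
    return fp


print(sum(SEGMENT_BITS), segment_offsets())
```

Keeping each segment at a fixed offset is what lets a similarity metric read only the segments it needs without recomputing anything.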

Reaction similarity metrics

Two types of reaction similarity have been introduced: structural and transformational. Structural similarity distinguishes between the reactant and product sides, while transformational similarity has three levels of coarseness. Consequently, five metrics are needed to estimate the five categories of reaction similarity efficiently. These metrics are:

  • ReactantTanimoto

  • ProductTanimoto

  • StrictReactionTanimoto

  • MediumReactionTanimoto

  • CoarseReactionTanimoto

All of these metrics are based on the Tanimoto metric, consequently the degree of similarity is between 0 and 1.
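For reference, the standard Tanimoto coefficient of two binary fingerprints (or of corresponding fingerprint segments) A and B, where |X| denotes the number of bits set in X, is:

$$
T(A, B) = \frac{|A \cap B|}{|A| + |B| - |A \cap B|} = \frac{|A \cap B|}{|A \cup B|}
$$

Since |A ∩ B| can never exceed |A ∪ B|, the value lies between 0 and 1, and it equals 1 exactly when the two non-empty bit sets are identical.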

ReactantTanimoto considers only the first quarter of the reaction fingerprint (segment 1), which represents the reactants (and agents) of the reaction, and ignores the rest of the fingerprint. It therefore estimates the structural similarity of the reactants only.

ProductTanimoto uses the second quarter of the fingerprint (segment 2), which represents the products.

StrictReactionTanimoto uses the last two segments of the reaction fingerprint, which represent the reactant and product sides of the reacting center with its broadest (2-bond) neighborhood, and ignores the first three quarters of the fingerprint.

Similarly, MediumReactionTanimoto applies the Tanimoto metric to the 5th and 6th segments (the reacting center with its 1-bond neighborhood), while CoarseReactionTanimoto uses the 3rd and 4th segments, which encode the reactant and product sides of the bare reacting center, respectively.
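The five metrics can be sketched in Python as one Tanimoto function applied under a per-metric bit mask. The sketch assumes the reaction fingerprint is a Python int with the segments packed in order from bit 0; `METRIC_SEGMENTS`, `segment_mask` and `reaction_similarity` are hypothetical names, and scoring two empty segments as 0.0 is a convention chosen here, not taken from the text.

```python
from typing import Dict, Tuple

# Segment sizes in bits, in fingerprint order (total 2048).
SEGMENT_BITS = (512, 512, 128, 128, 128, 128, 256, 256)

# Segments (numbered 1-8 as in the text) read by each metric.
METRIC_SEGMENTS: Dict[str, Tuple[int, ...]] = {
    "ReactantTanimoto": (1,),
    "ProductTanimoto": (2,),
    "CoarseReactionTanimoto": (3, 4),
    "MediumReactionTanimoto": (5, 6),
    "StrictReactionTanimoto": (7, 8),
}


def segment_mask(segments: Tuple[int, ...]) -> int:
    """Bit mask covering the given segments (assumed packed in order from bit 0)."""
    mask, offset = 0, 0
    for number, size in enumerate(SEGMENT_BITS, start=1):
        if number in segments:
            mask |= ((1 << size) - 1) << offset
        offset += size
    return mask


def reaction_similarity(fp_a: int, fp_b: int, metric: str) -> float:
    """Tanimoto coefficient restricted to the segments used by `metric`."""
    mask = segment_mask(METRIC_SEGMENTS[metric])
    a, b = fp_a & mask, fp_b & mask
    common = bin(a & b).count("1")
    union = bin(a | b).count("1")
    # Assumed convention: if neither fingerprint sets a bit here, report 0.0.
    return common / union if union else 0.0


# Two hypothetical fingerprints with bits {0, 1, 2} and {1, 2, 3} in segment 1.
print(reaction_similarity(0b0111, 0b1110, "ReactantTanimoto"))  # 2 shared / 4 total
```

Masking instead of copying segments keeps every metric a single pass over the stored fingerprint, which reflects the efficiency argument made for the segmented layout.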