Maximum Common Substructure (MCS) search

This manual gives you a walk-through on ChemAxon's Maximum Common Substructure search:

Introduction

What is the MCS for ?

Finding the Maximum Common Substructure (MCS) of two molecules is a problem with many applications in the field of chemoinformatics. It can be used for similarity search, hierarchical clustering, molecule alignment, and automated reaction mapping among others. However, the imposed time constraints and the complexity of the problem make finding an exact optimal solution infeasible in most use cases. For this reason, ChemAxon provides powerful heuristics algorithms for finding a large approximate MCS of two molecules in a short time.

images/www.chemaxon.com/jchem/doc/user/MCS_files/mcs_example_query.png images/www.chemaxon.com/jchem/doc/user/MCS_files/mcs_example_target.png
Fig. 1. Maximum Common Substructure (MCS) of two molecules

Concepts

Finding the MCS of two molecular structures is a well-studied area with many published algorithms. From a graph theoretical point of view, the MCS of two molecules is defined as the maximum common edge subgraph (MCES) of the two molecule graphs. That is, MCS and MCES mean exactly the same in ChemAxon's terminology.

Even though the roles of the two molecules in MCS search are generally the same, we distinguish a query and a target molecule. The reason for this is that some special query features are only allowed in the query molecule.

Search features

Query features

The following query features are supported in MCS search, but only in the query molecule:

  • generic query atoms (any, halogen, metal, etc.)

  • atoms lists, not lists;

  • generic bonds (any, single or double, etc.)

However, complex features such as stereochemistry, tautomers and Markush structures are not supported.

Example

Query

Target

images/www.chemaxon.com/jchem/doc/user/MCS_files/mcs_query_features_q.png
MCS search with query features (query)

images/www.chemaxon.com/jchem/doc/user/MCS_files/mcs_query_features_t.png
MCS search with query features (target)

Search options

MCS search can be customized using various search options:

  • considering atom and bond types;

  • considering charges, isotopes, and radicals (see Table 1);

  • connected or disconnected MCS search (see Table 2);

  • setting a minimum size for extra fragments (in case of disconnected MCS);

  • setting whether and how rings can be broken (see Table 3).

Furthermore, two search modes are provided with different speed/accuracy trade-off. Consider to use the FAST mode if you prefer search speed rather than more accurate MCS results.

Examples

Charge matching

Query

Target

False (default)

Formal charges of the atoms need not match.

images/www.chemaxon.com/jchem/doc/user/MCS_files/mcs_options_charge_off_q.png
MCS options - charge matching OFF (query)

images/www.chemaxon.com/jchem/doc/user/MCS_files/mcs_options_charge_off_t.png
MCS options - charge matching OFF (target)

True

Formal charges of the atoms should match.

images/www.chemaxon.com/jchem/doc/user/MCS_files/mcs_options_charge_on_q.png
MCS options - charge matching ON (query)

images/www.chemaxon.com/jchem/doc/user/MCS_files/mcs_options_charge_on_t.png
MCS options - charge matching ON (target)

Table 1. Charge matching option

Connected mode

Query

Target

False (default)

The MCS can consist of multiple fragments.

images/www.chemaxon.com/jchem/doc/user/MCS_files/mcs_options_disconnected_q.png
MCS options - disconnected (query)

images/www.chemaxon.com/jchem/doc/user/MCS_files/mcs_options_disconnected_t.png
MCS options - disconnected (target)

True

The MCS should consist of only one fragment.

images/www.chemaxon.com/jchem/doc/user/MCS_files/mcs_options_connected_q.png
MCS options - connected (query)

images/www.chemaxon.com/jchem/doc/user/MCS_files/mcs_options_connected_t.png
MCS options - connected (target)

Table 2. Connected mode option

Ring handling mode

Query

Target

IGNORE (default)

Ring/chain property is ignored.

images/www.chemaxon.com/jchem/doc/user/MCS_files/mcs_options_ring_handling_ignore_q.png

images/www.chemaxon.com/jchem/doc/user/MCS_files/mcs_options_ring_handling_ignore_t.png

MATCH_RING_BONDS

Two bonds match only if both are
in rings or both are in chains.

images/www.chemaxon.com/jchem/doc/user/MCS_files/mcs_options_ring_handling_bonds_q.png

images/www.chemaxon.com/jchem/doc/user/MCS_files/mcs_options_ring_handling_bonds_t.png

KEEP_RINGS

Rings should not be broken.

images/www.chemaxon.com/jchem/doc/user/MCS_files/mcs_options_ring_handling_keep_rings_q.png

images/www.chemaxon.com/jchem/doc/user/MCS_files/mcs_options_ring_handling_keep_rings_t.png

Table 3. Ring handling mode option

Note that the latter two options only consider rings within a given size limit. The default maximum size is eight, i.e. rings of nine or more atoms may be broken even if KEEP_RINGS option is used. (However, this limit can also be changed.)

Usage

Command line usage

JChem also provides a simple command line application for MCS search (mainly for evaluating and demonstrating purposes).

The program can be used as

mcs [options]

Options

  -h, --help                print this help message
-v, --verbose verbose mode
-q, --query <string> query structure string or file name
(multiple queries are allowed)
-t, --target <string> target structure string or file name
(multiple targets are allowed)
-o, --output <file> output file path and name
-f, --format CSV output format (only CSV is supported)
-w, --view display the molecules with the MCS highlighted
-g, --grid display common substructures in a grid view
(for multiple queries and/or targets)
-a, --atommaps mark matching atoms with map numbers
(only in -w mode)
-m, --match ( a[tomtype] | b[ondtype] | c[harge] | i[sotopes] |
r[adical] | m[apnumber] )
atom and bond matching mode
A + or - sign after each property indicates if it
should match or not. By default, only atom and
bond types are considered.
-c, --connected search for a connected common substructure
-s, --minsize <bonds> minimum required size of extra fragments
(the default is 1)
-x, --mode ( f[ast] | n[ormal] )
search mode (controls speed and accuracy)
-r, --keeprings small rings should not be broken

Examples

  • Example #1 Search MCS of the given query (-q) and target (-t) molecules.

    Command

    mcs -q "C12CCC(O)C1(C)CCC1C2CCC2=CC(=O)CCC12C" -t "CC(=O)C1CCC2C3CCC4=CC(=O)CCC4(C)C3CCC12C"

    Result
    (console)

      Query:              CC12CCC3C(CCC4=CC(=O)CCC34C)C1CCC2O
    Target: CC(=O)C1CCC2C3CCC4=CC(=O)CCC4(C)C3CCC12C

    MCS: CC12CCCC1C1CCC3=CC(=O)CCC3(C)C1CC2
    Atom count: 20
    Bond count: 23
    Fragment count: 1
    Similarity score: 0.8519

  • Example #2 Search MCS of the given two molecules (-q, -t) and display the results (-w).

    Command

    mcs -w -q "CN(O)C1=CC(=CC=C1)C(O)=O" -t "OC(=O)CC1=CC=C(C=C1)[N+]([O-])=O"

    Result

    images/www.chemaxon.com/jchem/doc/user/MCS_files/mcs_application_result_1.png
    MCS search results

  • Example #3 Search connected MCS (-c) of the given two molecules (-q, -t) using charge matching (-m c+) and display the results (-w) including atom mapping numbers (-a).

    Command

    mcs -c -m c+ -w -a -q "CN(O)C1=CC(=CC=C1)C(O)=O" -t "OC(=O)CC1=CC=C(C=C1)[N+]([O-])=O"

    Result

    images/www.chemaxon.com/jchem/doc/user/MCS_files/mcs_application_result_2.png
    MCS search results

  • Example #4 Search pairwise MCS of the molecules in the given two input files (-q, -t) and display the results in a grid view (-g).

    Command

    mcs -g -q queries.mrv -t targets.smiles

    Result

    images/www.chemaxon.com/jchem/doc/user/MCS_files/mcs_application_grid.png
    MCS search results

API usage

The com.chemaxon.search.mcs package contains classes that provide an API for MCS search. Here is a simple example demonstrating the usage of these classes:

MaxCommonSubstructure mcs = MaxCommonSubstructure.newInstance();
mcs.setMolecules(queryMol, targetMol);
McsSearchResult result = mcs.nextResult();
System.out.println("Atoms in MCS: " + result.getAtomCount());
System.out.println("Bonds in MCS: " + result.getBondCount());
System.out.println("MCS molecule: "
+ MolExporter.exportToFormat(result.getAsMolecule(), "smiles"));

You can specify search options like this:

McsSearchOptions searchOpts = new McsSearchOptions.Builder()
.connectedMode(true).chargeMatching(true).build();
mcs = MaxCommonSubstructure.newInstance(searchOpts);

For more information see the API documentation of MaxCommonSubstructure and McsSearchOptions.

References

  1. John W. Raymond and Peter Willett. Maximum common subgraph isomorphism algorithms for the matching of chemical structures. Journal of Computer-Aided Molecular Design, 2002, 16:521-33.

  2. Andrea Grosso, Marco Locatelli, and Wayne Pullan. Simple ingredients leading to very efficient heuristics for the maximum clique problem. Journal of Heuristics, 2008, 14(6):587-612.

  3. Takeshi Kawabata. Build-up algorithm for atomic correspondence between chemical structures. Journal of Chemical Information and Modeling,2011 51(8):1775-87.