IJC tutorial: Using Standardizer to your advantage

Overview

This tutorial explains why it is valuable to add a standardizer rule set to your structure-based entities and how to apply a standardizer within the IJC desktop. Applying a standardizer, or set of "business rules", brings order and uniformity to the representation of the molecules in your structure-based entities. The standardized form also serves as your internal query reference standard when searching for molecules or extracting them for further processing. It is therefore important to consider the available options and the effects of applying these standardizer rules in a deterministic order. In particular, when a group can be drawn in several forms, you need to consider which query forms are "allowed" relative to the form stored in the database. This tutorial is a simple guide that shows how to set up an entity with the default standardizer applied, followed by a further example in which a fixed transformation is applied to the Nitro group. The example uses the standard PubChem molecules. Either the 2D or the 3D SDF files can be used; in either case we use the first file in the list. For demonstration purposes we use the local Derby database; the Oracle and MySQL approaches are essentially similar through the Instant JChem interface.

Files for this example can be found here: 2D or 3D

Create Project & Schema connection

First create a new project container. Use the File -> New Project... menu entry or the corresponding toolbar icon (shortcut: Ctrl+Shift+N) and choose IJC Project (with local database). Then name your project and select Finish.

images/download/attachments/49187145/1_new_proj_local.png

Create a new Data tree in the schema & Entities then import the data

Next we need to create a data tree with a structures table. There are two approaches. Either right click on the schema node and select New Data Tree and structure entity (table)... from the menu that appears. Alternatively, achieve the same result in the schema editor: in the Entities tab, right click and select New Structure entity (table)..., then at the data tree level promote it using New Data tree from entity... You can also create the entity directly at the data tree level by selecting New Data Tree and structure entity (table)... Using your preferred method, create a data tree with a root node named "PubChem2Da".

images/download/attachments/49187145/2_1_new_datatree.png images/download/attachments/49187145/2_2_new_datatree2.png

Finally, you can import data into the entity. This is done at the entity level. In the Entities tab, right click on the structures entity "PubChem2Da", select Import File Into X... and choose the SDF file. Select Next (we accept all suggested fields), then Next again, and the import commences. Select Finish once it has completed.

images/download/thumbnails/49187145/2_3_import_menu.png images/download/attachments/49187145/2_4_import1.png

Understanding the default Standardization

The standardizer rules are applied to the entity on import but can also be re-applied later. Since we have not yet explicitly defined or applied any standardizer rules, we can see the effects of the default standardization, which is applied automatically. This is often referred to as "Aromatize and remove explicit Hydrogens". A standardizer can be applied in the create entity dialog as well as from the schema editor for an existing entity. Later we will follow both routes, adding a Nitro functional group rule each time, and confirm that they give the same end result.

Once the import is complete we can run some queries within the entity to understand the default rules and how they affect search and display. Open a grid view or create a new default form view with a MolPanel and view all records. The astute user will first notice that the molecules in the display do not appear to have been standardized according to the default rules. In fact, the internal table stores both the original and the standardized version of each structure, and a display setting in the widget's visual properties controls which one is shown.

Right click on the MolPanel widget and select Customise Widget Settings. In the visual properties tab, tick the "Display as Standardized" tick box. The display now shows the structures as standardized by the default rules. The same change needs to be applied separately to the structure column of the grid view. Below we can see an example record before and after standardization.

images/download/attachments/49187145/3_1_customize_widget_menu.png

images/download/attachments/49187145/3_2_customize_widget_wiz.png

images/download/attachments/49187145/3_3_customize_widget_std.png

It is useful now to understand how queries work with the default standardizer applied, irrespective of the display. First, let's run a substructure search (SS) for pyridine using both the aromatized (c1ccncc1) and dearomatized (C1=CC=NC=C1) forms of the structure. The Kekule form of the SS query yields 1134 hits (the first PubChem file contains 23408 records in total). Next, convert the query to the aromatized form (Convert to Aromatic form) and run the same search. The same hit set is found, so the two query forms are synonymous and interchangeable.

images/download/attachments/49187145/4_1_query_aro.png

Next, we try a primary chloroalkane ([H]C([H])Cl), which defines two explicit H atoms in the query, and find 524 hits. This shows that, even though they are not displayed, explicit H atoms are used in the search if they are defined in the query.
images/download/attachments/49187145/4_2_query_chloro.png
Next we search for the infamous Nitro group using the popular charged form ([O-]N=O) and find 1043 hits. Finally we search for the Nitro group using the debated neutral form (O=N=O), and no hits are returned.

images/download/attachments/49187145/4_3_query_nitro1.png

images/download/thumbnails/49187145/4_4_query_nitro2_nohits.png

Since some organizations wish to display and search using this neutral form of the Nitro group, standardizer rules can be configured so that it finds the same hits as the charged form. The "Pentavalent Nitrogen" conundrum is discussed in some detail in the references below, but we leave it to each organisation's scientific staff to decide on their own "business rules":

  • Journal of Molecular Structure, 300 (1993) 245-256. "On the 'pentavalent' nitrogen atom and nitrogen pentacoordination": Richard D. Harcourt

  • J. Phys. Chem. A 2006, 110, 10507-10512. "Characteristics of Multiple N,O Bonds": Ian Love

Establish a new standardizer

The default standardization covers the basic expectations of the user. However, the available transformations should be considered carefully before building any real production system. Fortunately, that is the hard part; applying standardizer rules in IJC is straightforward.

Apply standardizer on entity creation

Next, we will create a new entity in order to show how standardizer rules are applied at creation time. Create a data tree with a root node named "PubChem2Db". In the Standardizer tab, create an associated standardizer by pressing Create Standardizer. Add the "Nitro" action. Select Finish and the entity is created. Import the same SDF file once more and create a form view with a MolPanel.
images/download/attachments/49187145/5_1_create_std.png images/download/attachments/49187145/5_2_std_add_nitro.png
Again we search for the infamous Nitro group using the popular charged form ([O-]N=O) and find 1080 hits. Then we search for the Nitro group using the debated neutral form (O=N=O) and find 1078 hits. The difference of 2 hits arises because a standalone charged nitro group ([O-]N=O) is not standardized, so those two records are not found by the neutral (O=N=O) substructure query. Apart from these two records, the two forms of the query are now synonymous with the "Nitro" rule applied.

images/download/attachments/49187145/5_4_query_nitro_neutral.png
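The behaviour seen in the two entities can be illustrated with a deliberately simplified sketch. This is not how IJC searches: structures here are plain SMILES-like strings, "substructure search" is a substring test, and `nitro_rule` is a hypothetical stand-in for the "Nitro" action (the charged-to-neutral direction is an assumption made for illustration). It only shows why a query form misses when it differs from the stored form, and why applying one rule to both the stored structures and the query makes the two forms synonymous.

```python
# Toy model only: real substructure search is graph based, not string based.

CHARGED_NITRO = "[N+](=O)[O-]"  # charged form as written in these toy strings
NEUTRAL_NITRO = "N(=O)=O"       # neutral ("pentavalent") form


def nitro_rule(smiles: str) -> str:
    """Hypothetical stand-in for a Nitro rule: rewrite charged -> neutral."""
    return smiles.replace(CHARGED_NITRO, NEUTRAL_NITRO)


def search(table, query, standardize=lambda s: s):
    """Standardize stored strings and the query the same way, then match."""
    q = standardize(query)
    return [s for s in table if q in standardize(s)]


# Three made-up records; two contain a nitro group in the charged form.
table = ["c1ccccc1[N+](=O)[O-]", "CC[N+](=O)[O-]", "CCO"]

# Without the rule, only the charged query matches the stored charged form.
print(len(search(table, CHARGED_NITRO)))  # 2
print(len(search(table, NEUTRAL_NITRO)))  # 0

# With the rule applied to both sides, either query form finds the same hits.
print(len(search(table, CHARGED_NITRO, nitro_rule)))  # 2
print(len(search(table, NEUTRAL_NITRO, nitro_rule)))  # 2
```

The key design point in the sketch is that the same function is applied to the query and to the stored structures, so the comparison always happens in one agreed form.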

Apply standardizer to existing entity

Finally we apply the same standardization to the existing entity "PubChem2Da". Right click on the schema node and select Edit Schema then select the entity level tab. Select the entity "PubChem2Da" and then the Standardizer tab. Currently you will see that only the defaults are applied. Select the Create standardizer button. Now from the list of possible options on the left find "Nitro" and use the Add button to add it to the standardizer for the entity. Use the arrow buttons to place it at the top. Press the Apply button to regenerate the table with a new standardization applied.

images/download/attachments/49187145/6_1_add_std_existing1.png

images/download/attachments/49187145/6_2_add_std_existing2.png
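The position of a rule in the list can matter because rules run one after another, each on the output of the previous one. The following minimal sketch uses toy string rewriting, not real chemistry, to show the effect. Both `nitro_rule` and `strip_charges` are hypothetical illustrations; `strip_charges` in particular is not an IJC action and is not chemically meaningful. It exists only to show that an earlier rule can remove the pattern a later rule needs.

```python
# Toy model only: rules are string rewrites applied in list order.

def nitro_rule(smiles: str) -> str:
    """Hypothetical Nitro rule: charged form -> neutral form."""
    return smiles.replace("[N+](=O)[O-]", "N(=O)=O")


def strip_charges(smiles: str) -> str:
    """Hypothetical, chemically naive rule: drop charge annotations."""
    return smiles.replace("[N+]", "N").replace("[O-]", "O")


def standardize(smiles: str, rules) -> str:
    """Apply each rule in order to the result of the previous one."""
    for rule in rules:
        smiles = rule(smiles)
    return smiles


mol = "CC[N+](=O)[O-]"

# Nitro rule first: the charged pattern is still present, so it is rewritten.
print(standardize(mol, [nitro_rule, strip_charges]))  # CCN(=O)=O

# Nitro rule last: the earlier rule already destroyed the pattern it looks for.
print(standardize(mol, [strip_charges, nitro_rule]))  # CCN(=O)O
```

Because the order is deterministic, the same input and the same ordered rule list always give the same stored form.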

You should now find that the "PubChem2Da" and "PubChem2Db" entities exhibit exactly the same behaviour with respect to either form of the Nitro group. We leave you to enjoy experimenting with the other transformations available. If you are unhappy with a particular transformation, remove it from the standardizer configuration and regenerate via Apply. The original representation is always retained, so you can revert or move to a newly applied set of rules at any time. In the screenshot below you can see a particular record displayed as standardized. Note that the Nitro group is depicted according to the rule applied and, importantly, both forms of the query find all possible results.
images/download/attachments/49187145/7_comparison.png
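Reverting is possible because the standardized form is always regenerated from the retained original. A minimal sketch of that idea follows, assuming a toy `Entity` class and a string-based `nitro_rule`; both are hypothetical and do not reflect how IJC stores structures internally.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Rule = Callable[[str], str]


def nitro_rule(smiles: str) -> str:
    """Hypothetical Nitro rule on toy strings: charged -> neutral."""
    return smiles.replace("[N+](=O)[O-]", "N(=O)=O")


@dataclass
class Entity:
    """Toy entity that keeps originals and derives the standardized forms."""
    originals: List[str]
    rules: List[Rule] = field(default_factory=list)
    standardized: List[str] = field(default_factory=list)

    def apply(self) -> None:
        """Regenerate every standardized form from its untouched original."""
        result = []
        for smiles in self.originals:
            for rule in self.rules:
                smiles = rule(smiles)
            result.append(smiles)
        self.standardized = result


entity = Entity(originals=["CC[N+](=O)[O-]"])

# Add the rule and regenerate.
entity.rules.append(nitro_rule)
entity.apply()
print(entity.standardized)  # ['CCN(=O)=O']

# Remove the rule and regenerate: the original form comes back.
entity.rules.remove(nitro_rule)
entity.apply()
print(entity.standardized)  # ['CC[N+](=O)[O-]']
```

Since `apply` never modifies `originals`, any rule set can be tried and later undone without re-importing the data.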

Congratulations

Congratulations! You have just worked through a simple Standardizer example and learned:

  • How to create a project and schema.

  • How to create a data tree (Structures) and import data.

  • How the default Standardizer rules behave.

  • How to apply a Nitro standardization to a new entity, and what effect it has.

  • How to apply a Nitro standardization to an existing entity, and what effect it has.