Triggers and sequences: Derby

The following four sections of this page explain how to complete the following tasks :
1. Administer a table in Derby using Instant JChem.
2. Add a text field and populate it with a character based unique key.
3. Define an approach to derive a BOOLEAN data item "Lipinski rule of 5" for new rows based upon chemical terms, implemented as a "AFTER INSERT" trigger.
4. Define an approach to synchronise columns for which the source data is subsequently updated, implemented as a "AFTER UPDATE" trigger. This operation is required for example after a re-standardization operation is applied on a given JChem table.

It is hoped that these examples can be used as templates for further bespoke index based in Chemical terms or other calculated columns. In the case of Derby the "ij" tool can be used to connect to a schema but this requires some additional set up.

java org.apache.derby.tools.ij

Your local Derby database can be found in your windows home (Obviously for Linux this will be different). Once successful you should see the "ij>" command prompt.

connect 'jdbc:derby:C:\Users\Daniel\Documents\IJCProjects\ijc-project\.config\localdb\db';

Alternatively the IJC database explorer will negate the need for this and connection using this approach for administration tasks is explained here: Using the database explorer

1. Administer Derby table using IJC

First, we need to administer a JChem table. One method to do this is via Instant JChem (IJC) Desktop application. Assuming we have an existing Derby connection / schema established in IJC then we can add a new JChem table easily. Right click on your schema and choose "new structure entity table". At the entity type drop down choose "New structure entity (using JChemBase table)". Assuming we accept the defaults - we should see the window below and the subsequent examples assume a table name of "STRUCTURES"

images/download/attachments/48095411/jchemtablewizard.png

2. Character based Unique key generation

A JChem table already has a build in unique integer key (CD_ID), however some organisations prefer to have a character based identifier since this can aid internal communication of structures. In order to create a character based sequence the integer based sequence is embedded in a trigger and prefixed with an appropriate string in this case "mol-" is used. If this is a "private" database then the Chemist initials (or lab notebook reference) might be suitable or perhaps a corporate identifier string is suitable if working with the central data repository.

An additional text based column is now required to store the character based key and this can be created using IJC. Right click on your table and choose "Edit data tree" Left click on "New standard field" and choose "Text field". You can also edit your display name and column name. In this case the column is named "TARGET_KEY". The Text field maps directly to a column of type VARCHAR in Derby. You also need to retain the "Required" value of FALSE.

images/download/attachments/48095411/textfield.png

Apache Derby RDBMS, now supports sequence objects, which can be used directly in our trigger definitions. First, at the ij prompt create a named sequence as in the code below.

CREATE SEQUENCE StructuresSeq AS BIGINT START WITH 1 INCREMENT BY 1 MAXVALUE 1000000 NO CYCLE;

We now create a trigger on the target table STRUCTURES that uses the sequence to populate the "TARGET_KEY" column.

Only one trigger is required to be defined. In Derby only a single SQL statement can be associated with each trigger but the triggers will fire in the order in which they are created.

CREATE TRIGGER structures_key_instrg AFTER INSERT ON structures
REFERENCING NEW AS newrow FOR EACH ROW MODE DB2SQL UPDATE structures
SET target_key = 'mol-' || CAST (NEXT VALUE FOR StructuresSeq AS CHAR(100)) WHERE target_key IS NULL;

To fire the trigger we should simply insert data into the table and for every row the action will occur. IJC will be completeing the insert (during data import) so you should not need to execute this syntax but your trigger will fire as a result of this action, if you should wish to test

INSERT INTO structures (cd_id) values (1);

3. Derived data item : Lipinski rule of 5 filter (New Inserts)

In order to create derivations from existing data an individual or set of triggers can be used. In this example a single drug like filter column "Lipinski_5" is created and populated for any rows that are added to a JChem table. The IJC boolean type actually maps to a SMALLINT type in Derby. You also need to retain the "Required" value of FALSE.

images/download/attachments/48095411/lipinskirole5.png

A pre-requisite for this field is the existence of the necessary chemical terms columns it is derived from (at compile time). Also it is assumed they will be populated and not NULL and if created prior to a data import they will be populated. Several filter definitions already exist but the use of triggers provides an alternative option to avoid some repeat calculation and of course create new bespoke definitions. The Lipinski definition example here is taken from the following reference but has been slighty modified to include the value in each clause :

Reference : Chemoinformatics (Wiley), Page 607, Gasteiger J.

1. molecular weight <= 500 g / mol
2. a calculated log p <= 5
3. H-bond donors <= 5
4. H-bond acceptors <= 10
5. rotatable bond count <= 10

As such the following Chemical terms fields are selected for addition to the table and are populated on data import.

images/download/attachments/48095411/editor.JPG

Two triggers need to be defined order to implement this field. They should be created in the order below and will also be fired in that order

CREATE TRIGGER structures_lipinski5_instrg1 AFTER INSERT ON structures
REFERENCING NEW AS newrow FOR EACH ROW MODE DB2SQL
UPDATE structures SET lipinksi_5 = 1
WHERE cd_molweight <= 500
AND logp <= 5
AND h_bond_donors <= 5
AND h_bond_acceptors <= 10
AND rotatable_bonds <= 10
AND lipinksi_5 is NULL
CREATE TRIGGER structures_lipinski5_instrg2
AFTER INSERT ON structures REFERENCING NEW AS newrow
FOR EACH ROW MODE DB2SQL UPDATE structures
SET lipinksi_5 = 0 WHERE lipinksi_5 is NULL;

Derived data item: Lipinski rule of 5 filter (After table re-standartization or structure edit)

If the standardization rules for a given table are modified or a chemical structure edited then potentially the chemical terms might change also. Any index based on these columns will have no update mechanism defined. As such, one should also consider an after update trigger to re-calculate derived fields after a re-standardization event. Two triggers are required in order to implement as follows:

CREATE TRIGGER structures_lipinski5_updtrg1 AFTER UPDATE ON structures
REFERENCING NEW AS newrow FOR EACH ROW MODE DB2SQL
UPDATE structures SET lipinksi_5 = 1
WHERE cd_molweight <= 500
AND logp <= 5
AND h_bond_donors <= 5
AND h_bond_acceptors <= 10
AND rotatable_bonds <= 10
AND lipinksi_5 is NULL;
CREATE TRIGGER structures_lipinski5_updtrg2 AFTER UPDATE ON structures REFERENCING NEW AS newrow FOR EACH ROW MODE DB2SQL UPDATE structures SET lipinksi_5 = 0 WHERE lipinksi_5 is NULL;

The end result is a drug like filter and less time screening unneccessary molecules

images/download/attachments/48095411/editor12.JPG