Standardizer Developer's Guide

Introduction

This guide explains how to use Standardizer through its Java API and how to extend it with custom standardizer actions.

Definitions

Standardizer: ChemAxon's tool to transform chemical structures into customized, canonical representations;
Standardizer Action: an operation that modifies the structure (e.g., "convert to aromatic form");
Standardizer Configuration: contains the list of standardizer actions to be executed on the molecule in order to obtain the canonical representation.

Executing Standardizer

To create a Standardizer instance, we need a standardizer configuration. The configuration can be provided as a string, as a file, or as a StandardizerConfiguration instance.

When a standardization is executed, Standardizer modifies the provided molecule instance in place (it does not clone the molecule). The result of the standardization is the standardized molecule together with the list of changes applied to it. The list of changes identifies which actions in the configuration were executed on the molecule, and which atoms were added, removed, or modified during the standardization procedure.

The standardize() method throws IllegalArgumentException if the provided molecule or the configuration is invalid.

The StandardizerRunner class contains static helper methods for executing Standardizer in a concurrent environment. It has separate methods for

  • returning only the standardized molecules, or

  • returning the standardizer results containing the standardized molecule, the changes, and the errors that occurred during the process.

Examples

Basic example

Molecule molecule = MolImporter.importMol("C1=CC=CC=C1");
Standardizer standardizer = new Standardizer("aromatize");
standardizer.standardize(molecule);

Storing information about standardization and cleaning changed atoms in 2D

In this example, information about the performed standardizer actions is saved in molecule property fields, and a final clean is used to arrange the positions of the changed atoms.

// create a Standardizer based on an XML configuration file
Standardizer standardizer = new Standardizer(new File("config.xml"));
try {
    Molecule molecule = null;
    MolExporter exporter = new MolExporter(System.out, "sdf");
    MolImporter importer = new MolImporter("mols.sdf");
    while ((molecule = importer.read()) != null) {

        // standardize the molecule (it is modified in place)
        standardizer.standardize(molecule);

        // get applied task indexes
        int[] appliedTaskIndexes = standardizer.getAppliedTaskIndexes();

        // get applied task IDs
        String[] appliedTaskIdentifiers = standardizer.getAppliedTaskIDs();

        // store applied task indexes and IDs in molecule properties
        StringBuilder indexPropertyValue = new StringBuilder();
        for (int i = 0; i < appliedTaskIndexes.length; ++i) {
            indexPropertyValue.append(appliedTaskIndexes[i]);
            indexPropertyValue.append(" ");
        }
        StringBuilder identifierPropertyValue = new StringBuilder();
        for (int i = 0; i < appliedTaskIdentifiers.length; ++i) {
            identifierPropertyValue.append(appliedTaskIdentifiers[i]);
            identifierPropertyValue.append(" ");
        }
        molecule.setProperty("TASK_INDEXES", indexPropertyValue.toString());
        molecule.setProperty("TASK_IDS", identifierPropertyValue.toString());

        // write output
        exporter.write(molecule);
    }
    importer.close();
    exporter.close();
} catch (LicenseException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
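
The two StringBuilder loops in the example above can be condensed with the standard library. The following is a minimal sketch in plain Java with no ChemAxon dependency; TaskPropertyFormatter, joinIndexes and joinIds are hypothetical helper names, not part of the Standardizer API. Unlike the loops above, the helpers do not leave a trailing space after the last element.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper (not part of the Standardizer API): formats the arrays
// returned by getAppliedTaskIndexes() and getAppliedTaskIDs() as
// space-separated property values.
public class TaskPropertyFormatter {

    // Joins task indexes with single spaces; an empty array yields "".
    public static String joinIndexes(int[] indexes) {
        return Arrays.stream(indexes)
                .mapToObj(Integer::toString)
                .collect(Collectors.joining(" "));
    }

    // Joins task IDs with single spaces; an empty array yields "".
    public static String joinIds(String[] ids) {
        return String.join(" ", ids);
    }

    public static void main(String[] args) {
        System.out.println(joinIndexes(new int[] {0, 2}));
        System.out.println(joinIds(new String[] {"aromatize", "stripsalts"}));
    }
}
```

With such a helper, the property could be set as molecule.setProperty("TASK_INDEXES", TaskPropertyFormatter.joinIndexes(appliedTaskIndexes)).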

Concurrent execution

Standardization runs on four threads. The names of the applied standardizer actions are saved in a molecule property field called "APPLIED STANDARDIZATION".

Standardizer standardizer = new Standardizer("aromatize..stripsalts");
MolImporter importer = new MolImporter("molecules.sdf");
// run the configuration on 4 threads and record the applied actions in a property
Iterator<StandardizerResult> results = StandardizerRunner.resultIterator(
        importer.iterator(), standardizer.getConfiguration(), 4, "APPLIED STANDARDIZATION");
while (results.hasNext()) {
    StandardizerResult result = results.next();
    if (!result.isExceptionOccured()) {
        Molecule originalMolecule = result.getOriginal();
        Molecule standardizedMolecule = result.getStandardized();
        List<StandardizerAction> appliedActions = result.getAppliedActions();
        //...
    } else {
        // an error occurred while standardizing this molecule
        System.err.println("Error occurred: " + result.getOccurredException().getMessage());
    }
}

Configuration management via API

Standardizer Actions

All Standardizer actions implement the StandardizerAction interface. Each action can have parameters that alter its behavior (e.g., the type of aromatic conversion for AromatizeAction).

Standardizer Configuration

A configuration contains a list of standardizer actions to be applied on the molecule. A configuration may contain invalid elements. In order to check the validity of a configuration, we can call the StandardizerConfiguration.isValid() method. If the configuration is not valid, the StandardizerConfiguration.createErrorMessage(..) static method can be used to collect the list of errors in textual representation.

Creating configuration

Creating a configuration and adding standardizer actions to the configuration:

// assumption: StandardizerConfiguration offers a no-argument constructor
StandardizerConfiguration configuration = new StandardizerConfiguration();
Map<String, String> parameterMap = new HashMap<String, String>();
parameterMap.put(AbstractStandardizerAction.ID_KEY, "Remove hydrogens");
parameterMap.put(RemoveExplicitHydrogensAction.PROPERTY_KEY_BRIDGEHEAD, "true");
configuration.addAction(new RemoveExplicitHydrogensAction(parameterMap));
configuration.addAction(new StripSaltsAction(Collections.<String, String> emptyMap()));

All actions in a configuration can be collected as a list using the StandardizerConfiguration.getActions() method.

Serialization

Serialization of a configuration is possible using instances of StandardizerConfiguration writers. There are two implemented serialization formats for Standardizer configuration:

  1. XML: the standard XML format for configuration files

      StandardizerXMLWriter writer = new StandardizerXMLWriter();
      writer.writeConfiguration(configuration, new FileOutputStream(new File("config.xml")));

  2. ActionString: a single-line format, well suited for passing as a command-line parameter

      StandardizerActionStringWriter writer = new StandardizerActionStringWriter();
      writer.writeConfiguration(configuration, System.out);

Deserialization

Deserialization of a configuration is possible using instances of StandardizerConfiguration readers. If deserialization fails, an IllegalArgumentException is thrown. There are two implemented deserialization formats for Standardizer configuration:

  1. XML:

      StandardizerXMLReader reader = new StandardizerXMLReader(new FileInputStream(new File("config.xml")));
      StandardizerConfiguration configuration = reader.getConfiguration();

  2. ActionString:

      StandardizerActionStringReader reader = new StandardizerActionStringReader("aromatize..stripsalts");
      StandardizerConfiguration configuration = reader.getConfiguration();
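
To make the single-line format more concrete, the sketch below splits an action string such as "aromatize..stripsalts" into its action tokens. It is only an illustration in plain Java and assumes that ".." is the separator between actions, as the examples in this guide suggest; it ignores any action parameters. ActionStringSketch and splitActions are hypothetical names, and the real parsing is done by StandardizerActionStringReader.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: not the library's parser. Assumes ".." separates actions
// and that the actions carry no parameters.
public class ActionStringSketch {

    // Splits an action string on the literal ".." separator.
    public static List<String> splitActions(String actionString) {
        return Arrays.asList(actionString.split("\\.\\."));
    }

    public static void main(String[] args) {
        for (String token : splitActions("aromatize..stripsalts")) {
            System.out.println(token);
        }
    }
}
```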

Implementing custom Standardizer actions

The standardization process can be customized at the level of standardizer actions. Each custom Standardizer action should be derived from the AbstractStandardizerAction class.
The standardize1(Molecule) method should contain the standardization procedure. The input of this method is the molecule to be standardized. During the standardization process, the added, changed, and removed atoms of the molecule should be collected in a Changes object, which the method returns.

Each standardizer action should contain a StandardizerActionInfo annotation. Contents of the annotation:

  • name (required): the name of the standardizer action

  • helpText: the help text in HTML or in plain text

  • description: the short description of the action for tool tips

  • iconPath: the path of the icon associated with the action

  • actionStringToken (required): the list of action string tokens to be associated with the action, separated by commas

  • xmlToken (required): the list of XML tags to be associated with the action, separated by commas

  • editorClassName: the fully qualified class name of the GUI editor to be associated with the action

The information in the annotation is used by the user interface.
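
As an illustration of how such an annotation might look on a custom action, the sketch below declares a local stand-in annotation whose attributes mirror the names listed above, applies it to a placeholder class, and reads the action string tokens back through reflection. Everything here is an assumption made for the example: the stand-in is not the library's own declaration, and the class name, token values, and description text are hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;
import java.util.List;

public class ActionInfoSketch {

    // Local stand-in mirroring the attribute names listed in this guide.
    // The real annotation ships with the Standardizer library.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface StandardizerActionInfo {
        String name();
        String helpText() default "";
        String description() default "";
        String iconPath() default "";
        String actionStringToken();
        String xmlToken();
        String editorClassName() default "";
    }

    // Placeholder for a custom action; the token values are hypothetical.
    @StandardizerActionInfo(
            name = "Remove Charge",
            description = "Removes charges from the molecule",
            actionStringToken = "removecharge,uncharge",
            xmlToken = "RemoveCharge")
    public static class RemoveChargeAction { }

    // Reads the comma-separated action string tokens from the annotation.
    public static List<String> actionStringTokens(Class<?> actionClass) {
        StandardizerActionInfo info = actionClass.getAnnotation(StandardizerActionInfo.class);
        if (info == null) {
            throw new IllegalArgumentException("Missing annotation on " + actionClass.getName());
        }
        return Arrays.asList(info.actionStringToken().split(","));
    }

    public static void main(String[] args) {
        System.out.println(actionStringTokens(RemoveChargeAction.class));
    }
}
```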

Parameters

A standardizer action might need parameters to operate. Since actions are cloneable, the custom action must override the clone() method and the parameters of the clone must be set.
The parameters are passed to the standardizer action in the form of a map containing string keys and values. This way the deserialization of the parameters from an XML or ActionString configuration can be implemented in the constructor. Furthermore, the serialization of parameters needs the following to work:

  1. A parameter X must be a member of the class and must have a getX() or isX() method. Providing a setX() method is recommended.

  2. The parameter must be annotated with the @Persistence annotation.
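
The requirements above (map-based constructor, annotated member with accessor methods, and a clone() that carries the parameters over) can be sketched as follows. This is a simplified, self-contained illustration: the Persistence annotation is a local stand-in for the library's own, the action implements Cloneable directly instead of extending AbstractStandardizerAction, and the class name and the "allowvalenceerrors" key are hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Map;

public class ParameterSketch {

    // Local stand-in for the library's @Persistence annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    public @interface Persistence { }

    // Hypothetical action with one boolean parameter. A real action would
    // extend AbstractStandardizerAction rather than implement Cloneable directly.
    public static class RemoveChargeAction implements Cloneable {

        // Hypothetical key under which the parameter arrives in the map.
        public static final String ALLOW_VALENCE_ERRORS_KEY = "allowvalenceerrors";

        @Persistence
        private boolean allowValenceErrors;

        // Deserialization happens in the constructor: values arrive as strings.
        // A missing key yields false, because Boolean.parseBoolean(null) is false.
        public RemoveChargeAction(Map<String, String> parameters) {
            this.allowValenceErrors =
                    Boolean.parseBoolean(parameters.get(ALLOW_VALENCE_ERRORS_KEY));
        }

        public boolean isAllowValenceErrors() {
            return allowValenceErrors;
        }

        public void setAllowValenceErrors(boolean allowValenceErrors) {
            this.allowValenceErrors = allowValenceErrors;
        }

        // The clone must carry the same parameter values as the original.
        @Override
        public RemoveChargeAction clone() {
            try {
                RemoveChargeAction copy = (RemoveChargeAction) super.clone();
                copy.setAllowValenceErrors(allowValenceErrors);
                return copy;
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // cannot happen: the class is Cloneable
            }
        }
    }
}
```

Setting the field explicitly in clone() is redundant for a primitive, since super.clone() already copies it, but it keeps the pattern visible for parameters that need a deep copy.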

Currently, the serialization supports these parameter types: Boolean, Integer, Double, Float, String, Molecule. The @Persistence annotation can have two parameters:

  • alias: the key of the parameter during serialization

  • format: the serialization format for molecule objects (default: CXSMARTS). If converting the molecule to the provided format fails because of the restrictions of the format, the molecule is exported to the default format. If exporting to the default format fails too, the molecule is exported to Base64 encoded zipped MRV.
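
The fallback order described for the format parameter follows a general "try each exporter in turn" pattern. The sketch below shows that pattern in plain Java with strings standing in for molecules; it is not the library's implementation, and ExportFallbackSketch, exportWithFallback, and the three placeholder exporters are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class ExportFallbackSketch {

    // Tries each exporter in order and returns the first result that succeeds.
    // Throws IllegalStateException if every exporter fails.
    public static <T> String exportWithFallback(T value, List<Function<T, String>> exporters) {
        RuntimeException lastFailure = null;
        for (Function<T, String> exporter : exporters) {
            try {
                return exporter.apply(value);
            } catch (RuntimeException e) {
                lastFailure = e; // remember the failure and try the next format
            }
        }
        throw new IllegalStateException("No exporter succeeded", lastFailure);
    }

    public static void main(String[] args) {
        // Placeholder exporters: the requested format fails, the default succeeds.
        Function<String, String> requestedFormat = s -> {
            throw new IllegalArgumentException("format too restrictive");
        };
        Function<String, String> defaultFormat = s -> "default:" + s;
        Function<String, String> lastResort = s -> "base64:" + s;

        System.out.println(exportWithFallback("mol",
                Arrays.asList(requestedFormat, defaultFormat, lastResort)));
    }
}
```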

Setting the parameters via GUI

If we want to use the action in Standardizer Configuration Builder, we can provide an editor for the action. The editor is a Swing component that allows setting the parameters of the action. This step should be omitted if the action does not have parameters. The custom editor must implement the ActionEditor interface.

Examples

  1. The "Set Chiral Flag" custom standardizer action sets the "Absolute" stereo flag on molecules containing exactly one marked chiral center. If a molecule does not meet this rule, the action either does nothing or removes any chiral flag that is set. Download the example: setChiralFalg.jar.

  2. The "Remove Charge" custom standardizer action removes charges from the molecule. It has a parameter that defines whether charge removal should proceed even if the resulting molecule contains valence errors. The action has an editor, which allows setting this option in a graphical environment. Download the example: removeCharge.jar.

Integration of the created Standardizer action is described here.