Standardizer Developer's Guide
Introduction
The purpose of this guide is to provide developers with the information they need to use Standardizer through its Java API and to extend Standardizer with custom standardizer actions.
Definitions
Standardizer:
ChemAxon's tool to transform chemical structures into customized, canonical representations;
Standardizer Action:
an operation that modifies the structure (e.g., "convert to aromatic form");
Standardizer Configuration:
contains the list of standardizer actions to be executed on the molecule in order to obtain the canonical representation.
Executing Standardizer
To create a Standardizer instance, we need a standardizer configuration. The configuration can be provided as a string, as a file, or as a StandardizerConfiguration instance.
When a standardization is executed, Standardizer modifies the provided molecule instance in place (it does not clone the molecule). The result of the standardization is the standardized molecule and the list of changes applied to it. The list of changes helps us identify which actions in the configuration were executed on the molecule, and which atoms were added, removed, or modified during the standardization procedure.
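Because the molecule is modified in place, a caller that still needs the unmodified input has to copy it before standardizing. The sketch below models this behavior in plain Java so it runs without Standardizer: ToyMolecule, ToyAction, and ToyStandardizer are hypothetical stand-ins, not Standardizer classes. The configuration is an ordered list of actions, every action works on the same instance, and the indexes of the actions that actually changed the molecule are recorded, similar in spirit to the list of changes described above.

```java
import java.util.ArrayList;
import java.util.List;

class InPlaceDemo {
    public static void main(String[] args) {
        ToyAction removeHydrogens = m -> m.atoms.removeIf("H"::equals);
        ToyAction removeSodium = m -> m.atoms.removeIf("Na"::equals);
        ToyStandardizer standardizer = new ToyStandardizer(List.of(removeHydrogens, removeSodium));

        ToyMolecule original = new ToyMolecule(List.of("C", "H", "O", "H"));
        ToyMolecule working = original.copy(); // copy first if the input must survive
        List<Integer> applied = standardizer.standardize(working);

        System.out.println(working.atoms + " applied=" + applied + " original=" + original.atoms);
    }
}

// Hypothetical toy model: a "molecule" is just a mutable list of atom symbols.
final class ToyMolecule {
    final List<String> atoms;

    ToyMolecule(List<String> atoms) {
        this.atoms = new ArrayList<>(atoms);
    }

    // Explicit copy for callers that need the unmodified input afterwards.
    ToyMolecule copy() {
        return new ToyMolecule(atoms);
    }
}

// Hypothetical action: modifies the molecule in place, returns true if anything changed.
interface ToyAction {
    boolean apply(ToyMolecule molecule);
}

final class ToyStandardizer {
    private final List<ToyAction> configuration;

    ToyStandardizer(List<ToyAction> configuration) {
        this.configuration = configuration;
    }

    // Runs the actions in configuration order on the SAME instance (no clone)
    // and returns the indexes of the actions that actually changed it.
    List<Integer> standardize(ToyMolecule molecule) {
        List<Integer> applied = new ArrayList<>();
        for (int i = 0; i < configuration.size(); i++) {
            if (configuration.get(i).apply(molecule)) {
                applied.add(i);
            }
        }
        return applied;
    }
}
```

Here only the first action reports a change, so only its index is recorded, and `original` keeps all four atoms because the standardization ran on the copy.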
The standardize() method throws IllegalArgumentException if the provided molecule or the configuration is invalid.
The StandardizerRunner class contains static helper methods to execute Standardizer in a concurrent environment. It has separate methods for
- returning only the standardized molecules, or
- returning standardizer results that contain the standardized molecule, the changes, and the errors that occurred during the process.
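The idea behind the result-returning variant can be illustrated with the standard java.util.concurrent API. This is a minimal sketch under stated assumptions, not StandardizerRunner's implementation: ToyRunner and ToyResult are hypothetical names, and the "standardization" is any UnaryOperator. Each input is processed on a fixed-size thread pool, results keep the input order, and an exception is captured per item instead of aborting the whole batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.UnaryOperator;

class RunnerDemo {
    public static void main(String[] args) {
        List<ToyResult<String>> results = ToyRunner.runAll(List.of("c1ccccc1", "", "CCO"), s -> {
            if (s.isEmpty()) {
                throw new IllegalArgumentException("empty input");
            }
            return s.toUpperCase();
        }, 4);
        for (ToyResult<String> r : results) {
            System.out.println(r.failed() ? "Error occurred: " + r.exception.getMessage() : r.standardized);
        }
    }
}

// Hypothetical result holder: input, output (null on failure), exception (null on success).
final class ToyResult<T> {
    final T original;
    final T standardized;
    final Exception exception;

    ToyResult(T original, T standardized, Exception exception) {
        this.original = original;
        this.standardized = standardized;
        this.exception = exception;
    }

    boolean failed() {
        return exception != null;
    }
}

final class ToyRunner {
    private ToyRunner() {
    }

    // Standardizes every input on a fixed-size pool; results are returned in input
    // order and a failure is recorded per item rather than stopping the batch.
    static <T> List<ToyResult<T>> runAll(List<T> inputs, UnaryOperator<T> standardize, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<T>> futures = new ArrayList<>();
            for (T input : inputs) {
                Callable<T> task = () -> standardize.apply(input);
                futures.add(pool.submit(task));
            }
            List<ToyResult<T>> results = new ArrayList<>();
            for (int i = 0; i < inputs.size(); i++) {
                T input = inputs.get(i);
                try {
                    results.add(new ToyResult<>(input, futures.get(i).get(), null));
                } catch (ExecutionException e) {
                    Throwable cause = e.getCause();
                    results.add(new ToyResult<>(input, null, cause instanceof Exception ? (Exception) cause : e));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    results.add(new ToyResult<>(input, null, e));
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Collecting the futures in submission order is what keeps the output aligned with the input even though the tasks finish in arbitrary order.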
Examples
Basic example
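import chemaxon.formats.MolImporter;
import chemaxon.standardizer.Standardizer;
import chemaxon.struc.Molecule;

// read benzene (Kekulé form) from a SMILES string
Molecule molecule = MolImporter.importMol("C1=CC=CC=C1");
// configuration given as an action string with a single action
Standardizer standardizer = new Standardizer("aromatize");
// converts the molecule to aromatic form in place
standardizer.standardize(molecule);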
Storing information about standardization and cleaning changed atoms in 2D
In this example, information about the performed standardizer actions is saved in molecule property fields, and a final clean is used to arrange the positions of the changed atoms.
// create Standardizer based on an XML configuration file
Standardizer standardizer = new Standardizer(new File("config.xml"));
try {
    Molecule molecule = null;
    MolExporter exporter = new MolExporter(System.out, "sdf");
    MolImporter importer = new MolImporter("mols.sdf");
    while ((molecule = importer.read()) != null) {
        // standardize molecule
        standardizer.standardize(molecule);
        // get applied task indexes
        int[] appliedTaskIndexes = standardizer.getAppliedTaskIndexes();
        // get applied task IDs
        String[] appliedTaskIdentifiers = standardizer.getAppliedTaskIDs();
        // store applied task indexes and IDs in molecule properties
        StringBuilder indexPropertyValue = new StringBuilder();
        for (int i = 0; i < appliedTaskIndexes.length; ++i) {
            indexPropertyValue.append(appliedTaskIndexes[i]);
            indexPropertyValue.append(" ");
        }
        StringBuilder identifierPropertyValue = new StringBuilder();
        for (int i = 0; i < appliedTaskIdentifiers.length; ++i) {
            identifierPropertyValue.append(appliedTaskIdentifiers[i]);
            identifierPropertyValue.append(" ");
        }
        molecule.setProperty("TASK_INDEXES", indexPropertyValue.toString());
        molecule.setProperty("TASK_IDS", identifierPropertyValue.toString());
        // write output
        exporter.write(molecule);
    }
    importer.close();
    exporter.close();
} catch (LicenseException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
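The two StringBuilder loops above can be written more compactly with standard-library joins. The sketch below is a plain-Java helper; the class name PropertyValues is hypothetical. One behavioral difference: the loops leave a trailing space after the last value, while these helpers do not.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class PropertyValuesDemo {
    public static void main(String[] args) {
        System.out.println(PropertyValues.joinIndexes(new int[] {0, 2, 5}));
        System.out.println(PropertyValues.joinIds(new String[] {"aromatize", "stripsalts"}));
    }
}

// Hypothetical helper replacing the two StringBuilder loops.
final class PropertyValues {
    private PropertyValues() {
    }

    // {0, 2, 5} -> "0 2 5" (no trailing space, unlike the loop version)
    static String joinIndexes(int[] indexes) {
        return Arrays.stream(indexes)
                .mapToObj(Integer::toString)
                .collect(Collectors.joining(" "));
    }

    // {"aromatize", "stripsalts"} -> "aromatize stripsalts"
    static String joinIds(String[] ids) {
        return String.join(" ", ids);
    }
}
```

Inside the loop of the example, the property values could then be set with, for instance, `molecule.setProperty("TASK_INDEXES", PropertyValues.joinIndexes(appliedTaskIndexes))`.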
Concurrent execution
In this example, standardization runs on four threads, and the names of the applied standardizer actions are saved in a molecule property field called "APPLIED STANDARDIZATION".
Standardizer standardizer = new Standardizer("aromatize..stripsalts");
MolImporter importer = new MolImporter("molecules.sdf");
// standardize on 4 threads, storing the applied action names in the "APPLIED STANDARDIZATION" property
Iterator<StandardizerResult> results = StandardizerRunner.resultIterator(importer.iterator(),
        standardizer.getConfiguration(), 4, "APPLIED STANDARDIZATION");
while (results.hasNext()) {
    StandardizerResult result = results.next();
    if (!result.isExceptionOccured()) {
        Molecule originalMolecule = result.getOriginal();
        Molecule standardizedMolecule = result.getStandardized();
        List<StandardizerAction> appliedActions = result.getAppliedActions();
        // ...
    } else {
        // an error occurred while standardizing this molecule
        System.err.println("Error occurred: " + result.getOccurredException().getMessage());
    }
}
Configuration management via API
Standardizer Actions
All Standardizer actions implement the StandardizerAction interface. Each action can have parameters that alter its behavior (e.g., the type of aromatic conversion for AromatizeAction).
Standardizer Configuration
A configuration contains a list of standardizer actions to be applied to the molecule. A configuration may contain invalid elements. To check the validity of a configuration, we can call the StandardizerConfiguration.isValid() method. If the configuration is not valid, the static StandardizerConfiguration.createErrorMessage(..) method can be used to collect the errors in textual form.
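The check-then-report pattern can be sketched in plain Java. ToyConfiguration and ToyActionEntry are hypothetical stand-ins, and the message layout is invented for illustration; the real createErrorMessage(..) signature and output format are not reproduced here. The configuration is valid only if every action is valid, and the error message lists each invalid action with its position.

```java
import java.util.ArrayList;
import java.util.List;

class ConfigurationCheckDemo {
    public static void main(String[] args) {
        ToyConfiguration configuration = new ToyConfiguration();
        configuration.addAction(new ToyActionEntry("aromatize", null));
        configuration.addAction(new ToyActionEntry("stripsalts", "missing salt definition file"));
        if (!configuration.isValid()) {
            System.err.print(ToyConfiguration.createErrorMessage(configuration));
        }
    }
}

// Hypothetical configured action: a name plus a validation error (null when valid).
final class ToyActionEntry {
    final String name;
    final String error;

    ToyActionEntry(String name, String error) {
        this.name = name;
        this.error = error;
    }

    boolean isValid() {
        return error == null;
    }
}

// Hypothetical configuration: valid only if every contained action is valid.
final class ToyConfiguration {
    private final List<ToyActionEntry> actions = new ArrayList<>();

    void addAction(ToyActionEntry action) {
        actions.add(action);
    }

    List<ToyActionEntry> getActions() {
        return actions;
    }

    boolean isValid() {
        return actions.stream().allMatch(ToyActionEntry::isValid);
    }

    // One line per invalid action, prefixed with its position in the configuration.
    static String createErrorMessage(ToyConfiguration configuration) {
        StringBuilder message = new StringBuilder();
        List<ToyActionEntry> actions = configuration.getActions();
        for (int i = 0; i < actions.size(); i++) {
            ToyActionEntry action = actions.get(i);
            if (!action.isValid()) {
                message.append(i).append(": ").append(action.name)
                        .append(" - ").append(action.error).append('\n');
            }
        }
        return message.toString();
    }
}
```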
Creating configuration
Creating a configuration and adding standardizer actions to the configuration:
// configuration: an existing StandardizerConfiguration instance that receives the actions
Map<String, String> parameterMap = new HashMap<String, String>();
parameterMap.put(AbstractStandardizerAction.ID_KEY, "Remove hydrogens");
parameterMap.put(RemoveExplicitHydrogensAction.PROPERTY_KEY_BRIDGEHEAD, "true");
configuration.addAction(new RemoveExplicitHydrogensAction(parameterMap));
configuration.addAction(new StripSaltsAction(Collections.<String, String> emptyMap()));
All actions in a configuration can be collected as a list using the StandardizerConfiguration.getActions() method.
Serialization
Serialization of a configuration is possible using instances of StandardizerConfiguration writers. There are two implemented serialization formats for Standardizer configuration:
- XML: standard XML format for configuration files
StandardizerXMLWriter writer = new StandardizerXMLWriter();
try (FileOutputStream out = new FileOutputStream(new File("config.xml"))) {
    writer.writeConfiguration(configuration, out);
}
- ActionString: single-line format, well suited for passing the configuration as a command-line parameter
StandardizerActionStringWriter writer = new StandardizerActionStringWriter();
writer.writeConfiguration(configuration, System.out);
Deserialization
Deserialization of a configuration is possible using instances of StandardizerConfiguration readers. If deserialization fails, an IllegalArgumentException is thrown. There are two implemented deserialization formats for Standardizer configuration:
- XML:
StandardizerXMLReader reader = new StandardizerXMLReader(new FileInputStream(new File("config.xml")));
StandardizerConfiguration configuration = reader.getConfiguration();
- ActionString:
StandardizerActionStringReader reader = new StandardizerActionStringReader("aromatize..stripsalts");
StandardizerConfiguration configuration = reader.getConfiguration();
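To make the ActionString format more concrete, here is a toy round trip that handles only the ".." separator visible in "aromatize..stripsalts". ToyActionString and its token set are hypothetical, and action parameters are deliberately not modeled, since their real syntax is not described in this guide. As with the real readers, malformed input is rejected with an IllegalArgumentException.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

class ActionStringDemo {
    public static void main(String[] args) {
        List<String> tokens = ToyActionString.read("aromatize..stripsalts");
        System.out.println(tokens + " -> " + ToyActionString.write(tokens));
        try {
            ToyActionString.read("aromatize..unknownaction");
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}

// Hypothetical reader/writer for the ".."-separated action list only (no parameters).
final class ToyActionString {
    // Hypothetical token set; in Standardizer the tokens are declared by the actions themselves.
    static final Set<String> KNOWN_TOKENS = Set.of("aromatize", "stripsalts", "clean");

    private ToyActionString() {
    }

    static String write(List<String> tokens) {
        return String.join("..", tokens);
    }

    // Splits on ".." and rejects unknown tokens with IllegalArgumentException.
    static List<String> read(String actionString) {
        List<String> tokens = Arrays.asList(actionString.split("\\.\\."));
        for (String token : tokens) {
            if (!KNOWN_TOKENS.contains(token)) {
                throw new IllegalArgumentException("Unknown action token: '" + token + "'");
            }
        }
        return tokens;
    }
}
```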
Implementing custom Standardizer actions
Customizing the standardization process is possible on the level of StandardizerActions. Each custom Standardizer action should be derived from the AbstractStandardizerAction class.
The standardize1(Molecule) method should contain the standardization procedure. Its input is the molecule to be standardized. During the standardization process, the added, changed, and removed atoms of the molecule should be collected in a Changes object, which the method returns.
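The shape of such an action is sketched below in plain Java, so it compiles without the Standardizer libraries. ToyAbstractAction, ToyChanges, and ToyRemoveHydrogens are hypothetical stand-ins for AbstractStandardizerAction, Changes, and a custom action; a list of atom symbols stands in for Molecule. The point is the contract: modify the input in place and report which atoms were touched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class CustomActionDemo {
    public static void main(String[] args) {
        List<String> atoms = new ArrayList<>(List.of("C", "H", "O", "H"));
        ToyChanges changes = new ToyRemoveHydrogens().standardize1(atoms);
        System.out.println(atoms + " removed=" + changes.removed);
    }
}

// Hypothetical stand-in for the Changes object: atom indexes touched by an action.
final class ToyChanges {
    final Set<Integer> added = new TreeSet<>();
    final Set<Integer> removed = new TreeSet<>();
    final Set<Integer> modified = new TreeSet<>();

    boolean isEmpty() {
        return added.isEmpty() && removed.isEmpty() && modified.isEmpty();
    }
}

// Hypothetical stand-in for AbstractStandardizerAction.
abstract class ToyAbstractAction {
    // Performs the standardization in place and reports what changed.
    protected abstract ToyChanges standardize1(List<String> atoms);
}

// Example custom action: removes explicit hydrogens ("H" entries).
final class ToyRemoveHydrogens extends ToyAbstractAction {
    @Override
    protected ToyChanges standardize1(List<String> atoms) {
        ToyChanges changes = new ToyChanges();
        // Walk backwards so each recorded index is still the atom's original position.
        for (int i = atoms.size() - 1; i >= 0; i--) {
            if (atoms.get(i).equals("H")) {
                atoms.remove(i);
                changes.removed.add(i);
            }
        }
        return changes;
    }
}
```

Iterating from the end is a deliberate choice: removing an element shifts only the elements after it, so the indexes reported in the changes refer to the input molecule's numbering.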
Each standardizer action should carry a StandardizerActionInfo annotation with the following contents:
- name (required): the name of the standardizer action
- helpText: the help text in HTML or plain text
- description: a short description of the action for tool tips
- iconPath: the path of the icon associated with the action
- actionStringToken (required): the list of action string tokens to be associated with the action, separated by commas
- xmlToken (required): the list of XML tags to be associated with the action, separated by commas
- editorClassName: the fully qualified class name of the GUI editor associated with the action
The information in the annotation is used by the user interface.
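The sketch below shows how a runtime-retained annotation of this shape can be declared and then read through reflection, which is how a user interface can discover an action's metadata. ToyActionInfo is a hypothetical mirror of the documented fields, not the real StandardizerActionInfo, and the token values on ToySetChiralFlagAction are made up.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;

class ActionInfoDemo {
    public static void main(String[] args) {
        ToyActionInfo info = ActionInfoReader.infoOf(ToySetChiralFlagAction.class);
        System.out.println(info.name() + " " + Arrays.toString(ActionInfoReader.tokens(info.actionStringToken())));
    }
}

// Hypothetical annotation mirroring the documented fields; RUNTIME retention
// keeps it available to reflection after compilation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface ToyActionInfo {
    String name();              // required
    String actionStringToken(); // required, comma-separated
    String xmlToken();          // required, comma-separated
    String helpText() default "";
    String description() default "";
    String iconPath() default "";
    String editorClassName() default "";
}

// Hypothetical annotated action; the token values are invented for illustration.
@ToyActionInfo(name = "Set Chiral Flag",
        actionStringToken = "setchiralflag, chiralflag",
        xmlToken = "SetChiralFlag",
        description = "Sets the absolute stereo flag")
final class ToySetChiralFlagAction {
}

final class ActionInfoReader {
    private ActionInfoReader() {
    }

    // Reads the annotation the way a user interface might when listing actions.
    static ToyActionInfo infoOf(Class<?> actionClass) {
        ToyActionInfo info = actionClass.getAnnotation(ToyActionInfo.class);
        if (info == null) {
            throw new IllegalArgumentException(actionClass.getName() + " is not annotated");
        }
        return info;
    }

    // Splits a comma-separated token list, trimming whitespace and skipping empty entries.
    static String[] tokens(String commaSeparated) {
        return Arrays.stream(commaSeparated.split(","))
                .map(String::trim)
                .filter(t -> !t.isEmpty())
                .toArray(String[]::new);
    }
}
```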
Parameters
A standardizer action might need parameters to operate. Since actions are cloneable, a custom action must override the clone() method and set the parameters of the clone.
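A minimal sketch of that rule, assuming a single boolean parameter modeled loosely on the "Remove Charge" example: ToyRemoveChargeAction is a hypothetical class, not the downloadable example. The override builds a fresh instance and copies each parameter explicitly, so the clone behaves like the original while remaining independent of it.

```java
class CloneDemo {
    public static void main(String[] args) {
        ToyRemoveChargeAction original = new ToyRemoveChargeAction(true);
        ToyRemoveChargeAction copy = original.clone();
        copy.setAllowValenceErrors(false);
        System.out.println(original.isAllowValenceErrors() + " " + copy.isAllowValenceErrors());
    }
}

// Hypothetical action with one parameter.
final class ToyRemoveChargeAction implements Cloneable {
    private boolean allowValenceErrors;

    ToyRemoveChargeAction(boolean allowValenceErrors) {
        this.allowValenceErrors = allowValenceErrors;
    }

    boolean isAllowValenceErrors() {
        return allowValenceErrors;
    }

    void setAllowValenceErrors(boolean allowValenceErrors) {
        this.allowValenceErrors = allowValenceErrors;
    }

    // Returns an independent copy; every parameter is set explicitly on the clone.
    @Override
    public ToyRemoveChargeAction clone() {
        ToyRemoveChargeAction copy = new ToyRemoveChargeAction(false);
        copy.setAllowValenceErrors(allowValenceErrors);
        return copy;
    }
}
```

If an action held a mutable parameter, such as a Molecule, the clone would need its own copy of that object rather than a shared reference.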
The parameters are passed to the standardizer action as a map of string keys and values. This way, deserializing the parameters from an XML or ActionString configuration can be implemented in the constructor. Serializing the parameters additionally requires the following:
- A parameter X must be a member of the class and must have a getX() or isX() method. A setX() method is also recommended.
- The parameter must be annotated with the @Persistence annotation.
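The sketch below illustrates why the getter and the annotation are needed, using plain reflection. ToyPersistent is a hypothetical stand-in for @Persistence and ToyParameterWriter is an illustrative serializer, not Standardizer's own. The constructor deserializes from the string map, and the writer finds each annotated field, reads it through getX() or isX(), and stores it under its alias or its field name.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

class PersistenceDemo {
    public static void main(String[] args) {
        ToyChargeAction action = new ToyChargeAction(Map.of("AllowValenceErrors", "true", "label", "neutralize"));
        Map<String, String> saved = ToyParameterWriter.write(action);
        ToyChargeAction restored = new ToyChargeAction(saved); // round trip through the constructor
        System.out.println(saved + " restored=" + restored.isAllowValenceErrors());
    }
}

// Hypothetical stand-in for @Persistence: marks a field for serialization, with an optional alias.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface ToyPersistent {
    String alias() default "";
}

final class ToyChargeAction {
    @ToyPersistent(alias = "AllowValenceErrors")
    private boolean allowValenceErrors;

    @ToyPersistent
    private String label;

    // Deserialization: parameters arrive as a map of string keys and values.
    ToyChargeAction(Map<String, String> parameters) {
        this.allowValenceErrors = Boolean.parseBoolean(parameters.get("AllowValenceErrors"));
        this.label = parameters.get("label");
    }

    public boolean isAllowValenceErrors() { return allowValenceErrors; }
    public void setAllowValenceErrors(boolean value) { this.allowValenceErrors = value; }
    public String getLabel() { return label; }
    public void setLabel(String label) { this.label = label; }
}

final class ToyParameterWriter {
    private ToyParameterWriter() {
    }

    // Serialization: reads each annotated field through its getX()/isX() accessor;
    // null values are skipped.
    static Map<String, String> write(Object action) {
        Map<String, String> parameters = new LinkedHashMap<>();
        for (Field field : action.getClass().getDeclaredFields()) {
            ToyPersistent persistent = field.getAnnotation(ToyPersistent.class);
            if (persistent == null) {
                continue;
            }
            String key = persistent.alias().isEmpty() ? field.getName() : persistent.alias();
            try {
                Object value = accessor(action.getClass(), field.getName()).invoke(action);
                if (value != null) {
                    parameters.put(key, value.toString());
                }
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("No usable accessor for " + field.getName(), e);
            }
        }
        return parameters;
    }

    // Looks up getX() first, then isX(); throws NoSuchMethodException if neither exists.
    private static Method accessor(Class<?> type, String fieldName) throws NoSuchMethodException {
        String suffix = Character.toUpperCase(fieldName.charAt(0)) + fieldName.substring(1);
        try {
            return type.getMethod("get" + suffix);
        } catch (NoSuchMethodException e) {
            return type.getMethod("is" + suffix);
        }
    }
}
```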
Currently, serialization supports the following parameter types: Boolean, Integer, Double, Float, String, Molecule. The @Persistence annotation can have two parameters:
- alias: the key of the parameter during serialization
- format: the serialization format for Molecule objects (default: CXSMARTS). If the molecule cannot be converted to the provided format because of that format's restrictions, it is exported to the default format. If exporting to the default format fails too, the molecule is exported as Base64-encoded, zipped MRV.
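The fallback order can be sketched as follows. This is a toy model: the exporters are plain functions that throw IllegalArgumentException when a format cannot represent the input, the "base64:" prefix is invented, GZIP is an assumption about what "zipped" means here, and the raw input string stands in for an MRV rendering. ToyFallbackExporter is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.List;
import java.util.function.Function;
import java.util.zip.GZIPOutputStream;

class FallbackDemo {
    public static void main(String[] args) {
        Function<String, String> requested = m -> {
            throw new IllegalArgumentException("format cannot represent " + m);
        };
        Function<String, String> cxsmarts = m -> "cxsmarts:" + m;
        System.out.println(ToyFallbackExporter.export("CCO", requested, cxsmarts));
    }
}

final class ToyFallbackExporter {
    private ToyFallbackExporter() {
    }

    // Tries the requested format, then the default one. An exporter signals
    // "cannot represent this molecule" by throwing IllegalArgumentException.
    // Last resort: Base64 text of GZIP-compressed bytes (compression choice assumed).
    static String export(String molecule, Function<String, String> requested,
                         Function<String, String> defaultFormat) {
        for (Function<String, String> exporter : List.of(requested, defaultFormat)) {
            try {
                return exporter.apply(molecule);
            } catch (IllegalArgumentException e) {
                // fall through to the next format
            }
        }
        return "base64:" + Base64.getEncoder().encodeToString(gzip(molecule));
    }

    private static byte[] gzip(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```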
Setting the parameters via GUI
If we want to use the action in Standardizer Configuration Builder, we can provide an editor for it. The editor is a Swing component that allows setting the parameters of the action. This step can be skipped if the action has no parameters. The custom editor must implement the ActionEditor interface.
Examples
- The "Set Chiral Flag" custom standardizer action sets the "Absolute" stereo flag on molecules containing exactly one marked chiral center. If a molecule does not meet this rule, the action does nothing or removes any chiral flag that is already set. Download the example: setChiralFalg.jar.
- The "Remove Charge" custom standardizer action removes charges from the molecule. It has a parameter that defines whether charge removal should proceed even if the resulting molecule contains valence errors. The action has an editor that allows setting this option in a graphical environment. Download the example: removeCharge.jar.
Integration of the created Standardizer action is described here.