Screen Developer Guide
Virtual Screening API Programming
Version 5.8.0
This guide gives examples of using the Application Programming Interface (API) of Screen, ChemAxon's virtual screening toolkit. With the help of these examples, experienced programmers can develop their own screening software, including the generation of various molecular descriptors, dissimilarity calculations and virtual screening. In addition, users can implement custom molecular descriptors and integrate them into ChemAxon's virtual screening environment.
Contents
- Basic API usage
  - Generating fingerprint
  - Dissimilarity calculation
- Advanced API usage
- Custom descriptor implementation
1 Introduction
The Screen package provides tools and components for ligand-based virtual screening. Ligands (i.e. small molecules) are transformed into molecular descriptors, which are series (vectors) of bit, integer or floating point values. Most descriptors are based on the topology of the small molecule, though 3D descriptors incorporating one or more conformations of the molecule can also be introduced.
Similarity metrics are applied to descriptors to compare them in similarity calculations. Metrics include Tanimoto and Euclidean and their variants (see the ScreenMD user's manual for a detailed description).
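To make the two metric families concrete, the following minimal sketch computes a Tanimoto dissimilarity on bit vectors and a Euclidean distance on numeric vectors. It uses the textbook definitions and plain JDK classes only; the class DissimilaritySketch and its methods are hypothetical helpers, not part of the Screen API, and the variants implemented by ScreenMD may differ in detail.

```java
import java.util.BitSet;

/** Illustration only: a hypothetical helper, not a class of the Screen API. */
final class DissimilaritySketch {

    /** Tanimoto dissimilarity of two bit vectors: 1 - |A and B| / |A or B|. */
    static double tanimoto(BitSet a, BitSet b) {
        BitSet common = (BitSet) a.clone();
        common.and(b);
        BitSet union = (BitSet) a.clone();
        union.or(b);
        // Convention chosen here: two empty fingerprints count as identical.
        if (union.isEmpty()) {
            return 0.0;
        }
        return 1.0 - (double) common.cardinality() / union.cardinality();
    }

    /** Euclidean distance of two numeric vectors of equal length. */
    static double euclidean(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("vectors differ in length");
        }
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double diff = x[i] - y[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Builds a bit vector with the listed positions set. */
    static BitSet bits(int... positions) {
        BitSet result = new BitSet();
        for (int p : positions) {
            result.set(p);
        }
        return result;
    }

    public static void main(String[] args) {
        // 2 common bits out of 6 distinct bits: 1 - 2/6
        System.out.println(tanimoto(bits(0, 1, 2, 3), bits(2, 3, 4, 5)));
        // the classic 3-4-5 right triangle
        System.out.println(euclidean(new double[] {0, 0}, new double[] {3, 4}));
    }
}
```

Both functions return 0 for identical descriptors and grow as the descriptors diverge, which is the property the dissimilarity examples in this guide rely on.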
2 Basic API usage
The API of the Screen package provides a high-level, easy-to-use interface to all types of molecular descriptors provided by ChemAxon. The interface is uniform, that is, typical API methods do not distinguish between different descriptor types. Thus most examples can easily be modified for other descriptors.
2.1 Generating fingerprint
The first example demonstrates how a topological descriptor, the ChemAxon Chemical Fingerprint, can be assigned to each molecule read from an SDfile. The result of this program is a descriptor file that contains one molecular descriptor per line. The names of the input file to process and the output file to create are given on the command line.
/*
* GenerCF.java
*
* Created on Nov 14, 2003, 9:22 AM
*/
import java.io.*;
import chemaxon.descriptors.*;
/**
* Simple example code to demonstrate the calculation of ChemicalFingerprint.
*
* Takes two command-line arguments as input: input SDfile,
* output descriptor file
*
* @author Miklos Vargyas
*/
public class GenerCF {
public static void main( String[] args ) {
try {
// create a molecular descriptor generator for one descriptor
GenerateMD descript = new GenerateMD(1);
// set the input file for the generator
descript.setInput( args[ 0 ] );
// and tell that input is an SDfile
descript.setSDfileInput(true);
// specify the descriptor to be created: into which file, what type
// what parameters settings to be applied
descript.setDescriptor( args[ 1 ], "CF", new CFParameters(), "" );
// initialize the generator
descript.init();
// start it and do the entire generation in one go
descript.run();
// close output file
descript.close();
}
catch ( Exception e ) {
e.printStackTrace( System.err );
}
System.exit(0);
}
}
2.2 Dissimilarity calculation
The next example is more complex, as it demonstrates two different aspects of virtual screening at the same time. The key point here is that molecular descriptors are used for dissimilarity calculation. In addition, descriptors are taken from a JChem database table. Such a table, the so-called descriptor table, can be generated prior to calling this sample program by running generatemd with the appropriate parameters (see the relevant example in the GenerateMD documentation).
The program solves a rather simple but demonstrative task: it calculates the average dissimilarity of a given query structure and all structures stored in the database (with respect to the particular descriptor type used and dissimilarity metric applied).
This code can easily be extended to calculate the total dissimilarity score of a compound library.
/*
* ComparePairwise.java
*
* Created on Nov 14, 2003, 19:04 AM
*/
import java.sql.*;
import java.io.*;
import java.util.Properties;
import chemaxon.descriptors.*;
import chemaxon.struc.Molecule;
import chemaxon.formats.MolImporter;
import chemaxon.util.ConnectionHandler;
import chemaxon.jchem.db.SettingsHandler;
import chemaxon.jchem.db.MDTableHandler;
/**
* Demonstrates how molecular descriptors stored in a database table can be
* used in dissimilarity calculations. This simple program calculates the
* average dissimilarity of a compound library against a user defined query
* structure.
* The application takes three parameters: the name of the JChem structure
* table, the name of the molecular descriptor and the query molecule (e.g.
* as a smiles string, but filename is not accepted here). Note that the
* name of the molecular descriptor is the user-given name chosen when the
* generatemd command was executed.
* Be aware that database connection parameters are taken from the .jchem file
* (i.e. settings used the last time when jcman was used).
*/
public class ComparePairwise {
public static void main(String[] args) {
String strucTableName = args[ 0 ];
String sqlSelect = "select cd_id from " + strucTableName;
// names of molecular descriptors used in this dissimilarity calculation
// this is the name of the descriptor you gave when the descriptor
// (e.g. fingerprint) was generated with the generatemd command
String[] mdNames = new String[ 1 ];
mdNames[ 0 ] = args[ 1 ];
String query = args[ 2 ];
float dissim = 0.0F;
int counter = 0;
try {
// open a database connection using settings stored in the .jchem file
ConnectionHandler connHandler = new ConnectionHandler();
connHandler.loadValuesFromProperties(
new Properties( new SettingsHandler().getSettings() ) );
connHandler.connect();
// mdTableHandler allows the retrieval of descriptors from database
MDTableHandler mdTableHandler =
new MDTableHandler( connHandler, strucTableName );
// dbReader gets fingerprints through mdTableHandler
MDDBReader dbReader = new MDDBReader( strucTableName, connHandler,
mdNames, sqlSelect );
// get the first molecular descriptor set from the descriptor table
// note, that now the set has one component only
MDSet md = dbReader.next();
// create an identical descriptor set using the same parameters
// this will store the molecular descriptor of the query molecule
MDSet queryDescr = new MDSet( md );
// create a Molecule object from the query string
Molecule queryMol = MolImporter.importMol( query );
// generate descriptor for the query molecule using the same
// settings as found in the database
queryDescr.generate( queryMol );
// iterate through all descriptors and sum dissimilarities
while ( md != null ) {
dissim += md.getDissimilarity( queryDescr );
md = dbReader.next();
counter++;
}
dissim = dissim / counter;
System.out.println( "Number of descriptors retrieved = " + counter );
System.out.println( "Average dissimilarity from " + query + " = "
+ dissim );
}
catch( Exception ex ) {
ex.printStackTrace();
}
}
}
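The averaging loop above, and its extension to a total dissimilarity score of a whole library, can be sketched independently of the database. The code below is a minimal illustration under stated assumptions: descriptors are represented by an arbitrary type D, the metric is passed in as a function, and the "total score" is taken to be the sum over all unordered pairs. The class LibraryDissimilaritySketch is hypothetical and not part of the Screen API; in the real program the metric role is played by MDSet.getDissimilarity().

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

/** Illustration only: a hypothetical helper, not a class of the Screen API. */
final class LibraryDissimilaritySketch {

    /** Average dissimilarity between one query and every library member. */
    static <D> double average(D query, List<D> library,
                              ToDoubleBiFunction<D, D> metric) {
        if (library.isEmpty()) {
            // avoids the division by zero an empty table would cause
            throw new IllegalArgumentException("empty library");
        }
        double sum = 0.0;
        for (D member : library) {
            sum += metric.applyAsDouble(query, member);
        }
        return sum / library.size();
    }

    /** Sum of dissimilarities over all unordered pairs of library members. */
    static <D> double totalPairwise(List<D> library,
                                    ToDoubleBiFunction<D, D> metric) {
        double sum = 0.0;
        for (int i = 0; i < library.size(); i++) {
            for (int j = i + 1; j < library.size(); j++) {
                sum += metric.applyAsDouble(library.get(i), library.get(j));
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Toy "descriptors": single numbers compared by absolute difference.
        List<Double> library = Arrays.asList(1.0, 2.0, 4.0);
        ToDoubleBiFunction<Double, Double> metric = (a, b) -> Math.abs(a - b);
        System.out.println(average(0.0, library, metric));
        System.out.println(totalPairwise(library, metric));
    }
}
```

The pairwise sum visits each pair once, so it needs n(n-1)/2 metric evaluations for a library of n compounds; for large tables this cost, not the database access, is likely to dominate.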
3 Advanced API usage
More advanced usage of the Screen API includes the simultaneous use of several descriptors, the use of the Metrics class and the fine tuning of dissimilarity metrics. These examples will be given in future releases of JChem.
4 Custom descriptor implementation
The Screen package provides a framework for the descriptor/fingerprint generation, storage and retrieval, for similarity/dissimilarity calculations, for virtual screening and for the fine-tuning of dissimilarity scoring functions. As a framework, it does not limit the applicability of tools to the pre-existing molecular descriptors and dissimilarity metrics delivered by ChemAxon. The user can implement custom descriptors that can be integrated in the Screen system in a plug-and-play fashion.
The sample code in this section illustrates how custom molecular descriptors can be implemented using ChemAxon's technology. The example is a partial implementation of the 166 public MDL keys (MACCS). Note that, for the sake of clarity, efficiency was not a goal of this program. A real-life application should take more care to recognize functional groups quickly, possibly in parallel.
When generating custom descriptors in the Screen framework, three Java classes have to be implemented:
- the generator class, derived from the MDGenerator class,
- the descriptor parameter class, derived from the MDParameters class,
- the molecular descriptor class, derived from the MolecularDescriptor class.
Convenience classes have also been introduced to reduce the coding work. The examples below derive the MACCS descriptor class as well as the corresponding parameter class from these convenience classes. These classes suit most typical needs; it is seldom necessary to inherit from lower-level classes.
4.1 Descriptor generator class
The main function of the descriptor generator class is to assign a molecular descriptor to the given input molecule. Besides its constructor, the only method to be implemented is generate(). This takes two parameters, the input Molecule and the output MolecularDescriptor to be filled in. Note that the return value, a String array, does not store the descriptor. Instead, it contains the names of properties optionally set in the input molecule. These can include partial results of the descriptor calculation that are believed to be useful and are thus kept for later use. This return value is optional; most descriptor generators return null. However, if properties are set by the generator, they are written to the output SDfile if an SDF output was specified (e.g. in generatemd). This feature can be used for testing purposes.
/*
* MaccsGenerator.java
*/
import chemaxon.descriptors.*;
import chemaxon.util.MolHandler;
import chemaxon.sss.search.MolSearch;
import chemaxon.struc.Molecule;
import chemaxon.calculations.ElementalAnalyser;
import chemaxon.marvin.modules.Aromata;
import chemaxon.formats.MolFormatException;
import chemaxon.sss.search.SearchException;
// !!!!!!!!!!!!!!!!!!!!!!!!!
// you will need to work on your generator class, but hopefully this gives you
// some ideas
// !!!!!!!!!!!!!!!!!!!!!!!!!
/**
* Generator class for the <code>Maccs</code> descriptor. A partial
* implementation of the 166 public MDL keys is given here. This class serves
* demonstration purposes only.
*
* @author Miklos Vargyas
*/
public class MaccsGenerator extends MDGenerator {
/** performs substructure search */
private MolSearch search = null;
/** converts SMARTS queries into <code>Molecule</code> objects */
private MolHandler smartsReader = null;
/** performs elemental analysis of target molecules */
private ElementalAnalyser elemAnal = null;
/** aromatizes target molecules and gathers ring information */
private Aromata arom = null;
/**
* Creates and initializes a <code>Maccs</code> descriptor generator object.
* One such object can be re-used to generate multiple descriptors
* consecutively, there is no need to create one <code>MaccsGenerator</code>
* instance for each <code>Molecule</code> object.
*/
public MaccsGenerator() {
search = new MolSearch();
smartsReader = new MolHandler();
smartsReader.setQueryMode( true );
elemAnal = new ElementalAnalyser();
arom = new Aromata();
}
/**
* Generates the Maccs descriptors for the given molecule. New instance of
* the <code>Maccs</code> object is not allocated, the
* <code>MolecularDescriptor</code> provided as a parameter is updated
* (thus it has to be allocated and initialized by the client of this
* class).
* @param m molecule for which the Maccs descriptor is created
* @param d the Maccs descriptor generated
* @return always null in the case of <code>Maccs</code>
*/
public String[] generate( Molecule m, MolecularDescriptor d )
throws MDGeneratorException {
MaccsParameters params = (MaccsParameters)d.getParameters();
Maccs Maccs = (Maccs)d;
arom.setMol( m );
arom.aromatize();
elemAnal.setMolecule( m );
search.setTarget( m );
// !!!!!!!!!!!!!!!!
// obviously, this is not the way I would implement the generator in a
// real application, but for demonstration/tutorial purposes I thought
// this is the easiest to understand and modify/customize
// it is o.k., because it is clear, but it is not efficient enough; the
// efficient implementation would be quite hard to understand and very hard
// to modify
// !!!!!!!!!!!!!!!!
if ( genKey11() ) Maccs.setKey( 0 );
if ( genKey13() ) Maccs.setKey( 1 );
if ( genKey14() ) Maccs.setKey( 2 );
if ( genKey15() ) Maccs.setKey( 3 );
if ( genKey17() ) Maccs.setKey( 4 );
if ( genKey19() ) Maccs.setKey( 5 );
if ( genKey20() ) Maccs.setKey( 6 );
if ( genKey21() ) Maccs.setKey( 7 );
if ( genKey22() ) Maccs.setKey( 8 );
if ( genKey23() ) Maccs.setKey( 9 );
if ( genKey24() ) Maccs.setKey( 10 );
if ( genKey25() ) Maccs.setKey( 11 );
if ( genKey27() ) Maccs.setKey( 12 );
if ( genKey28() ) Maccs.setKey( 13 );
if ( genKey29() ) Maccs.setKey( 14 );
if ( genKey30() ) Maccs.setKey( 15 );
if ( genKey32() ) Maccs.setKey( 16 );
if ( genKey33() ) Maccs.setKey( 17 );
if ( genKey37() ) Maccs.setKey( 18 );
if ( genKey38() ) Maccs.setKey( 19 );
if ( genKey39() ) Maccs.setKey( 20 );
if ( genKey40() ) Maccs.setKey( 21 );
if ( genKey41() ) Maccs.setKey( 22 );
if ( genKey42() ) Maccs.setKey( 23 );
if ( genKey45() ) Maccs.setKey( 24 );
if ( genKey50() ) Maccs.setKey( 25 );
if ( genKey60() ) Maccs.setKey( 26 );
if ( genKey63() ) Maccs.setKey( 27 );
if ( genKey78() ) Maccs.setKey( 28 );
if ( genKey84() ) Maccs.setKey( 29 );
if ( genKey88() ) Maccs.setKey( 30 );
if ( genKey96() ) Maccs.setKey( 31 );
if ( genKey99() ) Maccs.setKey( 32 );
if ( genKey101() ) Maccs.setKey( 33 );
if ( genKey103() ) Maccs.setKey( 34 );
if ( genKey118() ) Maccs.setKey( 35 );
if ( genKey125() ) Maccs.setKey( 36 );
if ( genKey130() ) Maccs.setKey( 37 );
if ( genKey131() ) Maccs.setKey( 38 );
if ( genKey134() ) Maccs.setKey( 39 );
if ( genKey139() ) Maccs.setKey( 40 );
if ( genKey140() ) Maccs.setKey( 41 );
if ( genKey142() ) Maccs.setKey( 42 );
if ( genKey146() ) Maccs.setKey( 43 );
if ( genKey149() ) Maccs.setKey( 44 );
if ( genKey151() ) Maccs.setKey( 45 );
if ( genKey154() ) Maccs.setKey( 46 );
if ( genKey157() ) Maccs.setKey( 47 );
if ( genKey158() ) Maccs.setKey( 48 );
if ( genKey159() ) Maccs.setKey( 49 );
if ( genKey160() ) Maccs.setKey( 50 );
if ( genKey161() ) Maccs.setKey( 51 );
if ( genKey163() ) Maccs.setKey( 52 );
if ( genKey164() ) Maccs.setKey( 53 );
if ( genKey165() ) Maccs.setKey( 54 );
return null;
}
// !!!!!!!!!!!!!!!!!!!!!
// queries here are not in MDL's query format as we cannot parse that, instead
// SMARTS (query smiles) are used
// At the moment there is no guarantee that these queries have exactly the same
// meaning as original MDL ones - we'll work on that later
// !!!!!!!!!!!!!!!!!!!!!
private boolean genKey11() { return isRing( 4 ); }
private boolean genKey13() { return isMatching( "[#8]~[#7](~[#6])~[#6]" ); }
private boolean genKey14() { return isMatching( "S-S" ); }
private boolean genKey15() { return isMatching( "[#6]~[#6](~[#8])~[#8]" ); }
private boolean genKey17() { return isMatching( "[#6]#[#6]" ); }
private boolean genKey19() { return isRing( 7 ); }
private boolean genKey20() { return elemAnal.atomCount( 14 ) > 0; } /* Si */
private boolean genKey21() { return isMatching( "[#6]=[#6](~[!#1!#6])[!#1!#6]" ); }
private boolean genKey22() { return isRing( 3 ); }
private boolean genKey23() { return isMatching( "[#7]~[#6](~[#8])~[#8]" ); }
private boolean genKey24() { return isMatching( "[#7]-[#8]" ); }
private boolean genKey25() { return isMatching( "[#7]~[#6](~[#7])~[#7]" ); }
private boolean genKey27() { return elemAnal.atomCount( 53 ) > 0; } /* I */
private boolean genKey28() { return isMatching( "[!#1!#6][CH2][!#1!#6]" ); }
private boolean genKey29() { return elemAnal.atomCount( 15 ) > 0; } /* P */
private boolean genKey30() { return isMatching( "[#6]~[!#1!#6](~[#6])(~[#6])~*" ); }
private boolean genKey32() { return isMatching( "[#6]~S~[#7]" ); }
private boolean genKey33() { return isMatching( "[#7]~S" ); }
private boolean genKey37() { return isMatching( "[#7]~[#6](~[#8])~[#7]" ); }
private boolean genKey38() { return isMatching( "[#7]~[#6](~[#6])~[#7]" ); }
private boolean genKey39() { return isMatching( "[#8]~S(~[#8])~[#8]" ); }
private boolean genKey40() { return isMatching( "S-[#8]" ); }
private boolean genKey41() { return isMatching( "[#6]#[#7]" ); }
private boolean genKey42() { return elemAnal.atomCount( 9 ) > 0; } /* F */
private boolean genKey45() { return isMatching( "[#6]=[#6]~[#7]" ); }
private boolean genKey50() { return isMatching( "[#6]=[#6](~[#6])~[#6]" ); }
private boolean genKey60() { return isMatching( "S=[#8]" ); }
private boolean genKey63() { return isMatching( "[#7]=[#8]" ); }
private boolean genKey78() { return isMatching( "[#6]=[#7]" ); }
private boolean genKey84() { return isMatching( "[#7H2]" ); }
private boolean genKey88() { return elemAnal.atomCount( 16 ) > 0; } /* S */
private boolean genKey96() { return isRing( 5 ); }
private boolean genKey99() { return isMatching( "[#6]=[#6]" ); }
private boolean genKey101() { return isLargerRing( 8 ); }
private boolean genKey103() { return elemAnal.atomCount( 17 ) > 0; } /* Cl */
private boolean genKey118() { return isMore( "*~[CH2]~[CH2]~*" ); }
private boolean genKey125() { return arom.getAromRings().length > 1; }
private boolean genKey130() { return isMore( "[!#1!#6]~[!#1!#6]" ); }
private boolean genKey131() { return isMore( "[!#1!#6]~[H]" ); }
private boolean genKey134() { return isMatching( "[F,Cl,Br,I]" ); }
private boolean genKey139() { return isMatching( "[#8][H]" ); }
private boolean genKey140() { return elemAnal.atomCount( 8 ) > 3; }
private boolean genKey142() { return elemAnal.atomCount( 7 ) > 1; }
private boolean genKey146() { return elemAnal.atomCount( 8 ) > 2; }
private boolean genKey149() { return isMore( "[CH3]" ); }
private boolean genKey151() { return isMatching( "[#7][H]" ); }
private boolean genKey154() { return isMatching( "[#6]=[#8]" ); }
private boolean genKey157() { return isMatching( "[#6]-[#8]" ); }
private boolean genKey158() { return isMatching( "[#6]-[#7]" ); }
private boolean genKey159() { return elemAnal.atomCount( 8 ) > 1; }
private boolean genKey160() { return isMatching( "[CH3]" ); }
private boolean genKey161() { return elemAnal.atomCount( 7 ) > 0; }
private boolean genKey163() { return isRing( 6 ); }
private boolean genKey164() { return elemAnal.atomCount( 8 ) > 0; }
private boolean genKey165() {
return arom.getAromRings().length > 0
|| arom.getNonAromRings().length > 0;
}
/**
* Checks if there is at least one ring of the given size in the target
* structure. Uses the aromatizer (<code>Aromata</code>) object that
* perceives all rings in the target molecule.
* @param ringSize size of ring searched for
*/
private boolean isRing( int ringSize ) {
int[][] aromRings = arom.getAromRings();
for ( int i = 0; i < aromRings.length; i++ ) {
if ( aromRings[ i ].length == ringSize ) {
return true;
}
}
int[][] aliphRings = arom.getNonAromRings();
for ( int i = 0; i < aliphRings.length; i++ ) {
if ( aliphRings[ i ].length == ringSize ) {
return true;
}
}
return false;
}
/**
* Checks if there is at least one ring of the given size or larger in the
* target structure. Uses the aromatizer (<code>Aromata</code>) object that
* perceives all rings in the target molecule.
* @param ringSize size of ring searched for
*/
private boolean isLargerRing( int ringSize ) {
int[][] aromRings = arom.getAromRings();
for ( int i = 0; i < aromRings.length; i++ ) {
if ( aromRings[ i ].length >= ringSize ) {
return true;
}
}
int[][] aliphRings = arom.getNonAromRings();
for ( int i = 0; i < aliphRings.length; i++ ) {
if ( aliphRings[ i ].length >= ringSize ) {
return true;
}
}
return false;
}
/**
* Performs substructure search to check if the given query (specified as
* Daylight SMARTS) is found in the target structure.
* @param smartsQuery query structure to be found
*/
private boolean isMatching( final String smartsQuery ) {
try {
smartsReader.setMolecule( smartsQuery );
search.setQuery( smartsReader.getMolecule() );
return search.isMatching();
}
catch ( MolFormatException me ) {
me.printStackTrace();
// normally this should be rethrown here
}
catch ( SearchException se ) {
se.printStackTrace();
// normally this should be rethrown here
}
return false;
}
/**
* Performs substructure search to check if the given query (specified as
* Daylight SMARTS) is found more than once in the target structure.
* @param smartsQuery query structure to be found
*/
private boolean isMore( final String smartsQuery ) {
try {
smartsReader.setMolecule( smartsQuery );
search.setQuery( smartsReader.getMolecule() );
int[][] hits = search.findAll();
return hits != null && hits.length > 1;
}
catch ( MolFormatException me ) {
me.printStackTrace();
// normally this should be rethrown here
}
catch ( SearchException se ) {
se.printStackTrace();
// normally this should be rethrown here
}
return false;
}
}
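In the generator above every key number is paired with a hand-written bit index, which makes it easy for two keys to end up on the same bit. One possible alternative, sketched below with plain JDK classes, is to keep the key numbers in a single table and derive the bit position from the array index. The class KeyTableSketch, its method names and the shortened key list are assumptions made for illustration; they are not part of the Screen API, and the tests passed in stand for the genKeyNN() methods of the listing.

```java
import java.util.BitSet;
import java.util.function.BooleanSupplier;

/** Illustration only: a hypothetical helper, not a class of the Screen API. */
final class KeyTableSketch {

    /** Key numbers in bit order: the array index is the bit position. */
    static final int[] KEY_NUMBERS = {11, 13, 14, 15, 17, 19, 20, 21, 22, 23};

    /** Returns the bit position of a key number, or -1 if it is not implemented. */
    static int bitIndexOf(int keyNumber) {
        for (int i = 0; i < KEY_NUMBERS.length; i++) {
            if (KEY_NUMBERS[i] == keyNumber) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Sets bit i when the i-th test fires. The tests must be listed in the
     * same order as KEY_NUMBERS, so no bit index is ever written by hand.
     */
    static BitSet evaluate(BooleanSupplier[] tests) {
        if (tests.length != KEY_NUMBERS.length) {
            throw new IllegalArgumentException("one test per key is required");
        }
        BitSet keys = new BitSet(KEY_NUMBERS.length);
        for (int i = 0; i < tests.length; i++) {
            if (tests[i].getAsBoolean()) {
                keys.set(i);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        BooleanSupplier[] tests = new BooleanSupplier[KEY_NUMBERS.length];
        for (int i = 0; i < tests.length; i++) {
            tests[i] = () -> false;            // stand-in for a real key test
        }
        tests[bitIndexOf(22)] = () -> true;    // pretend a 3-membered ring was found
        System.out.println(evaluate(tests));
    }
}
```

The same table can also be used when reading a fingerprint back, to report which MACCS key a set bit corresponds to.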
4.2 Descriptor parameter class
Most molecular descriptors can be parameterized; for instance, the length is a fairly common parameter. The parameter class also declares the metrics that are compatible with (available for) the new descriptor.
Descriptor parameters are stored in an XML file that can easily be extended according to future needs. However, compatibility with old versions has to be maintained.
The convenience class CDParameters (where CD stands for Custom Descriptor) covers almost all typical functionality needed to handle parameters, thus in most cases the parameter class is simply a thin wrapper that delegates to the CDParameters class.
/*
* MaccsParameters.java
*/
import chemaxon.descriptors.*;
import chemaxon.struc.Molecule;
import java.util.*;
import java.io.*;
import java.lang.IllegalArgumentException;
// !!!!!!!!!!!!!!!!!!!!!!
// you do not need to do anything else than replacing all occurrences of the
// word 'Maccs' with your fingerprint's name if you do not want to use an
// external parameter configuration file
// !!!!!!!!!!!!!!!!!!!!!!
/**
* Manages MDL Maccs-II fingerprint parameters. As in the present implementation
* no external parameters are used (i.e. everything is wired into the
* <code>MaccsGenerator</code> and <code>Maccs</code> classes), this class does
* not play any important role. The reason why it is still implemented is to
* outline an appropriate framework for optional parameters required by other
* custom molecular descriptors.<br>
* The official implementation of MDL Maccs-II by ChemAxon will store all keys
* in an external XML configuration file, in which case this class will become
* important: it will process the XML file and store the definitions of keys.
*
* @author Miklos Vargyas
*/
public class MaccsParameters extends CDParameters {
/**
* length of the example Maccs keys (max number of bits)
*/
public final static int DEFAULT_LENGTH = 64;
/**
* Creates an empty object. Initializes parameters to default values.
*/
public MaccsParameters() {
super();
setLength( DEFAULT_LENGTH );
}
/**
* Creates a new object based on a given configuration file.
* @param configFile an open (XML) configuration file
* @throws MDParametersException missing or bad (XML) configuration
*/
public MaccsParameters(File configFile) throws MDParametersException
{
super( configFile );
}
/**
* Creates a new object based on a given configuration string.
* @param config (XML) configuration string
* @throws MDParametersException missing or bad (XML) configuration
*/
public MaccsParameters(String config) throws MDParametersException
{
super( config );
}
/**
* Get the default XML document frame. This is needed
* @return default XML document frame of the MaccsParameters class
*/
public String getDefaultDocumentFrame() {
return "<?xml version=\"1.0\" encoding=\"UTF-8\"?> \n" +
"<MDL-Maccs-II-ExampleConfiguration Version =\"0.1\" >\n" +
"<ScreeningConfiguration>\n" +
" <ParametrizedMetrics>\n" +
" <ParametrizedMetric Name=\"Tanimoto\" ActiveFamily=\"Generic\"\n" +
" Metric=\"Tanimoto\" Threshold=\"0.2\"/>\n" +
" </ParametrizedMetrics>\n" +
"</ScreeningConfiguration>\n" +
"</MDL-Maccs-II-ExampleConfiguration>\n";
}
/**
* Initializes the Maccs fingerprint generator.
*/
protected void initGenerator() throws MDParametersException {
generator = new MaccsGenerator();
}
/**
* This method is called by the constructors before processing the XML
* configuration. It creates a <code>Maccs</code> object stored in
* {@link chemaxon.descriptors.MDParameters#md MDParameters.md}.
*/
protected void init() {
md = new Maccs();
}
/**
* Calls <code>MaccsGenerator</code> and generates the descriptor for the
* given molecule.
* @param m a molecular structure
* @param md the molecular descriptor generated for the given molecule,
* an output parameter
* @return names of Molecule Property-s (SDfile tags) set by the generator
* @throws MDGeneratorException when failed to generate descriptor
*/
protected String[] generate( final Molecule m, MolecularDescriptor md )
throws MDGeneratorException {
return generator.generate( m, md );
}
}
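The document frame returned by getDefaultDocumentFrame() is ordinary XML, so its metric settings can be inspected with the XML parser that ships with the JDK. The sketch below reads the attributes of the first ParametrizedMetric element. The element and attribute names are copied from the frame above; the class MetricConfigSketch and the method metricAttribute() are hypothetical, and how CDParameters itself processes the configuration is not shown in this guide, so this is only an assumption about one reasonable way to read such a file.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Illustration only: a hypothetical helper, not a class of the Screen API. */
final class MetricConfigSketch {

    /** A configuration with the same structure as the default document frame. */
    static final String FRAME =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<MDL-Maccs-II-ExampleConfiguration Version=\"0.1\">\n"
            + "<ScreeningConfiguration>\n"
            + "  <ParametrizedMetrics>\n"
            + "    <ParametrizedMetric Name=\"Tanimoto\" ActiveFamily=\"Generic\"\n"
            + "        Metric=\"Tanimoto\" Threshold=\"0.2\"/>\n"
            + "  </ParametrizedMetrics>\n"
            + "</ScreeningConfiguration>\n"
            + "</MDL-Maccs-II-ExampleConfiguration>\n";

    /** Returns the named attribute of the first ParametrizedMetric element. */
    static String metricAttribute(String xml, String attribute) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // parser set-up, I/O and syntax errors are all reported the same way
            throw new IllegalArgumentException("bad XML configuration", e);
        }
        Element metric =
                (Element) doc.getElementsByTagName("ParametrizedMetric").item(0);
        if (metric == null) {
            throw new IllegalArgumentException("no ParametrizedMetric element");
        }
        return metric.getAttribute(attribute);
    }

    public static void main(String[] args) {
        System.out.println(metricAttribute(FRAME, "Name"));
        System.out.println(Double.parseDouble(metricAttribute(FRAME, "Threshold")));
    }
}
```

Keeping new settings in attributes or child elements of this frame is what allows the file to be extended later while older configurations remain readable.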
4.3 The descriptor class
The main purpose of the descriptor class is to provide the connections for the plug-and-play interface, via its constructors and some miscellaneous methods like getName().
This example code illustrates the use of binary fingerprint-like descriptors; however, integer vector or floating point vector type descriptors can be implemented the same way (with the appropriate obvious changes).
If, however, the descriptor to be implemented is neither a binary fingerprint nor an integer/float vector-like descriptor, then the convenience classes cannot be used. In these rather rare cases the implementor of the new descriptor has a lot more coding work to do. Our experts are committed to helping such efforts via the relevant public forum.
/*
* Maccs.java
*/
import chemaxon.descriptors.*;
import chemaxon.struc.Molecule;
import java.util.Arrays;
import java.util.StringTokenizer;
import java.text.ParseException;
/**
* Implements MDL MACCS-II keys. This class serves demonstration purposes, thus
* only a portion of the original keys are implemented.
*
* @author Miklos Vargyas
*/
public class Maccs extends CustomDescriptor {
/**
* Creates a new, empty MACCS descriptor.
*/
public Maccs() {
super( CDParameters.BINARY_DESCRIPTOR, 64 );
}
/**
* Copy constructor. An identical copy of the <code>MACCS</code>
* passed is created. The old and the new instances share the same
* <code>MACCSParameters</code> object.
*
* @param md a MACCS descriptor to be copied
*/
public Maccs( final Maccs md ) {
super( md );
}
/**
* Creates a new empty instance according to parameter configuration.
* @param params parameter settings
*/
public Maccs(final CDParameters params) {
super( params );
}
/**
* Creates a new instance according to parameters passed in a string.
*
* @param params parameter string
*/
public Maccs(final String params)
{
super( params );
}
/**
* Creates a copy with identical internal state. The new instances share the
* same <code>MACCSParameters</code> object with the copied one.
* @return the newly created object
*/
public Object clone() {
return new Maccs( this );
}
// !!!!!!!!!!!!!!!!!!!!!!
// you will need to change string constants here by simply replacing MACCS with
// the name of your fingerprint
/**
* Gets the nice name of the <code>MACCS</code> descriptor object. This name
* is not the same as the class name: it is nicer, and more meaningful for
* end-users.
* @return the nice, external name for MACCS descriptor class objects
*/
public String getName() {
return "MDL MACCS-II descriptor";
}
/**
* Gets the short name of the descriptor. This name appears in text outputs.
* @return the short name used in text outputs (tables etc.)
*/
public String getShortName() {
return "Maccs";
}
// and similarly, the name of your parameters class
/**
* Gets the name of the parameters class corresponding to the descriptor.
* @return the name of the parameters class
*/
public String getParametersClassName() {
return "MaccsParameters";
}
// !!!!!!!!!!!!!!!!!!!!!!
/**
* Sets the given cell (key) to one. Individual cells cannot be cleared
* (ie. set to zero), only the whole descriptor (see <code>clear()</code>).
* @param cellIndex index of the cell (key) to be set (to one)
*/
public void setKey( int cellIndex ) {
set( cellIndex, 1 );
}
/**
* Gets the value (0 or 1) of the given cell (key).
* @param cellIndex index of the cell (key) to be queried
*/
public int getKey( int cellIndex ) {
return getBit( cellIndex );
}
/**
* Creates the MACCS descriptor for the given Molecule. Calls the generator
* created by the corresponding <code>MACCSParameters</code> class.
* @return property names set in the molecule during generation (null in
* the case of this particular class)
* @throws MDGeneratorException when failed to generate descriptor
*/
public String[] generate( final Molecule m ) throws MDGeneratorException {
clear();
try {
String[] res = ( (MaccsParameters)params ).generate( m, this );
return res;
}
catch ( NullPointerException ne ) {
ne.printStackTrace();
// !!!!!!!!!!!!!!
// just replace MACCS as appropriate
throw new MDGeneratorException( "Something went wrong in MACCS generator." );
// !!!!!!!!!!!!!!
}
}
}
5 Closing remarks
Users are encouraged to contribute their custom descriptor implementations to our public discussion forum; see for instance Florian Pitschi's work.