Sphere Exclusion clustering

This manual describes the Sphere Exclusion clustering algorithm.

Introduction

Sphere Exclusion is a simple, intuitive selection method. Clustering begins by selecting an initial structure and placing every structure that meets a defined similarity threshold into the first cluster. This process is then repeated on the remaining structures until all structures are in clusters. Because cluster formation depends on the initial structure and on parameter selection, a subsequent clustering is also done.

The structure selection process can be random or directed by some preprocessing of the structures. Clusters are defined by similarity, and their number is not predetermined. Sphere Exclusion can also be used to select subsets, e.g. diverse subsets. The method is highly dependent on the initial element of the input file.

Fig. 1 Sphere Exclusion clustering
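
The procedure described above can be illustrated with a minimal Python sketch. This is not the jklustor implementation. It assumes that structures are represented as sets of "on" fingerprint bits, that dissimilarity is measured as 1 minus the Tanimoto coefficient, and that the next cluster center is always the first unassigned structure in input order. The names tanimoto_dissimilarity and sphere_exclusion are hypothetical and chosen only for this illustration.

```python
from typing import List, Sequence, Set


def tanimoto_dissimilarity(a: Set[int], b: Set[int]) -> float:
    """Return 1 - |a & b| / |a | b| for two sets of 'on' fingerprint bits."""
    union = len(a | b)
    if union == 0:
        # Two empty fingerprints are treated as identical (assumption).
        return 0.0
    return 1.0 - len(a & b) / union


def sphere_exclusion(fingerprints: Sequence[Set[int]], radius: float) -> List[List[int]]:
    """Group structure indices into clusters by single-level sphere exclusion.

    Assumption: the center of each new cluster is the first structure
    that has not yet been assigned, in input order.
    """
    unassigned = list(range(len(fingerprints)))
    clusters: List[List[int]] = []
    while unassigned:
        center = unassigned[0]
        # Everything within the dissimilarity radius of the center joins
        # the cluster (the center itself has dissimilarity 0).
        members = [
            i for i in unassigned
            if tanimoto_dissimilarity(fingerprints[center], fingerprints[i]) <= radius
        ]
        clusters.append(members)
        assigned = set(members)
        # Exclude the clustered structures and repeat on the remainder.
        unassigned = [i for i in unassigned if i not in assigned]
    return clusters


# Three toy fingerprints: A and B are close, B and C are close, A and C are not.
a, b, c = {1, 2, 3}, {2, 3, 4}, {3, 4, 5}

print(sphere_exclusion([a, b, c], radius=0.5))  # [[0, 1], [2]]
print(sphere_exclusion([b, a, c], radius=0.5))  # [[0, 1, 2]]
```

The two calls use the same structures in a different input order and produce a different number of clusters, which shows why the result depends on the initial element of the input.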

Usage

You can invoke the Sphere Exclusion algorithm via the jklustor command line tool:

jklustor [<options>] [<input files>]

Before first use, set up the jklustor script or batch file as described in Preparing and Running Batch Files and Shell Scripts.

Options

  sphex:[Minimal separation between cluster centroids]
                             use single level sphere exclusion clustering
  -h, --help                 help message
  -c                         specify the clustering method
  -o, --output <filepath>    output file path (default: stdout)
  -t, --tag                  name of the SDFile tag to store the
                             Pharmacophore Map (default: PMAP)
  -S, --sdf-output           SDF output (otherwise only PMAP list)
  -g, --ignore-error         continue with next molecule on error
  -v, --verbose              print calculation warnings to the console
  -l                         store individual input structures regardless
                             of output actions
  -s, --port                 after performing all output actions, launch
                             listening server on given port

Examples

The following examples demonstrate the usage of the Sphere Exclusion algorithm:

  • Invoke sphere exclusion clustering (using a dissimilarity radius of 0.4) on the given data set, store the input structures, and present the results with the built-in lightweight HTTP server. When the clustering process has finished, point a browser to http://localhost:81 (the port passed with -s).

    jklustor -v -l -s 81 -c sphex:0.4 http://www.chemaxon.com/shared/libMCS/default.sdf
  • Cluster with a Tanimoto distance of 0.8 between centroids and write each cluster, with its members, to a separate file:

    jklustor -c sphex:0.8 input.sdf -o "wrmols:sdf:cluster_*.sdf"

For the full user guide, type

jklustor -h