Sphere Exclusion clustering
This manual describes the Sphere Exclusion clustering algorithm.
Introduction
Sphere Exclusion is a simple, intuitive selection method. Clustering begins by selecting an initial structure and placing every structure that meets a defined similarity threshold relative to it in the first cluster. This process is repeated on the remaining structures until all structures are in clusters. Because cluster formation depends on the initial structure and on parameter selection, a subsequent clustering is also done. The structure selection process can be random or directed by some preprocessing of the structures. Clusters are defined by similarity, and their number is not predetermined. Sphere Exclusion can also be used to select subsets, e.g. diverse subsets. The result is highly dependent on the first element of the input file.
Fig. 1 Sphere Exclusion clustering
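The procedure described in the Introduction can be illustrated with a minimal Python sketch. This is not the jklustor implementation. It assumes that each structure is represented as a set of "on" fingerprint bits, that dissimilarity is the Tanimoto distance, that the next centre is always the first remaining structure (a directed rather than random choice), and that a structure joins a cluster when its distance to the centre is less than or equal to the radius. The names tanimoto_distance and sphere_exclusion are hypothetical and used only for illustration.

```python
from typing import Callable, List, Sequence, Set


def tanimoto_distance(a: Set[int], b: Set[int]) -> float:
    """Tanimoto dissimilarity between two sets of 'on' bits (assumed encoding)."""
    union = len(a | b)
    if union == 0:
        return 0.0  # assumption: two empty fingerprints count as identical
    return 1.0 - len(a & b) / union


def sphere_exclusion(
    items: Sequence[Set[int]],
    radius: float,
    distance: Callable[[Set[int], Set[int]], float] = tanimoto_distance,
) -> List[List[int]]:
    """Return clusters as lists of input indices; the first index is the centre."""
    unassigned = list(range(len(items)))
    clusters: List[List[int]] = []
    while unassigned:
        centre = unassigned[0]  # assumption: take the first remaining structure
        # Everything within the exclusion sphere around the centre joins the cluster.
        members = [
            i for i in unassigned
            if distance(items[centre], items[i]) <= radius
        ]
        clusters.append(members)
        taken = set(members)
        unassigned = [i for i in unassigned if i not in taken]
    return clusters


# Three toy fingerprints: a-b and b-c are 0.5 apart, a-c is 0.8 apart.
a, b, c = {1, 2, 3}, {2, 3, 4}, {3, 4, 5}
clusters_abc = sphere_exclusion([a, b, c], radius=0.6)
clusters_bac = sphere_exclusion([b, a, c], radius=0.6)
print(clusters_abc)
print(clusters_bac)
```

In this sketch the same three fingerprints produce two clusters when the input starts with a, but a single cluster when it starts with b, which illustrates why the result depends on the first element of the input.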
Usage
You can invoke the Sphere Exclusion algorithm via the jklustor command line tool:
jklustor [<options>] [<input files>]
Set up the jklustor script or batch file as described in Preparing and Running Batch Files and Shell Scripts.
Options
sphex:[Minimal separation between cluster centroids] Use single level sphere exclusion clustering
-h, --help help message
-c, specify the clustering method
-o, --output <filepath> output file path (default: stdout)
-t, --tag name of the SDFile tag to store the Pharmacophore Map (default: PMAP)
-S, --sdf-output SDF output (otherwise only PMAP list)
-g, --ignore-error continue with next molecule on error
-v, --verbose print calculation warnings to the console
-l, store individual input structures regardless of output actions
-s, --port after performing all output actions launch listening server on given port
Examples
The following examples demonstrate the usage of the Sphere Exclusion algorithm:
- Invoke sphere exclusion clustering (using a dissimilarity radius of 0.4) on the given data set, store the input structures, and present the results with the built-in lightweight HTTP server. When the clustering process has finished, point a browser to http://localhost:81 (the port given with -s).
jklustor -v -l -s 81 -c sphex:0.4 http://www.chemaxon.com/shared/libMCS/default.sdf
- Cluster with a Tanimoto distance of 0.8 between centroids and write each cluster, with its members, to a separate file:
jklustor -c sphex:0.8 input.sdf -o "wrmols:sdf:cluster_*.sdf"
For the full user guide, type:
jklustor -h