Markush Overlap
Introduction
Markush Overlap calculates the overlapping chemical space between two Markush structures, regardless of their complexity. It can report both the percentage of overlap and a Markush representation of the overlapping chemical space. The query and target structures do not play symmetric roles in the calculation: the results are derived from the query structure, and homology groups are supported only in the target.
Functions
getMarkushResult()
Returns a Markush representation of the overlapping space between the input structures. This Markush structure is derived from the query structure by applying the required reductions and several other transformations. Certain Markush features are enumerated, and an original R-group may be converted into several different R-groups where required. The result is an exact representation of the overlapping space: its enumerated structures are exactly the enumerated structures common to both input Markush structures. However, it may contain redundant parts, so some enumerated structures may be represented more than once.
getFusedMarkushResult()
Returns a fused, approximate Markush representation of the overlapping space between the input structures. This Markush structure is derived from the query structure by applying the required reductions and enumerating certain Markush features. The result is typically clearer, contains fewer duplications, and is more similar to the query structure than the result of getMarkushResult(), but it may not be an exact representation of the overlapping space. More precisely, it represents all common enumerated structures of the input Markush structures, but it may also represent additional enumerated structures because different occurrences of an R-group are not separated. That is, this representation fuses all usages of each original R-group of the query and preserves the original R-group numbering, whereas getMarkushResult() separates the different usages and may derive multiple R-groups from each original R-group of the query.
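The difference between separating and fusing the usages of an R-group can be illustrated with a small toy model. The sketch below is not the Markush Overlap algorithm and does not call its API; it assumes a hypothetical query in which R1 occurs at two positions, represents each enumerated structure as a tuple of substituent names, and uses an invented target. In this toy case the per-occurrence representation happens to be exact because the overlap is a simple product of the choices at each position.

```python
from itertools import product

# Toy model (hypothetical, not the library's algorithm): the query uses R1 at
# two positions, and each occurrence may independently take any R1 definition.
r1_definitions = ["Me", "Et"]
query = set(product(r1_definitions, repeat=2))

# Hypothetical target: position 1 must be Me, position 2 may be Me or Et.
target = {("Me", "Me"), ("Me", "Et")}

# Exact overlap: the enumerated structures common to query and target.
exact_overlap = query & target

# Separated representation: one derived R-group per occurrence of R1.
per_position = [sorted({s[i] for s in exact_overlap}) for i in range(2)]
separated = set(product(*per_position))

# Fused representation: a single R1 keeps every definition that is used at
# any occurrence, and that same R1 is applied at both positions again.
fused_r1 = sorted({substituent for s in exact_overlap for substituent in s})
fused = set(product(fused_r1, repeat=2))

print("separated equals exact overlap:", separated == exact_overlap)
print("fused covers exact overlap:", exact_overlap <= fused)
print("extra structures in fused:", sorted(fused - exact_overlap))
```

Here the fused form keeps the original single R1 but also represents the two structures that start with Et, which are not part of the overlap. This is the kind of over-representation the description above refers to.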
getEnumCountRatio()
Returns the approximate ratio of the overlapping space to the query Markush structure in terms of the number of enumerated structures. The returned value is always between 0 and 1, and it approximately describes the proportion of the query's enumerated structures that are also enumerated structures of the target. More precisely, it is the ratio of the Markush library size of getFusedMarkushResult() to that of the query structure. It is therefore approximate in the same way as getFusedMarkushResult(). Note that the query and target Markush structures do not play symmetric roles in this calculation. For example, if all enumerated structures of the query are also represented by the target, then the ratio is 1, regardless of how many non-common enumerated structures the target represents.
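The asymmetry described above can be seen in a minimal set-based sketch. It is an illustration only: the function name enum_count_ratio and the example structure labels are hypothetical, the enumerations are given explicitly as sets of strings, and the exact overlap is used in place of the fused result that the real calculation relies on.

```python
def enum_count_ratio(query_structures, target_structures):
    """Toy ratio: share of the query's enumerated structures also in the target.

    Hypothetical helper for illustration; it works on explicit sets and uses
    the exact overlap, not the fused Markush result.
    """
    query_set = set(query_structures)
    if not query_set:
        raise ValueError("the query must enumerate at least one structure")
    overlap = query_set & set(target_structures)
    return len(overlap) / len(query_set)


# Hypothetical enumerated libraries, written as plain labels.
small = {"CCO", "CCN"}
large = {"CCO", "CCN", "CCC", "CCCl"}

# Every structure of the small library is covered by the large one.
print(enum_count_ratio(small, large))  # 1.0

# Swapping the roles changes the result: only half of the large library
# is covered by the small one.
print(enum_count_ratio(large, small))  # 0.5
```

Swapping query and target changes the denominator, which is why the ratio is 1 whenever the target covers the whole query, no matter how much larger the target is.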
getFragmentCountRatio()
Returns the approximate ratio of the overlapping space to the query Markush structure in terms of the number of Markush definition fragments. The returned value is always between 0 and 1, and it approximately describes the proportion of the query's Markush definition fragments that are required to represent the overlapping space. More precisely, it is the ratio of the number of definition fragments in getFusedMarkushResult() to the number in the query structure. It is therefore approximate in the same way as getFusedMarkushResult(). Note that the query and target Markush structures do not play symmetric roles in this calculation. For example, if all enumerated structures of the query are also represented by the target, then the ratio is 1, regardless of how many non-common enumerated structures the target represents. For complex Markush structures, this ratio might be more practical than getEnumCountRatio().
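A toy calculation can show how a fragment-based ratio and an enumeration-based ratio may differ for the same overlap. The sketch below rests on several assumptions that are not taken from the product: R-group definitions are modelled as plain lists, only R-group definition fragments are counted (the scaffold is ignored), the library size is the product of the choices at each R-group occurrence, and all names and values are hypothetical.

```python
from math import prod


def fragment_count(rgroups):
    """Number of R-group definition fragments (assumption: scaffold ignored)."""
    return sum(len(definitions) for definitions in rgroups.values())


def library_size(rgroups, occurrences):
    """Number of enumerated structures, assuming independent occurrences."""
    return prod(len(rgroups[name]) ** count for name, count in occurrences.items())


# Hypothetical query: R1 is used twice on the scaffold, R2 once.
occurrences = {"R1": 2, "R2": 1}
query_rgroups = {"R1": ["Me", "Et"], "R2": ["F", "Cl", "Br", "I"]}

# Hypothetical fused overlap: the same R-groups with reduced definitions.
fused_rgroups = {"R1": ["Me"], "R2": ["F", "Cl"]}

fragment_ratio = fragment_count(fused_rgroups) / fragment_count(query_rgroups)
enum_ratio = library_size(fused_rgroups, occurrences) / library_size(
    query_rgroups, occurrences
)

print(fragment_ratio)  # 0.5
print(enum_ratio)      # 0.125
```

In this toy model the number of enumerated structures grows multiplicatively with the R-group definitions while the fragment count grows additively, so the two ratios can diverge quickly as the query becomes more complex.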
Examples
The following simple query-target pairs illustrate the results of the functions described above for some typical Markush structures.
Query | Target | Result | FusedResult | ECR* | FCR** |
| | | | 1 | 1 |
| | | | 0.25 | 0.5 |
| | | | 0.5 | 0.5 |
| | | | 1 | 1 |
*ECR: EnumCountRatio
**FCR: FragmentCountRatio
Markush Overlap can be easily used from KNIME. The following example illustrates how you can build a simple workflow using the Markush Overlap node.
You can download the example project here: Overlap.zip.