Deprecated API
This documentation page collects the deprecated and removed methods of JChem PostgreSQL Cartridge.
Removals
Fingerprint generation
Fingerprints can be generated using the following function:
fingerprint(chemical_structure,size)
or
fingerprint(structure_column_name,
size
)
where size is the number of bits in the fingerprint bit string; it must be divisible by 64, e.g.: 512.
The generated fingerprints refer to the original, non-standardized molecules even if standardizer configuration is defined for the given molecule type column.
Example:
SELECT
fingerprint(
'C1CCCCC1'
,512);
Smilarity search - old methods
For similarity searches with the old methods - which are supported before version 2.5 - you must have a column filled with fingerprints. These fingerprints correspond to molecules in another table (old method 1) or to molecules in the same table (old method 2). We strongly advise using the other table method (old method 1) because of the performance. The fingerprint data are usually small and PostgreSQL database engine can store them in memory.
Fingerprints can be generated as described above.
It is advised to name the fingerprint column as fp - in order to make similarity search easily runnable from applications based on JChem PostgreSLQ Cartridge API.
The following statements should be used for running similarity search using old method 1:
CREATE
TABLE
molecule_table_name(structure_column_name MOLECULE(
'molecule_type_name'
), id
INTEGER
);
CREATE
INDEX
molecule_table_idx
ON
molecule_table_name(id);
CREATE
TABLE
fingerprint_table_name
AS
SELECT
fingerprint(structure_column_name,512) fp, id
FROM
molecule_table_name;
CREATE
INDEX
fingerprint_idx
ON
fingerprint_table_name(fp);
CREATE
INDEX
fingerprint_id_idx
ON
fingerprint_table_name(id);
SELECT
*
FROM
fingerprint_table_name
WHERE
tanimoto(fingerprint(query_structure,512),fp) operator similarity_value;
SELECT
*
FROM
(
SELECT
ft.*, tanimoto(fingerprint(query_structure,512),fp)
AS
tanimoto
FROM
fingerprint_table_name
AS
ft)
WHERE
tanimoto operator similarity_value
ORDER
by
tanimoto
DESC
;
where
operator can be <, <=, =, >, >=
similarity_value is a number between 0 and 1
fp is the recommended name of the fingerprint column
Example:
CREATE TABLE moltable(mol MOLECULE(
'sample'
), id INTEGER);
CREATE INDEX moltable_idx ON moltable(id);
Insert molecules into moltable and/or create a trigger, then continue with the followings:
CREATE
TABLE
fptable
AS
SELECT
fingerprint(mol,512) fp, id
FROM
moltable;
CREATE
INDEX
fptable_idx
ON
fptable(fp);
CREATE
INDEX
fptable_id_idx
ON
fptable(id);
SELECT
*
FROM
fptable
WHERE
tanimoto(fingerprint(
'CCC'
,512),fp) > 0.9;
--to get the results in descending order by similarity
SELECT
t.id,t.tanimoto, moltable.mol
FROM
(
SELECT
fptable.*, tanimoto(fingerprint(
'CCC'
,512),fp)
AS
tanimoto
FROM
fptable)
AS
t, moltable
WHERE
t.id = moltable.id
AND
tanimoto > 0.9
ORDER
BY
tanimoto
DESC
;
The following statements should be used for running similarity search using old method 2:
CREATE
TABLE
table_name(structure_column_name MOLECULE(
'molecule_type_name'
));
ALTER
TABLE
table_name
ADD
COLUMN
fp BYTEA;
UPDATE
table_name
SET
fp = fingerprint(structure_column_name,512);
SELECT
*
FROM
table_name
WHERE
tanimoto(fingerprint(query_structure,512),fp) operator similarity_value;
Example:
CREATE
TABLE
simtable(mol MOLECULE(
'sample'
));
ALTER
TABLE
simtable
ADD
COLUMN
fp BYTEA;
UPDATE
simtable
SET
fp=fingerprint(mol,512);
SELECT
*
FROM
simtable
WHERE
tanimoto(fingerprint(
'CCC'
,512),fp) > 0.9;
Please note that fingerprint values present in fingerprint columns must be recalculated when molecules are updated.
Known Issue
Update of tables with more than about 1000 records might be very slow.
You can create a trigger to update the fingerprint column in the fingerprint table (fptable) when new records are inserted into the molecule table (moltable).
Example:
CREATE TABLE moltable(mol MOLECULE(
'sample'
), id INTEGER);
CREATE TABLE fptable AS SELECT fingerprint(mol,
512
) fp, id FROM moltable;
CREATE OR REPLACE FUNCTION set_fingerprint()
RETURNS trigger AS
$BODY$
BEGIN
IF (TG_OP =
'DELETE'
) THEN
DELETE FROM fptable WHERE id = OLD.id;
RETURN OLD;
END IF;
IF (TG_OP =
'UPDATE'
) THEN
IF NEW.mol is
null
THEN
UPDATE fptable SET id = NEW.id, fp =
null
where id = OLD.id;
END IF;
UPDATE fptable SET id = NEW.id, fp = fingerprint(NEW.mol,
512
) where id = OLD.id;
RETURN NEW;
END IF;
IF (TG_OP =
'INSERT'
) THEN
IF NEW.mol is
null
THEN
INSERT INTO fptable (fp, id) VALUES (
null
,NEW.id);
END IF;
INSERT INTO fptable (fp, id) VALUES (fingerprint(NEW.mol,
512
),NEW.id);
RETURN NEW;
END IF;
END;
$BODY$
LANGUAGE plpgsql;
DROP TRIGGER IF EXISTS tr_fingerprint ON moltable;
CREATE TRIGGER tr_fingerprint BEFORE INSERT OR UPDATE OR DELETE ON moltable
FOR EACH ROW EXECUTE PROCEDURE set_fingerprint();
Limitations:
-
Query structures with query features (like list atoms, query atoms, query bonds, ... ) are not supported.
Relevance sorting by using relevance function
Available from version 1.8 and deprecated in version 2.6
As in standard SQL, the user can order his results using ORDER BY commands.
For ordering search results, JChem PostgreSQL function relevance(Molecule) is provided, which gives back a numeric type value based on the atom counts and further topological features of the molecule.
It is suggested that relevance values be stored in the table for further query. It is also suggested that an index is created on the relevance column and further queries return their results ordered by the relevance value. This facilitates the usage of LIMIT <n> conditions as the most relevant hits are at the beginning of the result set. If the relevance column is created upon table creation or import the addition of the chemical index should also occur after adding the relevance column .
ALTER
TABLE
<mytable>
ADD
COLUMN
<relevance_column>
INT
;
UPDATE
<mytable>
SET
<relevance_column> = relevance(mol)::
int
;
CREATE
INDEX
<relevance_index>
on
<mytable>(<relevance_column>);
SELECT
mol
FROM
<mytable>
WHERE
'query_structure'
|<| mol
ORDER
BY
<relevance_column> LIMIT <n>;
Example (assuming table "test" has column "mol" of type Molecule):
ALTER
TABLE
test
ADD
COLUMN
relev
INT
;
UPDATE
TEST
SET
relev = relevance(mol)::
int
;
CREATE
INDEX
relev_idx
on
test(relev);
SELECT
mol
FROM
test
WHERE
'c1ccccc1'
|<| mol
ORDER
BY
relev LIMIT 100;
Other chemical features/measures may also be used for ordering, the chemterm function provides help for their definition.
You can create a trigger to update the relevance column when new records are inserted into the table.
Example:
CREATE TABLE test (mol Molecule(
'sample'
));
ALTER TABLE test ADD COLUMN relev INT;
CREATE OR REPLACE FUNCTION set_relevance()
RETURNS trigger AS
$BODY$
BEGIN
IF NEW.mol is
null
THEN
NEW.relev:=NULL;
END IF;
NEW.relev:=relevance(NEW.mol)::
int
;
RETURN NEW;
END;
$BODY$
LANGUAGE plpgsql;
DROP TRIGGER IF EXISTS tr_relevance ON test;
CREATE TRIGGER tr_relevance BEFORE INSERT OR UPDATE ON test
FOR EACH ROW EXECUTE PROCEDURE set_relevance();
UPDATE test SET relev=relevance(mol)::
int
;