Installation and Administration
This manual serves as Administration Guide of JChem PostgreSQL Cartridge (JPC). Here you can find the description of software requirements, installation and configuration steps and running of JPC. See also Getting started guide for easy setup and use cases.
System requirement
Software
-
CentOS, RedHat, Debian or Ubuntu operating system x86_64 version.
Find information for cases of other linux systems
-
postgresql-94 relational database (preferred download page) for JPC versions up to 2.0
-
postgresql-95 relational database for JPC versions starting from 2.1
See help to upgrade of postgreSQL 9.4 to postgreSQL 9.5
-
availability of hstore extension of postgresql (e.g., contrib package)
-
Java8 runtime environment
Hardware
The memory need of JChem PostgreSQL Cartridge strongly depends on the format of the chemical structures to be stored. In the next table, the approximate memory need of 10 M PubChem molecules in SD file format and in SMILES format are compared. The presented memory setup is optimized for execution speed. Lower amount of memory may be used, in this case performance drop is expected. Further details can be found in the Performance tuning of searches in big tables section.
Indexed column containing 10 M PubChem molecules in |
Service jchem-psql [GB] |
Postgres shared buffer [GB] |
Total [GB] |
SD file format |
9 |
21 |
30 |
SMILES format |
9 |
2 |
11 |
Installation and Setup
Install PostgreSQL Cartridge
Download the latest version of PostgreSQL Cartridge from here. In order to install the latest version (as root), change the x.y (version information) in the following statement to the current one:
In CentOS:
sudo yum localinstall jchem-psql/jchem-psql-x.y.x86_64.rpm
In RedHat or Debian:
sudo dpkg -i jchem-psql-x.y.x86_64.deb
See also Getting started guide for PostgreSQL and Java8 installation.
Before the first use
-
Install your license file
Copy a valid ChemAxon license to /etc/chemaxon/license.cxl or set its location in /etc/chemaxon/jchem-psql.conf file. Required license in the license file is 'Postgres Cartridge'.
The jchem-psql user should have read access to the license file. -
Initialize index directory
sudo service jchem-psql init
If you encounter any problem in your java installation, set your JAVA_HOME in the file /etc/default/jchem-psql.
Initialization does not start the service, you must start it manually.
-
First start
sudo service jchem-psql manual-start
Create extension in a database
Each chemical database should have the following extensions created separately.
-
Create a new user and database
sudo su postgres
createuser testuser
createdb testdb -O testuser
-
Install the extensions (you must be postgres user or any other user who has create extension privilege)
psql testdb
testdb>
CREATE
EXTENSION chemaxon_type;
testdb>
CREATE
EXTENSION hstore;
testdb>
CREATE
EXTENSION chemaxon_framework;
If you experience any error during the CREATE EXTENSION steps, please check this page which gives hints relating non-standard PostgreSQL setup.
-
Checking JChem PostgreSQL Cartridge installation
If jchem-psql service is running, the following query can be executed without any problem:
testdb>
SELECT
'C'
::Molecule(
'sample'
) |<|
'CC'
::Molecule;
This select statement executes a substructure search, with a Carbon atom as the query and ethane as the target structure. You must receive true (t) as output. See details of Molecule type below.
Service
The service can be started/stopped using:
sudo service jchem-psql start
sudo service jchem-psql stop
ENABLED=1 (default) setting in /etc/default/jchem-psql file makes the service start on system boot and start manually by start.
If you want the service not to start on system boot, set ENABLED=0 in the /etc/default/jchem-psql file. In that case, manual-start must be run.
sudo service jchem-psql manual-start
Configure JChem-psql server
The following configuration parameters are available on PostgreSQL side:
Parameter name |
Default value |
Description |
chemaxon.hit_retrieval_batch_size |
5000 |
Number of hits in a batch between jchem-psql server and postgresql database. Higher value generates higher initial latency, lower value creates communication overhead. |
chemaxon.index_creation_batch_size |
5000 (up to version 2.6) 25000 (from version 2.7) |
Number of records in a batch between jchem-psql server and postgresql database during index creation. Higher value generates higher memory footprint, but better throughput for traditional hard disk. Cancelling the index creation is slower in case of higher values. |
chemaxon.search_wall_time_limit |
600000 (10 minutes) |
The maximum wall time available for a search operation in milliseconds. |
shared_buffers |
(typically) 128MB |
Sets the amount of memory the database server uses for shared memory buffers. This parameter can be set for performance tuning in big tables . Requires restart of postgres service. |
They - with the exception of shared_buffers - can be edited as an ordinary Postgres configuration parameter:
-
in the postgresql.conf file (e.g., /etc/postgresql/9.4/main/postgresql.conf)
-
can be overwritten in the current session:
SET chemaxon.hit_retrieval_batch_size to
1000
;
-
to see the current value:
SHOW chemaxon.hit_retrieval_batch_size;
The following configuration parameters are available on JChem-psql server side:
Parameter name |
Description |
com.chemaxon.jchem.psql.idx.cachedObjectCount |
Number of fingerprints in cache. All of them are needed at every search, so it is advised always to keep them in memory for the frequently used tables. |
com.chemaxon.jchem.psql.main.cachedObjectCount |
Number of molecules in cache. Only the screened objects are searched for, so this cache can be left at a small size. Only the number of the objects can be set, so the total size of used memory depends on the current structures and can vary from time to time. |
com.chemaxon.jchem.psql.env.indexDir |
Directory storing the indexes needed by the cartridge. |
com.chemaxon.jchem.psql.env.userTypeDir |
Directory storing the molecule type files. |
com.chemaxon.jchem.psql.runtime.maxOpenSessionCount |
Limit on the number of open service sessions at same time (default is 100). Each PSQL command which uses jchem-psql service opens a new service session and closes it at the end. Service session requests above the set value of the open sessions are rejected with a message. Available from version 2.0. |
These parameters can be configured in the /etc/chemaxon/jchem-psql.conf file.
Changing these parameters except maxOpenSessionCount requires new initialization of jchem-psql service. Please note that this will purge all jchem-psql index data hence the indexes need to be dropped and recreated again.
Molecule types
Molecule types define the interpretation mode of the chemical structures. Molecule types are based on the extension chemaxon_type. The definition of these types has to be stored in /etc/chemaxon/types/ folder as a <molecule_type_name>.type file. You can add, modify, delete molecule type files according to your needs.
We provide a sample type, see file /etc/chemaxon/types/sample.type.
The following settings can be defined in a <molecule_type_name>.type file:
-
Version of the type descriptor (at the moment, only '1' is accepted)
-
Type ID: positive unique identifier among types
-
Tautomer mode: OFF, GENERIC
-
Standardizer action string or Standardizer file containing standardization requirements
Standardizer action string must follow the syntax of command line standardizer actions:
Example:
aromatize:basic..addexplicitH..replaceatoms:queryatom=
'C'
:replaceatom=
'N'
Standardizer configuration file can be created as described in Creating a Configuration page.
The types stored in /etc/chemaxon/types/ are loaded when the jchem-psql service is initialized. Therefore, if you add a new molecule type or change an existing one, the following steps have to be executed (using the current version of JChem Postgres Cartridge):
-
Create/modify/delete molecule type file(s) according to needs
-
Stop the service: sudo service jchem-psql stop
-
Delete the content of the index directory: sudo rm -rf /var/lib/jchem-psql/store/*
-
Initialize the service: sudo service jchem-psql init
-
Start the service: sudo service jchem-psql manual-start
-
Change to postgres user: sudo su postgres
-
Login to your database: psql
-
Reindex the database: reindex database <database name>
The query structure and the target structure(s) must always have the same molecule type. There is an auto-casting implemented; this means, that in the case of comparing two molecules, it is enough to define the molecule type of either of them. The following three statements have the same meaning:
SELECT 'C'::Molecule('sample') |<| 'CC'::Molecule('sample');
SELECT 'C'::Molecule('sample') |<| 'CC'::Molecule;
SELECT 'C'::Molecule |<| 'CC'::Molecule('sample');
Invalid select statements
-
missing molecule type definition
SELECT 'C'::Molecule |<| 'CC'::Molecule;
-
different molecule type definition
SELECT 'C'::Molecule('type1') |<| 'CC'::Molecule('type2');
In the case of database search, the molecule type of the structure column relates to the query structure as well.
Upgrade
Upgrade with keeping existing molecule type data
If not only JChem PostgreSQL Cartridge, but postgreSQL database upgrade is also needed ( e. g., from JPC 2.1 postgreSQL 9.5 is required), find an example of how to upgrade of postgreSQL 9.4 to postgreSQL 9.5.
-
Stop the service
sudo service jchem-psql stop
-
Install the new version
-
Initialize the service (this step deletes the old indexes)
sudo service jchem-psql init
Only the index contents are deleted, the molecule records stored in the PostgreSQL database stay untouched.
-
Start the service
sudo service jchem-psql manual-start
-
Upgrade your existing postgresql database
-
Update the extensions
sudo su postgres
psql
# ALTER EXTENSION chemaxon_type UPDATE;
# ALTER EXTENSION chemaxon_framework UPDATE;
In some cases of the upgrade process, you have to drop the indexes before altering the chemaxon_framework extension. If you receive an error message like
"ERROR: cannot drop operator class chemindex_int_ops for access method chemindex because other objects depend on it
DETAIL: index upg_ind depends on operator class chemindex_int_ops for access method chemindex
HINT: Use DROP ... CASCADE to drop the dependent objects too.",
you have to drop the index referred in the DETAIL part of the message and repeat step
# ALTER EXTENSION chemaxon_framework UPDATE;
-
Reindex the database
Upgrade to 2.7 or above
When you upgrade to JPC 2.7 or above and want to take the advantage of performance improvements implemented in 2.7, you have to update the tables storing chemical structures. The indexes should be dropped before the upgrade and recreated thereafter.
update table_name set structure_column_name = structure_column_name::text::molecule(
'molecule_type_name'
);
Example:
drop
index
myindex;
update
mytable
set
mol = mol::text::molecule(
'sample'
);
create
index
myindex
on
mytable using chemindex(mol);
//
or
create
index
myindex
on
mytable using sortedchemindex(mol);
If the above table update is skipped, reindex database is necessary:
psql
# reindex database <database name>;
If you miss this step, all indexes of type chemindex will not function properly.
-
Upgrade without keeping existing molecule type data
If not only JChem PostgreSQL Cartridge, but postgreSQL database upgrade is also needed ( e. g., from JPC 2.1 postgreSQL 9.5 is required), find an example of how to upgrade of postgreSQL 9.4 to postgreSQL 9.5.
-
Before installing the new version, drop the following extensions
DROP
EXTENSION chemaxon_framework
CASCADE
;
DROP
EXTENSION chemaxon_type
CASCADE
;
If you drop the chemaxon_type extension, it will implicitly drop all chemical columns.
-
Stop the service
sudo service jchem-psql stop
-
Delete the content of the index directory
sudo rm -rf /var/lib/jchem-psql/store/*
-
Install the new version
Uninstall
The present chemaxon_framework extension must be dropped:
DROP
EXTENSION chemaxon_framework;
DROP
EXTENSION chemaxon_type;
If you drop the chemaxon_type type extension, it will implicitly drop all chemical columns.
The installed jchem-psql-x.y.x86_64 package can be removed.
Archiving
To archive the content of the following directories might be useful:
-
/var/lib/jchem-psql
This directory contains the indexing data. If this folder is restored, the same indexes will be usable again without reindexing.
-
/var/log/jchem-psql
This directory contains the log files. If jchem-psql server is restarted, the old log files will be removed. So it may be important to save these files before restarting the service to find out why it stopped.
-
/etc/chemaxon
This directory contains the user settings, e.g., custom molecule types or custom memory settings.