JChem PostgreSQL Cartridge on a distributed PostgreSQL Citus database
    
 
     
    
The following software versions were used at testing:
-     JChem PostgreSQL Cartridge 2.1 
-     PostgreSQL 9.5 
-     Citus 5.1 
 
Setup
-     Install PostgreSQL and Citus on your master and worker nodes and create the Citus extension on each node (master and workers) as described in the Citus documentation .
-     Follow the JChem PostgreSQL Cartridge Manual to set up the cartridge on each node:
-     Put your valid ChemAxon license file to the /etc/chemaxon/ folder on each node. 
 
-     Initialize the jchem-psql service on each node withsudo service jchem-psql init
-     Start the jchem-psql service on each nodesudo service jchem-psql start
-     Configure workers on the master nodeCreate the file pg_worker_list.conf in the master node’s PostgreSQL data directory (the directory declared in the postgresql.conf file as data_directory) and add the worker’s hostname and PostgreSQL port setups to this file, like: 
 worker-host15432worker-host25421(more workers)
Examples of usage
-     Check worker nodesSELECT*FROMmaster_get_active_worker_nodes();
-     Create distributed table with an ID and a molecule columnIn the example below the table is created using hash distribution, it has four shards and has one replica for each shard. 
 CREATETABLEmol_table(idint, mol molecule('sample'));SELECTmaster_create_distributed_table('mol_table','id','hash');SELECTmaster_create_worker_shards('mol_table', 4, 1);
-     Prepare CSV format SMILES file with ID before each molecule from ordinary SMILES fileIn a command line shell: 
 cat -n nci-pubchem_1m_unique.smiles | sed -e's/^[ \t]*//'| sed -e's/^[0-9]*/&,/'| sed -e's/[ \t]*//g'> nci1m_with_id.smiles
-     Fill the table with dataThe script below cuts the original CSV file to 64 pieces and imports them to the created table parallely. This script has to be executed from command line as postgres user. 
 split -n l/64nci1m_with_id.smiles chunks/find chunks/ -type f | xargs -n1-P64sh -c'echo $0 `copy_to_distributed_table -C $0 mol_table`'The Citus 5.1 documentation states that the PostgreSQL COPY command is also supported and inserts rows into tables parallely, but it failed in our tests. 
 
 
-     Use the distributed table as any ordinary table for search, for example:CREATEINDEXmol_table_indexONmol_table USING chemindex(mol);SELECTidfrommol_tableWHERE'c1ccccc1N'|<| mol;
Experienced limitations of Citus Community version
    
These limitations are not invoked by JChem PostgreSQL Cartridge.
     
    
-     Import can be done only with a limited set of PostgreSQL methods. Only a single insert can be performed using SQL , bulk insert can be performed with a command-line tool or COPY , described here , which has a much better performance. 
-     No subselects are allowed in a modification statement (e.g. insert, delete, update). For example INSERT INTO table2 SELECT * FROM table1 WHERE 'C' |<| mol is not supported. 
-     Only distributed tables can be joint in one SELECT statement. A distributed and non-distributed table join is not supported. 
-     Only the postgres user can have distributed tables. 
-     Distributed tables can not be renamed. 
-     In Citus version 5.0 explain plans are not available, but they are already available in version 5.1. 
-     Since the ChemAxon PostgreSQL Cartridge does not contain an equality operator for Molecule type, tables can not be distributed by hashing the Molecule type column. Tables containing molecules have to be sharded by another column.