Sophisticated Chemical Formula Search
Introduction
JChem Base offers sophisticated chemical formula search for easier usage and better performance. Besides simple chemical formula search, searching for isotopes, polymers, multicomponent and non-stoichiometric formulas is also available in JChem Base. The formula search method uses JChem Base's cd_formula column, which contains the string formula of the molecule, but it is also possible to use any other custom user column that contains valid chemical formulas.
Formula search features
The supported chemical formula features are described below.
-
Case-insensitive notation of formulas
For ambiguous symbols, use the proper capitalization of the element symbol: an upper case letter marks the start of a new chemical element. Spaces between chemical symbols and numbers are ignored in both the query and the target formula, e.g.: Al2O3 = Al 2 O 3 = Al2 O3
It is useful to separate ambiguous atom symbols with spaces, e.g.: silicon = Si ≠ S I, NaLi = Na Li ≠ N Al I = NAlI.
Accepted forms are e.g.: C7H14O ; c7h14o ; c7H14o ; Si or si for silicon; SI or sI for sulfur and iodine.
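The spacing rule can be illustrated with a small sketch. It is not JChem Base code: the `tokenize` helper is a hypothetical name, it assumes properly capitalised input (upper case letter starts a symbol), and it does not check that a symbol is a real element.

```python
import re

# One token = an element symbol (upper case letter plus optional lower case
# letter) followed by an optional count. Assumption: input is capitalised.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def tokenize(formula: str) -> list:
    """Split a formula into (symbol, count) pairs, ignoring all spaces."""
    compact = "".join(formula.split())  # spaces carry no meaning
    tokens, pos = [], 0
    while pos < len(compact):
        match = TOKEN.match(compact, pos)
        if match is None:
            raise ValueError(f"unexpected character {compact[pos]!r} at {pos}")
        # A missing count means 1.
        tokens.append((match.group(1), int(match.group(2) or 1)))
        pos = match.end()
    return tokens


print(tokenize("Al 2 O 3"))  # same tokens as "Al2O3" and "Al2 O3"
print(tokenize("Si"))        # silicon
print(tokenize("S I"))       # sulfur and iodine
```

With correct capitalisation the upper case letter alone separates `Si` from `S I`; the case-insensitive forms listed above would need extra disambiguation that this sketch does not attempt.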
-
Hill notation is not required
Any order of atoms is accepted, e.g. C7H14O , H14C7O , OC7H14 . A chemical symbol can appear in the formula multiple times, e.g.: CH3COOH = C2H4O2 , both are accepted.
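Order independence and repeated symbols amount to comparing per-element totals. The following is a minimal sketch of that idea, assuming capitalised symbols; `atom_counts` is a hypothetical helper, not part of JChem Base, and it silently skips characters it does not recognise.

```python
import re
from collections import Counter


def atom_counts(formula: str) -> Counter:
    """Sum the count of every element symbol, in whatever order it appears."""
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula.replace(" ", "")):
        counts[symbol] += int(number or 1)  # a missing count means 1
    return counts


# Any order of atoms gives the same totals.
print(atom_counts("C7H14O") == atom_counts("H14C7O") == atom_counts("OC7H14"))
# A symbol may appear several times; its counts are added up.
print(atom_counts("CH3COOH") == atom_counts("C2H4O2"))
```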
-
Parentheses
Parentheses are accepted for repeating units and groups. The number after the parentheses multiplies each atom in that group. An accepted form is e.g.: (C2H4)7O , same as C14H28O .
-
Any letter (lower or upper case) after brackets defines a polymer molecule. (C2H4O)n is a polymer molecule with C2H4O repeating unit.
-
Combinatorial groups can also be defined by bracketing the conditions. (Br+F+I)5 means that the sum of bromine, fluorine and iodine atoms in a molecule is 5 (e.g. the molecule contains 4 bromine, 0 fluorine and 1 iodine atom, or 1 bromine, 2 fluorine and 2 iodine atoms, etc.). This type of formula is valid only on the query side of formula search.
-
Nested parentheses are not supported.
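The multiplication rule for groups can be sketched as follows. This is an illustration only, with hypothetical helper names (`flat_counts`, `expand`); it assumes capitalised symbols, rejects nested parentheses as stated above, and ignores the polymer and combinatorial forms.

```python
import re
from collections import Counter

ATOM = re.compile(r"([A-Z][a-z]?)(\d*)")
GROUP = re.compile(r"\(([^()]*)\)(\d*)")  # one non-nested group and its multiplier


def flat_counts(text: str, factor: int = 1) -> Counter:
    """Count atoms in a parenthesis-free fragment, scaled by `factor`."""
    counts = Counter()
    for symbol, number in ATOM.findall(text):
        counts[symbol] += int(number or 1) * factor
    return counts


def expand(formula: str) -> Counter:
    """Multiply out every group and add the atoms outside the groups."""
    compact = formula.replace(" ", "")
    if re.search(r"\([^()]*\(", compact):
        raise ValueError("nested parentheses are not supported")
    total = Counter()
    for body, multiplier in GROUP.findall(compact):
        total += flat_counts(body, int(multiplier or 1))
    total += flat_counts(GROUP.sub("", compact))  # atoms outside any group
    return total


print(expand("(C2H4)7O") == expand("C14H28O"))
```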
-
Defining intervals
Both open and closed intervals can be used to set the minimum and/or maximum count of an atom type or group.
For example, the formula C-7 H10-14 O0- signifies molecules with at most 7 carbon atoms, between 10 and 14 hydrogen atoms and any number of oxygen atoms, on both the query and the target side.
A query such as (CH2)5-7 O0- N-3 signifies molecules with 5 to 7 carbon atoms, 10 to 14 hydrogen atoms, any number of oxygen atoms and at most 3 nitrogen atoms.
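The interval notation can be read as a (minimum, maximum) pair per element. The sketch below is a hedged illustration, not the JChem Base parser: `bounds`, `query_ranges` and `within` are hypothetical names, symbols are assumed to be capitalised, and a group interval is applied to each atom of the group independently, which is how the two examples above read.

```python
import math
import re

ITEM = re.compile(
    r"\(([^()]+)\)(\d*)(-?)(\d*)"    # group with count or interval, e.g. (CH2)5-7
    r"|([A-Z][a-z]?)(\d*)(-?)(\d*)"  # atom with count or interval, e.g. H10-14
)
ATOM = re.compile(r"([A-Z][a-z]?)(\d*)")


def bounds(low: str, dash: str, high: str):
    """Turn '', '7', '-7', '10-14' or '0-' into a (min, max) pair."""
    if not dash:                      # plain count; a missing count means 1
        n = int(low or 1)
        return n, n
    return int(low or 0), (int(high) if high else math.inf)


def query_ranges(query: str) -> dict:
    """Map every element of the query to its allowed (min, max) count."""
    ranges = {}
    for body, g_lo, g_dash, g_hi, sym, a_lo, a_dash, a_hi in ITEM.findall(query.replace(" ", "")):
        if body:                      # scale each atom of the group
            lo, hi = bounds(g_lo, g_dash, g_hi)
            for s, n in ATOM.findall(body):
                ranges[s] = (lo * int(n or 1), hi * int(n or 1))
        else:
            ranges[sym] = bounds(a_lo, a_dash, a_hi)
    return ranges


def within(ranges: dict, counts: dict) -> bool:
    """True if every element named in the query lies inside its interval."""
    return all(lo <= counts.get(s, 0) <= hi for s, (lo, hi) in ranges.items())


print(query_ranges("C-7 H10-14 O0-"))
print(within(query_ranges("(CH2)5-7 O0- N-3"), {"C": 6, "H": 12, "N": 3}))
```

`within` only checks the elements named in the query; whether other elements may be present depends on the search type.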
-
Multicomponent search
In multicomponent search the components are separated with a period ( . ). The order of the components is not important. The coefficient of each component can be given as a number, a fraction or an interval. The special index 'x' stands for any number of the indicated component. An omitted coefficient defaults to 1.
Accepted forms are e.g: 5C4H6.Na , 3/4 Na . 2-5 C4H6 , xCuCl2.xH2O means any number of CuCl2 with any number of H2O . (C2H4O.C8H9)n defines a copolymer with C2H4O and C8H9 repeating units.
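The coefficient forms listed above can be separated from the component formulas with a short sketch. It rests on several assumptions: `split_components` is a hypothetical helper, only closed intervals are recognised as coefficients, symbols are capitalised, and the copolymer form with a period inside parentheses is not handled.

```python
import re
from fractions import Fraction

# Optional leading coefficient: 'x', a fraction, a closed interval or an integer.
COEFF = re.compile(r"(x|\d+/\d+|\d+-\d+|\d+)?\s*(.+)")


def split_components(formula: str) -> list:
    """Return (coefficient, component formula) pairs for a dotted formula."""
    components = []
    for part in formula.split("."):
        match = COEFF.fullmatch(part.strip())
        if match is None:
            raise ValueError(f"empty component in {formula!r}")
        raw, body = match.groups()
        if raw is None:
            coeff = Fraction(1)             # omitted coefficient defaults to 1
        elif raw == "x":
            coeff = "any"                   # any number of this component
        elif "-" in raw:
            low, high = raw.split("-")
            coeff = (int(low), int(high))   # closed interval
        else:
            coeff = Fraction(raw)           # integer or fraction such as 3/4
        components.append((coeff, body.replace(" ", "")))
    return components


print(split_components("3/4 Na . 2-5 C4H6"))
```

Because the order of the components does not matter, two results can be compared as sets, for example `set(split_components("Na.5C4H6")) == set(split_components("5C4H6.Na"))`.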
-
Isotopes
Isotope search is available using square brackets in the search formula. Accepted form is [mass number followed by chemical symbol], e.g.: C7H12 [2H][3H] O
Trivial abbreviations ([2H] = D; [3H] = T) are also accepted without square brackets! Example: C7H12 DT O
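The equivalence of the bracketed and abbreviated isotope forms can be shown with a small sketch. `isotope_counts` is a hypothetical helper, not a JChem Base function; it keys each atom by (mass number, symbol), uses `None` as the mass number for unlabelled atoms, and assumes that a count may follow a bracketed isotope.

```python
import re
from collections import Counter

# Either a bracketed isotope such as [2H], or a plain symbol, each with an optional count.
ISOTOPE_TOKEN = re.compile(r"\[(\d+)([A-Z][a-z]?)\](\d*)|([A-Z][a-z]?)(\d*)")
ABBREVIATIONS = {"D": (2, "H"), "T": (3, "H")}  # trivial names for [2H] and [3H]


def isotope_counts(formula: str) -> Counter:
    """Count atoms keyed by (mass number or None, element symbol)."""
    counts = Counter()
    for mass, iso_symbol, iso_n, symbol, n in ISOTOPE_TOKEN.findall(formula.replace(" ", "")):
        if iso_symbol:
            key, number = (int(mass), iso_symbol), int(iso_n or 1)
        else:
            key, number = ABBREVIATIONS.get(symbol, (None, symbol)), int(n or 1)
        counts[key] += number
    return counts


print(isotope_counts("C7H12 [2H][3H] O") == isotope_counts("C7H12 DT O"))
```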
Other important aspects
-
Excluded atoms can be specified by typing 0 after their symbol in all three search types (see Search Types), e.g.: C7H14ON0 signifies a molecule with 7 carbon atoms, 14 hydrogen atoms, 1 oxygen atom and no nitrogen.
-
When no exact number is specified for an atom it is handled as 1, e.g.: C7H14O = 'C7H14O1'.
See more examples of acceptable formulas in JChem Base.
Search types
To fulfil every requirement, three search types are available to find molecules by chemical formula: Exact , Exact subformula and Subformula search.
Exact search
-
The result list contains molecular formulas equal to the given search criteria. Atoms with differing counts and other atom types are not allowed.
Exact subformula search
-
The result list contains molecular formulas in which every atom of the query occurs with exactly the given count; other atom types may also be present. E.g. query formula C6 H6 O6 matches C6 H6 O6 S but does not match C4 H6 O6 .
Subformula search
-
The result list contains molecular formulas that contain at least the given atom counts; higher counts and other atom types may also be present. E.g. query formula C6 H6 O2 matches C6 H12 O6 and C6 H6 O2 S but does not match C4 H6 O6 or C2 H6 O N . In subformula search the query formula can match a polymer formula as well (see Table 2).
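The three search types differ only in how the per-element counts of query and target are compared. The sketch below restates the rules above for plain formulas; the function names are hypothetical, symbols are assumed to be capitalised, and intervals, groups, isotopes and polymers are left out. A count of 0 in the query is treated as an excluded atom in every search type.

```python
import re
from collections import Counter


def atom_counts(formula: str) -> Counter:
    """Per-element totals; an explicit 0 (e.g. N0) is kept as a zero entry."""
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula.replace(" ", "")):
        counts[symbol] += int(number) if number else 1
    return counts


def exact(query: str, target: str) -> bool:
    """Same elements with the same counts, nothing else."""
    q, t = atom_counts(query), atom_counts(target)
    return all(q[s] == t[s] for s in q.keys() | t.keys())


def exact_subformula(query: str, target: str) -> bool:
    """Query elements must have exactly the given counts; extras are allowed."""
    q, t = atom_counts(query), atom_counts(target)
    return all(t[s] == n for s, n in q.items())


def subformula(query: str, target: str) -> bool:
    """Query counts are minimums, except 0, which excludes the element."""
    q, t = atom_counts(query), atom_counts(target)
    return all((t[s] == 0) if n == 0 else (t[s] >= n) for s, n in q.items())


print(exact_subformula("C6 H6 O6", "C6 H6 O6 S"))  # other atom types may be present
print(subformula("C6 H6 O2", "C6 H12 O6"))         # higher counts may be present
print(subformula("C7H14ON0", "C7H16ON"))           # nitrogen is excluded by N0
```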
Table 1. Search type results comparison
Query formula: C7 H14 O
| Search type | Result: Yes | Result: No |
|---|---|---|
| Exact | C7H14O, C7H14O | C6H14O, C7H14OS |
| Exact subformula | C7H14OS, C7H14OSi | C6H14O, C7H16OSi |
| Subformula | C7H16OSi, C8H14O2S | C6H14O, C7H14S |
Table 2. Query formula can match polymer in Subformula search
Query formula: C4 H6 O2
| Search type | Target formula | Find |
|---|---|---|
| Exact | (C4H8O4)n | No |
| Exact subformula | (C4H8O4)n | No |
| Subformula | (C4H8O4)n | Yes |
Sophisticated Chemical Formula Search Examples
Here you can find some examples of accepted query formulas in JChem Base.
Formula Examples
| Formula | Spaces can divide the formula at any logical point. | Even mixed upper case and lower case letters are accepted for chemical symbols. | Any order of the elements is accepted. | The groups in parentheses are multiplied. | Any element can be excluded by zero. |
|---|---|---|---|---|---|
| C9H21O5PSi | C9H21O5PSi | c9h21o5psi | C9 O5 H21 Si P | (CH3)3 (CH2)6 O5 Si P | C9 O5 H21 Si P N0 |
| | C9 H21 O5 P Si | c9 H21 O5 p Si | Si P O5 H21 C9 | (CH2)9 H3 O5 Si P | Si P O5 H21 C9 S0 |
| | C 9 H 21 O 5 P Si | c 9 h 21 o 5 p si | H21 Si O5 P C9 | (CH4O)5 C4 H Si P | H21 Si O5 P C9 Cl0 |
Parentheses usage
| Works as in mathematical notation | Defines a polymer molecule | Specifies combined groups |
|---|---|---|
| (CH3)3 Si C3H5 | (C8H8)n | (F+Cl+Br+I)1 : the sum of F, Cl, Br and I is equal to 1, i.e. there is only 1 halogen atom in the molecule. |
| C3H9 Si C3H5 | (C8H8)L | (Cl+Br+I)5 : the sum of Cl, Br and I is equal to 5 in a molecule. |
| C6H14 Si | polystyrene | (F+Cl+Br+I)0 : no halogens are allowed in the molecule. |
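A combined group is a constraint on the sum of several element counts. The sketch below checks such constraints against a dictionary of atom counts. It is an illustration under stated assumptions: `parse_sum_groups` and `satisfies` are hypothetical names, the total after the parentheses must be written explicitly, and other parts of the query are ignored.

```python
import re

# '(A+B+...)n': at least two capitalised symbols joined by '+', then the required sum.
SUM_GROUP = re.compile(r"\(([A-Z][a-z]?(?:\+[A-Z][a-z]?)+)\)(\d+)")


def parse_sum_groups(query: str) -> list:
    """Return (symbols, required total) for every combined group in the query."""
    return [
        (tuple(symbols.split("+")), int(total))
        for symbols, total in SUM_GROUP.findall(query.replace(" ", ""))
    ]


def satisfies(query: str, counts: dict) -> bool:
    """True if each group's element counts add up to the required total."""
    return all(
        sum(counts.get(symbol, 0) for symbol in symbols) == total
        for symbols, total in parse_sum_groups(query)
    )


print(parse_sum_groups("(F+Cl+Br+I)1"))
print(satisfies("(Br+F+I)5", {"C": 2, "Br": 4, "I": 1}))    # 4 + 0 + 1 = 5
print(satisfies("(F+Cl+Br+I)0", {"C": 6, "H": 14, "Si": 1}))  # no halogens present
```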
Intervals
Open intervals
-
0- : from zero to infinity (none or any number).
-
4- : the count of the indicated element is greater than or equal to 4.
-
-4 : the count of the indicated element can be 0, 1, 2, 3 or 4.
Closed intervals
-
3-8 : the count of the indicated element is greater than or equal to 3 and less than or equal to 8.
For example: (CH2)5-7 O0- N-3 signifies molecules with 5 to 7 carbon atoms, 10 to 14 hydrogen atoms, any number of oxygen atoms and at most 3 nitrogen atoms.
Interval example:
| Query formula | Results in Exact formula search |
|---|---|
| (CH2)5-7 O0- N-3 | C5H10N3, C5H11N3, C5H12N3, C5H13N3, C5H14N3, C6H10N3, C6H11N3, C6H12N3, C6H13N3, ... |