Search

The web client application includes a Search page for locating entries already stored in the registry database using a set of filtering options. A search can be very general, scanning the entire database, or very specific, using a combination of the available filters. The search service also supports chemical structures as queries.

This chapter walks you through the use of the Search page.

Search Page

Clicking the Search tab in the menu bar opens the Search page in a new window. The Search page is divided into three sections: Search options, Search query and Search results.

The columns displayed in the Search query table can be customized. E.g. besides the structure, the following columns can be displayed or hidden: salt info (salt/solvate name and multiplicity), PCN, CN, LN, LnbRef, Project, CST, MW (structure), MW (structure+salt), Formula, lot level additional data (Purity or QC), Match type and Action.

The Action on the Search page is a link to the Amendment page, where the lot from the search results can be viewed and, if needed, modified.

If no criteria are set (no checkbox is selected in the search options, no search query is set, and the Search type is the default value Exact), all the preparation (lot) level structures are returned as results. For more information about how the results are displayed, please consult the Search results subsection.

Search Options

Search options are to be selected if the specified search conditions contain structural criteria. Structure search within the registry database is provided by JChem Base. To read more about the JChem Base search options please follow the link below.
https://www.chemaxon.com/jchem/doc/user/queryindex.html

2D and Tautomer Match

The match of your choice can be selected by using the corresponding checkboxes.

  • 2D - The search disregards the stereochemistry (tetrahedral stereo, double bond stereo, etc.) of the query and target structures. E.g. searching for (2R)-2-methyloxane with the 2D search option checked also returns (2S)-2-methyloxane and 2-methyloxane structures (figure Search 1).

images/download/attachments/43908152/Search1.png

Figure Search 1. Different search results for 2D match Search option

Currently, matches involving CSTs are displayed as 'Exact' match types on the Search page (although they are considered 2D matches during submission or amendment), because no chemical stereo information is disregarded. Those hits are therefore also returned by an Exact search, when the 2D Search option is not selected (figure Search 2).

images/download/attachments/43908152/92-Search.png

Figure Search 2. Search results for 2D matches including CSTs

  • Tautomer - Tautomer searches include lactim-lactam, keto-enol, imine-enamine, and other common proton-shift tautomerism cases.

  • 2D and Tautomer - If both checkboxes are selected, the search combines the two cases above. The Match type column shows whether a given hit is an 'Exact', '2D', 'Tautomer' or 'Tautomer & 2D' match from a structural point of view (figure Search 3).

images/download/attachments/43908152/93-Search.png

Figure Search 3. Search results for 2D and Tautomer matches
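The relation between the two checkboxes and the Match type column can be summarized in a small Python sketch. The function name and its boolean inputs are hypothetical and serve illustration only; they are not part of the registry or of JChem Base.

```python
def match_type_label(differs_in_stereo: bool, differs_in_tautomer: bool) -> str:
    """Return the Match type label for a hit (hypothetical helper).

    differs_in_stereo:   the hit matches only when stereochemistry is disregarded
    differs_in_tautomer: the hit matches only as a tautomer of the query
    """
    if differs_in_stereo and differs_in_tautomer:
        return "Tautomer & 2D"
    if differs_in_tautomer:
        return "Tautomer"
    if differs_in_stereo:
        return "2D"
    return "Exact"


# A hit that differs from the query only in stereochemistry.
print(match_type_label(differs_in_stereo=True, differs_in_tautomer=False))
```

With both checkboxes selected, any of the four labels can appear in the results. With neither selected, only hits labelled Exact are expected.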

Type of Search

Three types of structure search are currently enabled. They can be selected from the drop-down list of the Search options pane.
Search types are:

  • Exact search

  • Substructure search

  • Similarity search

Exact search

  • Also referred to as duplicate search in JChem:

https://www.chemaxon.com/jchem/doc/user/query_searchtypes.html#otherSearchTypes

  • Search for structures identical to the query structure, including stereochemistry, isotopic and charged forms. The structure is considered as a complete entity, with all of its atoms and bonds identical in the retrieved compound. The stereo and tautomerism information can be disregarded using the checkboxes, as described above. E.g. let's assume that pyrrolidine exists in the database as three preparations belonging to three different versions. If an exact search for the pyrrolidine structure is started, two hits are returned (see figure Search 4). These are the two preparations that have the same version level structure, excluding the salt/solvate information. The third preparation, which has an isotopic version level structure, is not returned in this case.

images/download/attachments/43908152/94-Search.png

Figure Search 4. Exact search returning single compounds using a structure as a query

Substructure search

  • Referred to as substructure search in JChem Base:

https://www.chemaxon.com/jchem/doc/user/query_searchtypes.html#sub

  • Search for structures in which the query structure is embedded. For single compounds, all versions are listed, including those that contain different isotopic or charged states and those containing a salt/solvate. E.g. using the above example, where pyrrolidine exists in the DB as three lots, a substructure search for the pyrrolidine structure returns all three lots, which belong to three different versions (see figure Search 5). For the three returned results the match type (see the Match type column) is Exact, even though a substructure search and not an exact search was performed, since all three preparations, which belong to the same parent, are exact matches with the search query structure.

images/download/attachments/43908152/95-Search.png

Figure Search 5. Substructure search (only single compounds are returned)

A substructure search also returns multi-component compounds if the query structure is a component of the multi-component compound. E.g. a substructure search for the piperazine structure returns a lot of a mixture, a lot of an alternate and a lot of a single compound (see figure Search 6).

images/download/attachments/43908152/96-Search.png

Figure Search 6. Substructure search (single and multi-component compounds are returned)

If the component structure is present in the DB without any single-component lot, only the multi-component compound is listed in the search results. E.g. using 4-aminocyclohexan-1-ol as a query for a substructure search, only an alternate is returned if 4-aminocyclohexan-1-ol is not present as a lot in the DB.
When searching for CST only records (compounds having no chemical structure, just a CST) on the Search page using an Exact or Substructure search, put a star atom in the Marvin structure editor. To set the star atom, open the Periodic Table of Elements in Marvin, select the Advanced tab, choose the star atom from the Special Nodes, then place it on the canvas. If the structure editor contains a star atom, the CST only records are listed (currently only for single compounds). In the example below (figure Search 7) the listed result * "678" is registered as a single compound in the DB without a proper chemical structure, with "678" as its CST.

images/download/attachments/43908152/97-Search.png

Figure Search 7. Exact search returning CST only single compounds using a star atom as a structure query

Similarity search

  • Referred to as similarity search in JChem Base: https://www.chemaxon.com/jchem/doc/user/query_similarity.html

  • Search for structures which are "similar" to the query structure. The distance between the query and target structures is calculated from the chemical hashed fingerprints generated in the JChem table (also used for the screening part of the substructure search). The metric is currently set to the default, Tanimoto. The similarity threshold can be provided when selecting this search type; a decimal value between 0 and 1 is accepted. The 2D and Tautomer search option checkboxes are disabled for this search type.

  • The search returns all structures that have a similarity higher than the specified threshold. By default the hits are listed in order of decreasing similarity (most similar on top). In this case the Match Type column displays "Similarity" followed by the metric and the similarity level of the entry, e.g. "Similarity (Tanimoto) 0.65" (see figure Search 8). Please be aware that the displayed value is a similarity level (and not a dissimilarity), thus 1.0 stands for a structure that is identical to the query.

images/download/attachments/43908152/98-Search.png

Figure Search 8. Results of similarity search for a query structure of piperidine.
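The threshold filtering and ordering described above can be illustrated with a minimal Python sketch. It assumes that fingerprints are represented as sets of "on" bit positions. The function names and the example fingerprints are hypothetical and do not reflect the JChem Base implementation or its API.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity: shared 'on' bits divided by all distinct 'on' bits."""
    union = len(a | b)
    if union == 0:
        return 1.0  # assumption: two empty fingerprints count as identical
    return len(a & b) / union


def similarity_search(
    query: Set[int], targets: Dict[str, Set[int]], threshold: float
) -> List[Tuple[str, float]]:
    """Return (name, similarity) pairs above the threshold, most similar first."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be a decimal value between 0 and 1")
    hits = [(name, tanimoto(query, fp)) for name, fp in targets.items()]
    # Following the text, only hits strictly higher than the threshold are kept.
    hits = [hit for hit in hits if hit[1] > threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


def match_type(similarity: float) -> str:
    """Format the value the way the Match Type column is described in the text."""
    return f"Similarity (Tanimoto) {similarity:.2f}"


# Made-up fingerprints, for illustration only.
query_fp = {1, 2, 3, 4}
target_fps = {"A": {1, 2, 3, 4}, "B": {1, 2, 3, 5}, "C": {1, 9}}
for name, sim in similarity_search(query_fp, target_fps, 0.5):
    print(name, match_type(sim))
```

In this sketch an identical fingerprint scores 1.0, in line with the remark that the displayed value is a similarity and not a dissimilarity.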

Search Query

The result table on the Search page consists of several items, including structure, salt/solvate info, PCN, CN, LN, LnbRef, Project, CST, MolWeight (structure), MolWeight (structure + salt), Formula, Stereochemistry, Geometric isomerism, Created on, Submitter, additional configurable fields (like Purity and QC), Match Type and Action. Except for the salt/solvate info and Action, all fields are searchable. Clicking the structure field opens the embedded structure editor (Marvin Sketch by default). After selecting Close, the structure is loaded in the Structure panel.
The salt/solvate info and Action only display characteristics of the returned lots. E.g. it is not possible to search for compounds having a given salt/solvate info, but if a returned lot contains any, the salt/solvate info is displayed in the salt info column of the Search results table (figure Search 5). It is also possible to search for CSTs, in which case the CST of the compound is shown in the CST column of the Search results.

Information about the searchable fields:

  • Above each searchable field there is a drop-down menu where the operator can be set: "Ignore", "=", "<", ">", "<>", "Like", "Not like". The default operator is "Ignore".

  • To refine your search criteria further, you can type "%" for multiple missing characters and "_" (underscore) for a single missing character.

  • When searching for PCNs, CNs, LNs, LnbRefs and Projects, characters are considered case-insensitive.

Projects can be searched regardless of whether Project based access is enabled. E.g. when searching for Project = LLA, three results are returned: the first having "LLa", the second "LLA" and the third "lla" as Project (see figure Search 9).

images/download/attachments/43908152/99-Search.png

Figure Search 9. Search considering the Project field.
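The wildcard and case rules for text fields can be sketched in Python as follows. This is only an illustration under the assumption that "%" stands for any run of characters (including none) and "_" for exactly one character; the function names are hypothetical and the registry's actual matching is done by the database, not by this code.

```python
import re


def like_to_regex(pattern: str, case_sensitive: bool):
    """Translate a pattern using '%' and '_' into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # any number of characters, including none
        elif ch == "_":
            parts.append(".")  # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is matched literally
    flags = re.DOTALL if case_sensitive else re.DOTALL | re.IGNORECASE
    return re.compile("".join(parts), flags)


def like(value: str, pattern: str, case_sensitive: bool = False) -> bool:
    """True if the whole value matches the pattern ('Not like' is the negation)."""
    return like_to_regex(pattern, case_sensitive).fullmatch(value) is not None


# Project values are compared case-insensitively, as in figure Search 9.
print([p for p in ["LLa", "LLA", "lla", "LLB"] if like(p, "LLA")])
```

Passing case_sensitive=True mimics a field whose characters are compared case-sensitively.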

  • When searching for CSTs, characters are considered case-sensitive.

  • The Formula field can also be used to search for an exact molecular formula (when "=" is selected).

  • By default, for an exact molecular mass (MolWeight) search (when "=" is selected), a positive decimal number with three decimal digits is considered.

  • MolWeight is 0 for compounds containing CST only substructures and multi-component compounds with undefined composition (alternates or mixtures with unknown ranges).

  • It is possible to search by MolWeight (structure) and also by MolWeight (structure + salt/solvate). MolWeight (structure) refers to the molecular weight of the preparation structure (neutral, charged or isotopic), whereas MolWeight (structure + salt/solvate) refers to the molecular weight of the preparation structure together with the salt or solvate that belongs to the version/lot. When a value is entered in the MolWeight (structure) field, the search uses the MolWeight of the version/lot without the salt/solvate. When a value is entered in the MolWeight (structure + salt/solvate) field, the search uses the MolWeight of the version/lot with the salt/solvate. In both cases, the MolWeight (structure) and MolWeight (structure + salt/solvate) columns are both populated for the displayed results.

E.g. having benzoic acid [MolWeight (structure) = 122.1213] registered as three lots and versions (benzoic acid with 3xHCl, isotopic (1-13C)benzoic acid and benzoate 1xNa+), if we search for MF=C7H6O2, only the first version, benzoic acid with 3xHCl, is returned as a result (salt info is not considered). See figure Search 10.

images/download/attachments/43908152/910-Search.png

Figure Search 10. Exact search using only the Formula field

Following the above example, if we initiate a substructure search using benzaldehyde as the structure and MolWeight (structure) > 121, all three lots (from different versions) are returned as search results (figure Search 11). However, with MolWeight (structure) > 122, only the first two versions are returned, since benzoate has a MolWeight (structure) of only 121.1134 and the salt (Na+ with multiplicity 1) is not considered. If a search with MolWeight (structure + salt/solvate) > 122 is initiated, all three lots (from different versions) are again returned as search results.

images/download/attachments/43908152/911-Search.png

Figure Search 11. Substructure search using the structure and the MW (structure) fields
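The benzoic acid example can be retraced with a short Python sketch. It assumes that MolWeight (structure + salt/solvate) equals the structure weight plus the salt weight times its multiplicity. The class and function names are hypothetical, and the weights of HCl, Na+ and the 13C-labelled acid are approximate values added for illustration; only 122.1213 and 121.1134 come from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Lot:
    name: str
    mw_structure: float  # MolWeight (structure)
    salt_mw: float = 0.0  # weight of one salt/solvate unit (approximate)
    multiplicity: int = 0  # number of salt/solvate units

    @property
    def mw_with_salt(self) -> float:
        """MolWeight (structure + salt/solvate), under the assumption above."""
        return self.mw_structure + self.salt_mw * self.multiplicity


lots = [
    Lot("benzoic acid, 3 x HCl", 122.1213, salt_mw=36.461, multiplicity=3),
    Lot("(1-13C)benzoic acid", 123.11),  # approximate isotopic weight
    Lot("benzoate, 1 x Na+", 121.1134, salt_mw=22.990, multiplicity=1),
]


def heavier_than(candidates: List[Lot], field: str, limit: float) -> List[str]:
    """Names of the lots whose chosen MolWeight field is greater than limit."""
    return [lot.name for lot in candidates if getattr(lot, field) > limit]


print(heavier_than(lots, "mw_structure", 121))  # all three lots
print(heavier_than(lots, "mw_structure", 122))  # benzoate is dropped
print(heavier_than(lots, "mw_with_salt", 122))  # all three lots again
```

The benzoate lot passes the last filter only because its sodium counter-ion is included in the weight that is compared.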

  • The user-supplied MolWeight of a lot can be searched only from the MolWeight (structure + salt/solvate) field. E.g. cyclohexane-1,4-diol (having a user-supplied MolWeight) cannot be found using MolWeight (structure) = 200. The search returns a result only if MolWeight (structure + salt/solvate) is set to 200. In this case the MolWeight (structure) column is empty (figure Search 12).

  • If only the parent in a tree has a user-supplied MolWeight and this value is searched, the search gives no results, since the lot in the tree has a different MolWeight than the parent.

  • For those structures for which extra data stored on lot level are available (see figure Search 12), it is also possible to search by the extra fields (additional data like Purity or QC).

images/download/attachments/43908152/912-Search.png

Figure Search 12. Exact search using only the MW (structure + salt/solvate) field when searching for a user-supplied MW.

If search terms are entered in multiple fields (with or without using the structure field), the fields are automatically combined. E.g. running a substructure search with 1H-pyrrole as the structure, CST like %lab% and MolWeight (structure) > 80, 3,4-dimethyl-1H-pyrrole "Lab78" is shown as a result (figure Search 13).

images/download/attachments/43908152/913-Search.png

Figure Search 13. Substructure search using multiple fields
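How several field criteria might act together can be sketched in Python. The sketch assumes that the fields are combined with a logical AND, which the example above suggests but the text does not state explicitly. The record layout, field names and Purity values are hypothetical, the weights are approximate, and the "Like" and "Not like" operators are left out for brevity.

```python
import operator
from typing import Any, Callable, Dict, List, Tuple

# Comparison operators offered in the drop-down menus (text operators omitted).
COMPARATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "=": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
    "<>": operator.ne,
}

Criterion = Tuple[str, Any]  # (operator, value)


def matches(record: Dict[str, Any], criteria: Dict[str, Criterion]) -> bool:
    """True if the record satisfies every criterion that is not set to Ignore."""
    for field, (op, value) in criteria.items():
        if op == "Ignore":
            continue  # default operator: the field does not restrict the search
        if not COMPARATORS[op](record[field], value):
            return False
    return True


def search(records: List[Dict[str, Any]], criteria: Dict[str, Criterion]) -> List[str]:
    """Names of all records that satisfy the combined criteria."""
    return [r["name"] for r in records if matches(r, criteria)]


# Made-up lots for illustration.
records = [
    {"name": "1H-pyrrole", "mw_structure": 67.09, "purity": 98.0},
    {"name": "3,4-dimethyl-1H-pyrrole", "mw_structure": 95.14, "purity": 99.0},
]

print(search(records, {"mw_structure": (">", 80), "purity": ("Ignore", None)}))
print(search(records, {"mw_structure": (">", 60), "purity": (">", 98.5)}))
```

A field left on "Ignore" never excludes a record, which mirrors the behaviour of a search in which only some of the fields are filled in.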

The selected conditions in the Search options and Search query pane will remain in effect during your current search session until you change or delete them.

Search Results (Display, Paging)

In the Search results table all lots that match the search criteria are displayed. On completion of a query, a table of hits (with fields corresponding to the Search query panel) is displayed, in which the lots are listed in order of registration date. By default only the first five lots are listed, and the number of listed items is displayed above the table.

For versions and parents for which synonyms are available (the PCNs and CNs are displayed in red), the synonyms appear when hovering over the PCN or CN.
Clicking the structure in the structure column (left) opens the compound in a new window containing the fused image and the LnbRef. The fused images of restricted lots are displayed in a red frame.

At the bottom of the page the number of hits to be displayed can be selected using the drop-down menu (5/10/20/40); showing all results can also be selected. The arrows allow navigating between the pages of the table. To open a submission in the Amendment page, click the [Amend] button on the right side of the hit.
In the upper left corner the [Export Results] button saves the retrieved records as a data file (.sdf file). For more details consult the Data export to an SD file chapter.
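The paging behaviour can be pictured with a minimal Python sketch. The page sizes and the default of five hits are taken from the text; the function names are hypothetical, and None is used here to represent the "show all results" choice.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")

PAGE_SIZES = (5, 10, 20, 40)  # choices offered by the drop-down menu


def page_of(hits: Sequence[T], page: int, page_size: Optional[int] = 5) -> List[T]:
    """Return the hits shown on the given 1-based page (None means show all)."""
    if page_size is None:
        return list(hits)
    if page_size not in PAGE_SIZES:
        raise ValueError(f"page size must be one of {PAGE_SIZES}")
    if page < 1:
        raise ValueError("page numbers start at 1")
    start = (page - 1) * page_size
    return list(hits[start:start + page_size])


def page_count(total_hits: int, page_size: Optional[int] = 5) -> int:
    """Number of pages needed to display all hits (at least one)."""
    if page_size is None:
        return 1
    return max(1, -(-total_hits // page_size))  # ceiling division


hits = list(range(1, 13))  # twelve made-up hit identifiers
print(page_of(hits, 1))  # first five hits
print(page_of(hits, 3))  # the remaining two hits
print(page_count(len(hits)))
```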