Definitions of Terms

The list of terms with their definitions in the ChemAxon Compound Registration system:

Additional Data

Additional data are data that are stored only at lot level. The default additional data are Purity and QC (editable on both the Registration and Submission pages), but the list can be configured.

Alternate

The Alternate is an abstract representation of uncertain structural information. An Alternate can be a list of possible chemical forms of a certain compound. The Alternate is a multi-component compound without quantitative composition information. For other multi-component types, see also formulation and mixture.

Amendment

Amendment is the process of modifying a registered compound. On the Amendment page, besides the chemical structure (and CST), the Restriction value, the Molweight and the LnbRef can be modified.

Analyze Salt/Solvate

Analyze Salt/Solvate is a procedure that automatically extracts salt/solvate fragments from a compound's chemical structure and replaces them with references to the corresponding records in the salts and solvates dictionary. It can be activated or deactivated as one of the system switchers. The switcher is on by default for some sources, and can be applied manually to the records in the staging area.

Assigned/Unassigned Submission

An Assigned submission is one that is being worked on by a specific user (e.g. a registrar) in order to be manually registered from the staging area. Multiple submissions can be assigned to the same registrar. Once a submission is assigned to a given user, it is locked, so that any other user receives a warning upon trying to open that submission. A submission that is not assigned to any user is considered Unassigned, and can be freely worked on by any user.

Audit

A detailed history of the changes that have been made to a compound during amendment steps performed on the parent, version or lot level.

Autoregistration

The process of calling the registration service to register a compound automatically, based on a predefined configurable set of business rules. In case a submission cannot be registered automatically, it will fall to the staging area and a user with corresponding privileges will need to review and manually register it.

Bulk Registration

The process of registering multiple compounds (possibly thousands of compounds) in one request. The registration service has the ability to perform both bulk autoregistration of records from an SD file (see Data import chapter) and bulk manual registration of submissions from the staging area. In either case, the registration process can be customized through custom structure checkers, structure fixers and system switchers.

Chemically Significant Text (CST)

CST is text that is attached to the chemical structure. It is considered part of the structure and takes part in defining the uniqueness of the structure. CST can be attached either to a component or to the whole structure. It is possible to register a record with CST but without a chemical structure. Such a record is usually referred to as a "CST-only" or "No structure CST" record.
CSTs can be added to a dictionary. CSTs from the dictionary can be retrieved and used for compound registrations and amendments.

Chemist ID

The same as a Submitter ID.

Compound

The Compound is a proper representation of any chemical entity, including charges, isotopes, salts, solvates, etc. During the registration process, a parent compound is created for each Compound; it contains the so-called parent standardized form of the chemical structure. Compounds and parent compounds are referred to collectively as structures.

Compound Number (CN)

The Compound Number (CN) is a unique identifier of the registered compound, which is either generated automatically by the registration system according to predefined rules, or it is inherited from an existing version in case of an exact compound match. It is also possible to specify the CN (just like the PCN) during the registration, which can be useful e.g. during migration of legacy data. The CN explicitly identifies a version. When it is generated, CN is usually derived from the parent compound number (PCN) according to customizable rules.

Dictionaries

Multiple Dictionaries can be added to the Registry database and populated, and can be used later during registration or amendment. Dictionaries, and also their items, can be searched, edited and deleted. By default, the Compound Registration includes five dictionaries: Chem. Sig. Text (empty), Double bond panel, Geometric Isomerism, Stereocenter panel and Stereochemistry, which contain some sample items.

The items of the Stereochemistry and Geometric isomerism dictionaries are currently present in drop-down lists only on the Registration page of the application. On the Submission page, depending on the "Calculate Stereo Comments" switcher setting, the comment can either be set manually to an arbitrary text, or it can be calculated using the items of the existing Stereochemistry and Geometric isomerism (Stereo Comments) dictionaries.

The contents of the Stereocenter and Double Bond Panels are used on the Submission page for the Stereo Fixer panel.

External ID

External IDs are IDs derived from an external source. Currently there are two external IDs: LnbRef and Lot ID. The LnbRef is always mandatory, but the Lot ID is optional.

File Format

The default file formats for structures are MRV, MDL Extended Molfile V3000 (.mol) and SD File. For more details, please consult the file formats documentation in Marvin or the original specification: http://download.accelrys.com/freeware/ctfile-formats/ctfile-formats.zip
It is possible to use any other molecule format that Marvin can import and export. Please note that only the MRV, MOLV3000 and CXN Extended SMILES formats store the enhanced stereo information, but the CXN Extended SMILES format cannot store data attached to atoms.

Formulation

A multi-component compound with exact quantitative composition information (e.g. component 1: 37%, component 2: 63%). A practically arbitrary number of components can be defined. All the component percentages should be positive and their sum should be equal to 100. See also alternate, mixture.
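
The composition rule above can be illustrated with a minimal Python sketch. The function name `is_valid_formulation` and the floating-point tolerance are assumptions made for illustration only; they are not part of the registration system's API.

```python
from math import isclose


def is_valid_formulation(percentages):
    """Return True if every percentage is positive and they sum to 100.

    `percentages` holds one number per component. The small absolute
    tolerance is an assumption so that inputs such as 33.3 + 33.3 + 33.4
    are not rejected because of floating-point rounding.
    """
    if any(p <= 0 for p in percentages):
        return False
    return isclose(sum(percentages), 100.0, abs_tol=1e-9)
```

For the example in the definition, `is_valid_formulation([37, 63])` holds, while a composition such as 37% and 60% would be rejected.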

Fused Image

A structure image that is on-the-fly generated from the components of a structure. Fused images are generated for multi-component compounds on all hierarchy levels and for single-component structures with salts/solvates on version and lot level.

JChem Structure Table

A JChem Structure Table is a database table maintained by the ChemAxon JChem libraries that contains structural information. A JChem Table stores the proper representation(s) of the structure and a list of additional fields (e.g. fingerprints) that support easy and fast screening/searching of the table by the available search types (duplicate, substructure, similarity, etc.). There are different table types based on the intended usage. For further details please visit the JChem documentation.

LnbRef

Acronym for (Electronic) Laboratory NoteBook (LNB) Reference. The identifier is provided by the source prior to the registration. It is a compulsory data field for every submission and it is guaranteed to be unique in the whole registration database. The format of LnbRef can be customized by the company and is validated during the registration process. The LnbRef can be modified after the registration, but the attached lot ID cannot.

Locked/Unlocked Submission

See assigned/unassigned submission.

Lot

The bottom level of the data hierarchy. A Lot (preparation) represents the unit of material obtained in one definite chemical process. A Lot entry has external IDs, such as an LnbRef and a lot ID, as unique identifiers. Configurable additional data (e.g. Purity or QC) can be stored at Lot level. Project information is also stored at Lot level, and is inherited by the version and the parent.


Lot ID

The Lot ID is an external ID attached to a lot. Lot ID is optional, but if it is required by the system, it cannot be modified.

Lot Number (LN)

The LN is a unique identifier attached to the lot, typically derived from the PCN (regardless of whether the PCN was specified or generated). When a lot is moved to another tree, the LN is regenerated. As with the PCN and CN, the way the LN is generated can be configured.

Manual Registration

The process of registering a failed submission (or multiple failed submissions) by a user with corresponding privileges from the staging area. The result of the Manual Registration is driven by a set of structure checkers, structure fixers and system switchers. The user also has the opportunity to modify the structure manually for the given submission before re-submitting it to Manual Registration.

Match

A Match is an already registered parent structure that could potentially serve as a parent for a compound that is to be registered or amended. Several different types of Matches exist, based on the level of structural similarity: exact, 2D, component, etc. During autoregistration, depending on the configuration of the system switchers, any non-exact match type is either ignored or causes the submission to fall to the staging area. During manual registration and amendment, the user is presented with the available Matches and can then choose a Match and a match action. Finally, if it could not be done automatically, the user might have to reconcile the Matched tree with the new compound through a process called version fix or version correction.

Match Action

The way to respond to matches during manual registration or amendment. There are 3 Match Actions:

Match Type

The way that the parent-standardized compound and its match are related.
For single-component compounds the Match Type can be: exact, tautomer, 2D, 2D&tautomer and similar CST. Stereoisomer and/or CST matches are considered 2D matches. For more details about stereoisomers please consult the Documentation about stereochemistry.

For details related to tautomers please consult the Documentation about tautomers. CSTs are considered to be Similar if they have the same content apart from whitespace and letter case. E.g. "test" and "T est" are Similar CST matches.
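
The Similar CST comparison can be sketched in a few lines of Python. This is only an illustration of the rule described above (ignore whitespace and letter case); the helper names `normalize_cst` and `is_similar_cst` are hypothetical, and the registration system may normalize text differently.

```python
def normalize_cst(text: str) -> str:
    """Drop all whitespace and ignore letter case."""
    return "".join(text.split()).casefold()


def is_similar_cst(a: str, b: str) -> bool:
    """True if two CSTs have the same content apart from whitespace and case."""
    return normalize_cst(a) == normalize_cst(b)
```

With these helpers, "test" and "T est" compare as Similar, whereas "test" and "text" do not.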

For multi-component compounds the Match Type can be exact, component, or external component. The Match Type is component match when two multi-component compounds have the same type and the same components, but with different ranges/percentages (e.g. a mixture to be registered consists of 21-44% benzene and 56-79% toluene, while another mixture is already in the registry consisting of 45-55% benzene and 45-55% toluene). The Match Type is external component match when two multi-component compounds have different types, but have the same components (e.g. a mixture to be registered consists of 21-44% benzene and 56-79% toluene, while there is a registered alternate consisting of benzene and toluene).
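
The distinction between exact, component and external component matches can be sketched as follows. This is a simplified illustration, not the system's matching algorithm: components are identified by plain names instead of chemical structures, the dictionary layout is hypothetical, and compounds whose component sets differ are assumed to produce none of these three match types.

```python
def multi_component_match_type(new, registered):
    """Classify the match between two multi-component compounds.

    Each compound is a dict with a "type" ("mixture", "formulation" or
    "alternate") and "components", a dict mapping a component name to its
    (low, high) percentage range, or to None when no amounts are stored.
    Returns "exact", "component", "external component" or None.
    """
    if set(new["components"]) != set(registered["components"]):
        return None  # assumption: different components give no such match
    if new["type"] != registered["type"]:
        return "external component"
    if new["components"] == registered["components"]:
        return "exact"
    return "component"


# The benzene/toluene examples from the definition above.
new_mixture = {
    "type": "mixture",
    "components": {"benzene": (21, 44), "toluene": (56, 79)},
}
registered_mixture = {
    "type": "mixture",
    "components": {"benzene": (45, 55), "toluene": (45, 55)},
}
registered_alternate = {
    "type": "alternate",
    "components": {"benzene": None, "toluene": None},
}
```

Comparing `new_mixture` with `registered_mixture` yields a component match, and comparing it with `registered_alternate` yields an external component match.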

Mixture

A type of multi-component compound with semi-quantitative composition information. In a Mixture every component has an assigned range that represents the relative amount of the component (e.g. component 1 composes 30-40% of the mixture, while component 2 composes 60-70%). The maximum number of components and the component range values can be configured independently. Some ranges can also be used as unknown ranges, in case of uncertain information. When a Mixture has an unknown component range, additional 'UNKNOWN' data is also attached to the structure. See also formulation, alternate.

Molecular Formula (Formula, MF) and Molecular Weight (MolWeight, MW)

The Formula for a compound is generated according to the Hill system: the number of carbon atoms is indicated first, the number of hydrogen atoms next, and then the number of all other chemical elements, in alphabetical order. Isotopes are listed separately in square brackets following the related chemical element. When the formula contains no carbon, all the elements, including hydrogen, are listed alphabetically.
In the Formula representation, dots are used to separate the structure from the salt/solvate and to separate the components of multi-component compounds, e.g. a 21-44% benzene 56-79% toluene mixture will have "C6H6.C7H8" in the Molecular Formula field.
The average molecular mass is calculated from the standard atomic weights.
The molecular mass of a compound can also be supplied by the user (also referred to as user-supplied MW).
When searching in the database, two types of molecular weight can be distinguished: the molecular weight of a version/lot without the salt or solvate [MW (structure)] and the molecular weight of a version/lot with the salt or solvate [MW (structure + salt)].
For further information, please check Appendix A. Calculations.
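
The Hill ordering and the dot-separated notation can be illustrated with a short Python sketch. It assumes that element counts are already available as a dictionary and it leaves out isotopes; the function names are hypothetical and do not correspond to any ChemAxon API.

```python
def hill_formula(counts):
    """Build a Hill-system formula from a dict of element symbol -> atom count.

    With carbon present: C first, then H, then the other elements
    alphabetically. Without carbon: all elements alphabetically.
    Isotope handling is deliberately omitted from this sketch.
    """
    symbols = [s for s, n in counts.items() if n > 0]
    if "C" in symbols:
        others = sorted(s for s in symbols if s not in ("C", "H"))
        ordered = ["C"] + (["H"] if "H" in symbols else []) + others
    else:
        ordered = sorted(symbols)
    # A count of 1 is conventionally not written out.
    return "".join(s if counts[s] == 1 else f"{s}{counts[s]}" for s in ordered)


def dotted_formula(fragments):
    """Join the formulas of several fragments or components with dots."""
    return ".".join(hill_formula(f) for f in fragments)
```

For the benzene and toluene mixture mentioned above, `dotted_formula([{"C": 6, "H": 6}, {"C": 7, "H": 8}])` gives `"C6H6.C7H8"`. A carbon-free input such as `{"H": 1, "Cl": 1}` gives `"ClH"`, because all elements are then sorted alphabetically.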

Multi-Component Compound

A compound which is composed of two or more components. Technically, these components also exist as independently registered single-component compounds in the registration system. Different types of Multi-Component Compounds exist based on purpose and on the level of accuracy of the composition information: alternates, formulations and mixtures. A Multi-Component Compound is distinct from a structure having multiple fragments within a structure field that has been registered as a single compound (without registering each fragment individually).

Parent

The highest level of the storage hierarchy of the registration system. A Parent in the registration service database represents a parent compound along with a set of additional information (e.g. Stereo Comments are stored at Parent level, but are also inherited by the versions and lots). It is referred to by a unique identifier called the parent compound number (PCN). Each Parent can have multiple versions, which represent the registered compounds that are grouped together because they have a common parent compound.

Parent compound

The Parent Compound belongs to the top level of the storage hierarchy of the registration system. It is derived from the compound structure through parent standardization, which includes neutralization and salt/solvate/isotope removal by default, but can be customized according to the corporate business logic.

Parent Compound Number (PCN)

The Parent Compound Number (PCN) is a unique identifier of the registered compound, which is either generated automatically by the registration system according to predefined rules, or it is inherited from an existing parent in case of an exact or accepted match. PCNs can also be specified during registration, which can be useful e.g. during migration of legacy data. The PCN explicitly identifies a parent.

Preparation

A synonym of lot.

Project

Project is a simple textual data field attached to the lot level. It can typically be interpreted as a reference to the business project within which the lot was created.
Projects can be specified either during autoregistration, or when registering the submission from the staging area. Each lot can be a part of multiple Projects. The Project information is calculated on version and parent levels as the union of the Projects defined for the lots of the tree (or sub-tree).
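
The way Project information propagates upwards can be sketched as a set union. The data layout below (a dictionary of versions, each holding one set of project names per lot) and the identifiers and project names are hypothetical and serve only to illustrate the rule.

```python
def projects_of_version(lots):
    """Union of the Projects of all lots of one version.

    `lots` is an iterable with one collection of project names per lot.
    """
    projects = set()
    for lot_projects in lots:
        projects.update(lot_projects)
    return projects


def projects_of_parent(versions):
    """Union of the Projects of all lots in the whole tree.

    `versions` maps a version identifier to the lots of that version.
    """
    projects = set()
    for lots in versions.values():
        projects.update(projects_of_version(lots))
    return projects


# Hypothetical tree with two versions and three lots.
tree = {
    "CN-1": [{"Oncology"}, {"Oncology", "Screening"}],
    "CN-2": [{"Toxicology"}],
}
```

Here the first version carries the Projects Oncology and Screening, and the parent carries all three Projects.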

Quality Checks

During the process of registration, a certain set of quality assurance rules/checks can be defined, as a list of structure checker and structure fixer pairs. Quality Checks are defined at the level of the entire registration service, and cannot be configured individually for a specific source, although there exists a source-dependent system switcher that controls whether the quality checks are run or not.

Registered by and Modified by

"Registered by" refers to the identifier of the privileged user who manually registers a submission in case it cannot be autoregistered. "Modified by" refers to the identifier of the privileged user, who amends the compound once it is registered.

Registrar

An advanced user of the registration service, typically responsible for manually registering failed submissions, amending registered compounds and administering the registry database.

Registration

The process of deciding on the uniqueness of new (small) molecules compared to the ones already stored in a database. The decisions are made according to predefined corporate business rules. The result of the registration process is a dedicated database, the registry, that is used to store the relevant structural and accompanying information.
A compound that has been submitted for Registration is first checked and processed in several configurable steps (see standardization, structure checkers, structure fixers and system switchers), which ensure that the compound is fit to be consistently introduced into the database. The compound is then placed into the appropriate parent tree in the database: either into a unique tree (a new tree created for this compound), or into a matched tree in case such a tree exists. The registration service aims to register a compound automatically (known as autoregistration) whenever it is possible. In case a compound cannot be automatically registered, a privileged user can manually register it.

Registration successful / Registration summary

The Registration successful window appears after registering a compound from the Registration page (through autoregistration). The Registration summary window appears on the Submission page (after manual registration).
The window can be configured to contain the PCN, CN, LN, LnbRef and Lot ID. Optionally, the window can contain another button which, using an ID parameter (e.g. LnbRef), redirects the user to a specified (configurable) URL.
The window does not appear when bulk registration or the bulk loader is used. In case of bulk registration (from the Submission page) a "Bulk registration summary" window appears, containing the failed, the successfully registered and the "in progress" registrations. When registering using the bulk loader, no message window appears about the successful and failed registrations (after the process is finished, the user is redirected to the staging area).

Registry

The Registry is a database, where all the data related to the registered compounds are stored.

Restriction Level

A numeric value associated with a registered compound, which indicates the level of exclusivity or confidentiality of that compound. A compound with a Restriction Level of 0 is considered unrestricted, while any higher Restriction Level makes the compound restricted. Restricted compounds are highlighted in the Match list and on the Submission and Amendment pages with a watermark, and on the Search page with a red frame. The registration system gives additional warnings and prevents certain behaviour in the case of registration, matching and amendment of restricted compounds.

Salts and Solvates

A set of chemical structures that are stored in a list. Salts and Solvates can be added to any compound during registration or amendment. See also salt/solvate fragment.

Salt/Solvate ID

A unique identifier assigned to the salt/solvate entries in the list. Salts and solvates are stored in a common table and therefore share a common sequence of IDs.

Salt/Solvate Fragment

A fragment of a compound's chemical structure that can be identified with a record from the salt/solvate list. See also Analyze Salt/Solvate.

Source

The Source identifies the origin of the compound to be registered. The registration system can accept different configurable Sources, e.g. REGISTRAR, ELNB, BULKLOAD. Submissions arriving from a Source that is not listed in the configuration file will fall to the staging area with the error message: "Unknown source". The structure checkers can be configured differently for each Source.
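
The Source check can be pictured with the following minimal Python sketch. The set of configured Sources reuses the example names from the definition, the function name is hypothetical, and the case-sensitive comparison is an assumption rather than documented behaviour.

```python
# Example Sources from the definition above; a real installation reads
# them from its configuration file.
CONFIGURED_SOURCES = {"REGISTRAR", "ELNB", "BULKLOAD"}


def source_error(source):
    """Return the error message for an unconfigured Source, or None if it is known."""
    return None if source in CONFIGURED_SOURCES else "Unknown source"
```

A submission whose Source yields a message here would be the kind that falls to the staging area.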

Staging Area

The entries of failed submissions are collected in the Staging Area. It is a dedicated area for compounds to be verified for manual registration. The site is under the authority of privileged users, who can correct and register failed submissions manually during the registration process, while system switchers and structure checkers/fixers are enabled/disabled.

Standardization

The process of converting a chemical structure to a Standardized form, defined by certain predefined rules, that is used in the registration service database. There are two separate steps of Standardization: general and parent. General Standardization is run for all compounds that are to be registered, and can consist of any kind of structure transformation as configured by the user. Parent Standardization, consisting of neutralization and isotope removal, is performed after general Standardization in order to create/find the appropriate parent compounds.

Stereo Comments

Currently we distinguish between two types of Stereo Comments: Stereochemistry and Geometric isomerism, which are included into the Dictionaries.

The default items in the Stereochemistry dictionary are: Achiral, Diastereomeric mixture, Racemic diastereomer with known relative stereochemistry, Racemic or presumed racemic, Single known enantiomer, Single unknown enantiomer, Single unknown enantiomer with known relative stereochemistry, Unequal mixture of enantiomers (please describe).

The default items in the Geometric isomerism dictionary are: E, Equal mixture of geometric isomers, Known isomer with E and Z double bonds (as drawn), None, Single unknown geometric isomer, Unequal mixture of geometric isomers (please describe), Unknown, Z.

Structure

The term Structure in the registration system refers to the chemical structure itself and a set of additional data (CST, unknown attached data) that are considered when deciding on the uniqueness of a compound. Compounds and parent compounds together can be referred to as Structures. We can distinguish single Structures and multi-component Structures.

Structure Checker

An automatic way to check for structural problems in compounds submitted for registration. The registration service comes with several default Structure Checkers, and users can define additional custom checkers based on their own requirements. Depending on the configuration of the registration service, a structure that has been flagged as problematic by a given Structure Checker can either be prevented from being registered, or can be automatically corrected by an associated structure fixer.
For more information about ChemAxon's Structure Checkers please consult the Structure Checker Documentation.

Structure Checker Software

Structure Checker is an interactive tool to detect and fix structure related issues using JChem technology. It comes with numerous checkers and fixers to search and correct various structural issues. The correction process can be manual, completely automatic, or somewhere in between. Structure Checker can operate in batch and provide flags for problems which cannot be automatically corrected. The checking and fixing functionality can also be accessed from external Java code through the JChem API.

Structure Fixer

An automatic way to correct structural problems that have been found by an associated structure checker. Several Fixers can be associated to a given structure checker in order to provide different ways of dealing with a structural problem. During manual registration or bulk registration, the privileged user can choose which Fixer should be applied to a particular compound. The registration service comes with several default Structure Fixers, and users can define additional custom Fixers based on their own requirements.
For more information about ChemAxon's Structure Checkers please consult the Structure Checker Documentation.

Submission

Submission is a record of a successful, failed, or in-progress registration. A Submission comprises the information needed for a registration (such as a structure, a lot ID, LnbRef, etc.), a submission status, and additional meta-information (such as the time of registration). Failed and in-progress Submissions can be seen in the staging area.

Submission created by

The identifier of the user who submitted the actual physical lot for autoregistration. This is distinct from the ID of the user (Registered by) who might have to manually register the same submission in case it cannot be autoregistered, or who might make an amendment to the compound once it is registered (Modified by). Submission created by is also different from the submitter ID, which appears as "Submitter" under the different tabs of the application.

Submission ID

The Submission ID is an automatically generated identifier for a submission entry. It is assigned in increasing numerical order, with an increment of 1, when a record is entered into the registration system.

Submission page

The Submission page is the page where a submission from the staging area is opened in order to register it manually. On the Submission page you can edit the structure, CST, LnbRef, Molweight, Restriction, Salts and Solvates and the Additional data. On the Submission page you can turn on or off system switchers and can apply structure checker/fixers.

Submission Type

The Submission Type describes, for each submission, which service was used to create the submission and under what circumstances. The Submission Type can be e.g. AutoRegister, AutoRegisterBulk, ManualRegister, DeleteId, DeleteTree etc.

Submission Status

A status indicating whether a submission is successfully registered, is still "in progress", or has failed for some reason (e.g. the LnbRef was invalid, or a non-exact match was found). If the submission ended up in the staging area, a detailed description of the reason for the failure is available alongside the Submission Status.

Submitter ID

The identifier of the chemist who actually owns the physical lot. This might be distinct from the ID of the user who might have to manually register (Registered by) the same submission in case it cannot be autoregistered, or who might make an amendment to the compound once it is registered (Modified by). The Submitter (ID) appears under different tabs of the application. The Submitter (ID) plays an important role in Project-based access, e.g. a user having "read_own" permissions in a certain project will be able to read only those submissions which have the given username in the Submitter field.

Synonym

Alternative names can be available for Compound Numbers (PCNs and CNs) in the DB. For versions and parents for which synonyms are available (these PCNs and CNs are displayed in red), the synonyms appear when hovering over the PCN or CN on Amendment and Search pages. If a synonym is available for a parent, that will be displayed also in the Match list. It is also possible to use a synonym to find a compound on the Amendment page.

System Switchers

A set of options that can be switched to either yes or no in order to modify the registration process. Some examples include Perform Quality Checks and Analyze Salt Solvate Fragments. System Switchers are configured through source-dependent configuration files at the level of the registration service, but can additionally be configured e.g. during an individual manual registration.

Tree

The Tree is a storage hierarchy of the parent with all versions and lots in the registration database. Each Tree has one parent, but can have any number of versions under that parent, and any number of lots/preparations under each version.

Twig Optimization

The automatic amendment of higher levels (parent or version levels) in the parent tree in case of version or lot level amendment. Twig Optimization can happen on version level when a parent has only one version or on lot level when a parent has only one version and only one lot. Without Twig Optimization, a new parent tree would be created, while the original tree would continue to exist without any lots.
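
The two conditions under which Twig Optimization can happen can be written down as a small Python sketch. The representation of a tree as a dictionary from version identifiers to lists of lot identifiers, and the identifiers themselves, are hypothetical; the sketch only decides whether the conditions stated above hold and does not perform any amendment.

```python
def twig_optimization_possible(tree, amended_level):
    """Check the Twig Optimization conditions for an amendment.

    `tree` maps each version identifier of one parent to the list of its
    lot identifiers. `amended_level` is "version" or "lot".
    """
    single_version = len(tree) == 1
    if amended_level == "version":
        # Version-level amendment: the parent has only one version.
        return single_version
    if amended_level == "lot":
        # Lot-level amendment: only one version and only one lot.
        return single_version and len(next(iter(tree.values()))) == 1
    return False
```

For a tree such as `{"CN-1": ["LN-1"]}` both checks succeed, whereas `{"CN-1": ["LN-1", "LN-2"]}` allows the optimization only for a version-level amendment.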

Unknown ID / Unknown Attached Data

Unknown Attached Data and IDs are generated for multi-component compounds without any quantitative composition (alternates) or with semi-quantitative composition (mixtures) that involves unknown ranges. Examples of Unknown Attached Data and IDs are: "Alternate 1", "Alternate 2", "Mixture 1", "Mixture 2", etc. For each registered unique compound a new Unknown ID is set. In a similar way, "Isomer 1", "Isomer 2", ... IDs are set for chiral compounds with unknown configuration having e.g. an "OR1" stereo flag.

User ID

The User ID (=username) indicates the user who has submitted the record, registered a record or initiated the amendment in question.

Validation

Every registration and amendment step begins with a thorough check of the input data provided to the services. Input values are Validated against a predefined set or range of possible values, regular expressions, etc. The series of steps to be performed might be dependent on the company business rules. The uniqueness of the external IDs is also checked during the Validation procedure. If any of the defined Validation steps fails, the submission ends up in the staging area with the proper error message.
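
A format and uniqueness check of the kind described above might look like the following Python sketch. The LnbRef pattern, the error messages and the function name are all hypothetical, since the actual format is defined by each company's configuration.

```python
import re

# Hypothetical LnbRef format: three capital letters, a dash, four digits.
LNBREF_PATTERN = re.compile(r"[A-Z]{3}-\d{4}")


def validate_lnb_ref(lnb_ref, registered_lnb_refs):
    """Return a list of error messages; an empty list means the checks passed.

    `registered_lnb_refs` is the collection of LnbRefs already present in
    the registry, used here for the uniqueness check.
    """
    errors = []
    if not LNBREF_PATTERN.fullmatch(lnb_ref):
        errors.append("LnbRef does not match the configured format")
    elif lnb_ref in registered_lnb_refs:
        errors.append("LnbRef is already in use")
    return errors
```

A submission for which this list is not empty corresponds to one that ends up in the staging area with an error message.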

Version

A Version in the registration service database represents a compound along with a set of additional information. It is defined as the second level in the data hierarchy. Each Version is referred to by a unique identifier called the compound number (CN).

Version Correction / Fix

A process of reconciling existing versions within a matched parent tree with a new version created through manual registration or amendment. The registration system attempts to do this automatically, but in cases where an automatic Version Fix is not possible, the user is prompted to make these changes by hand before registration or amendment can be completed.