ChemAxon SMILES Abbreviated Group

Abbreviated groups are stored in a TAB-delimited text file called default.abbrevgroup. The basic format is:

    Ac	CC=O	2
    AcAc	CC(=O)CC(=O)	5
    Acet	CC=O	2
    Ade	NC1=C2N=CNC2=NC=N1	6	1
    J	C*.CCC[C@H](N)C=O |r,m:1:3.4|	7	8

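Make sure the fields are separated by TAB characters, not by spaces. A field may itself contain spaces: in the last line above, the CXSMILES string includes a space before its |r,m:1:3.4| extension.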
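If you generate .abbrevgroup lines from a script, joining the fields with TABs and leaving spaces inside a field untouched avoids the most common mistake. The helper below is a minimal sketch; the function name format_abbrevgroup_line is hypothetical and not part of any ChemAxon API.

```python
def format_abbrevgroup_line(abbreviation: str, cxsmiles: str, *attachment_atoms: int) -> str:
    """Join the fields of one .abbrevgroup line with TAB characters (hypothetical helper)."""
    fields = [abbreviation, cxsmiles, *(str(n) for n in attachment_atoms)]
    for f in fields:
        # A TAB or newline inside a field would break the line format.
        if "\t" in f or "\n" in f:
            raise ValueError(f"field contains a TAB or newline: {f!r}")
    # Spaces inside a field (e.g. before a CXSMILES |...| extension) are kept as-is.
    return "\t".join(fields)


# Usage: reproduce two lines of the example above.
print(format_abbrevgroup_line("Ac", "CC=O", 2))
print(format_abbrevgroup_line("J", "C*.CCC[C@H](N)C=O |r,m:1:3.4|", 7, 8))
```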

Codename: abbrevgroup

In each line, the first field is the abbreviation and the second is the CXSMILES string of the molecule fragment that the abbreviation stands for. These are followed by the numbers of the attachment atoms, i.e. the positions of those atoms in the CXSMILES string. In the first line (Ac), the second atom, a carbon, is the attachment atom, so when the group is connected to another part of the molecule, the bond is formed at this atom. If no number follows the CXSMILES string, the abbreviated group cannot be connected to other atoms. Since Marvin 6.0, there is no limit on the number of attachment atoms.
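The field layout described above can be illustrated with a small parser. This is a sketch under stated assumptions, not Marvin's own reader: it assumes that attachment atom numbers are plain integers and that any trailing fields have the key=value form used by the specifiers described further below (leftName, rightName, center, abbrevAtomProperties). The names AbbrevGroup and parse_abbrevgroup_line are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AbbrevGroup:
    abbreviation: str
    cxsmiles: str
    attachment_atoms: list[int] = field(default_factory=list)  # positions in the CXSMILES string
    specifiers: dict[str, str] = field(default_factory=dict)   # trailing key=value fields, if any


def parse_abbrevgroup_line(line: str) -> AbbrevGroup:
    """Parse one line of an .abbrevgroup file (hypothetical sketch)."""
    # Split on TABs only: the CXSMILES field itself may contain a space.
    fields = [f for f in line.rstrip("\r\n").split("\t") if f]
    if len(fields) < 2:
        raise ValueError(f"expected an abbreviation and a CXSMILES string: {line!r}")
    group = AbbrevGroup(fields[0], fields[1])
    for f in fields[2:]:
        if f.isdigit():
            group.attachment_atoms.append(int(f))
        elif "=" in f:
            # Assumed handling of specifiers: split at the first '='.
            key, _, value = f.partition("=")
            group.specifiers[key] = value
        else:
            raise ValueError(f"unrecognized field {f!r} in {line!r}")
    return group


ade = parse_abbrevgroup_line("Ade\tNC1=C2N=CNC2=NC=N1\t6\t1")
print(ade.attachment_atoms)
```

A group without any attachment number parses to an empty attachment_atoms list, which corresponds to a group that cannot be connected to other atoms.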

By default, the bond points to the middle of the abbreviation. When the abbreviation contains atom symbols, it is usually better to make the bond point to the symbol of the attachment atom. It is also desirable to flip the abbreviation when the group is on the opposite side of the molecule:

images/download/thumbnails/48674807/abbrev_1.png

To achieve the flipping effect, provide an alternative name for the abbreviated group with the leftName specifier; this name is displayed when the group is on the left side of the molecule:

    CN	C#N	1	leftName=NC
    CO2Et	CCOC=O	4	leftName=EtO2C
    CO2H	OC=O	2	leftName=HO2C
    COOH	OC=O	2	leftName=HOOC
    COOiAm	CC(C)CCOC=O	7	leftName=iAmOOC
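The effect of leftName can be summarized as a lookup: on the left side the leftName text is drawn if one is defined, otherwise the abbreviation itself. The sketch below assumes the specifiers have already been read into a dictionary; label_for_side is a hypothetical name, and the fallback to the plain abbreviation is an assumption about groups without leftName.

```python
def label_for_side(abbreviation: str, specifiers: dict, side: str) -> str:
    """Return the text to draw for a group on the 'left' or 'right' side (hypothetical)."""
    if side not in ("left", "right"):
        raise ValueError(f"side must be 'left' or 'right', not {side!r}")
    if side == "left":
        # Assumed fallback: use the plain abbreviation when no leftName is defined.
        return specifiers.get("leftName", abbreviation)
    return abbreviation


print(label_for_side("CO2Et", {"leftName": "EtO2C"}, "left"))
print(label_for_side("CO2Et", {"leftName": "EtO2C"}, "right"))
```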

If the abbreviation contains numbers, those will be treated as subscripts:

    C10H21	CCCCCCCCCC	1	leftName=H21C10
    CBr3	BrC(Br)Br	2	leftName=Br3C

images/download/thumbnails/48674807/abbrev_2.png
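The subscript rule can be imitated in plain text with Unicode subscript digits. Marvin draws real subscripts, so this is only an approximation for logs or text output; with_subscripts is a hypothetical helper.

```python
# Map ASCII digits to Unicode subscript digits (U+2080..U+2089).
_SUBSCRIPT_DIGITS = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")


def with_subscripts(label: str) -> str:
    """Render every digit of an abbreviation as a Unicode subscript (hypothetical helper)."""
    return label.translate(_SUBSCRIPT_DIGITS)


print(with_subscripts("C10H21"), with_subscripts("H21C10"), with_subscripts("Br3C"))
```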

Some groups should also flip, but their abbreviation string is the form used on the left side (for example, AcO and MeO). For these groups, use the rightName specifier to give the form displayed on the right side:

    BnNH	NCC1=CC=CC=C1	1	rightName=HNBn
    BnO	OCC1=CC=CC=C1	1	rightName=OBn
    BnO2C	O=COCC1=CC=CC=C1	2	rightName=CO2Bn
    BnOOC	O=COCC1=CC=CC=C1	2	rightName=COOBn

images/download/thumbnails/48674807/abbrev_3.png
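With rightName the roles are reversed: the abbreviation field holds the left-side form and rightName holds the right-side form. A sketch that resolves the drawn text for both specifiers, again assuming a plain dictionary of specifiers and a fallback to the abbreviation (label_for_side is a hypothetical name):

```python
def label_for_side(abbreviation: str, specifiers: dict, side: str) -> str:
    """Pick the drawn text, honouring both leftName and rightName (hypothetical)."""
    if side == "left":
        return specifiers.get("leftName", abbreviation)
    if side == "right":
        # For rightName groups, the abbreviation itself is the left-side form.
        return specifiers.get("rightName", abbreviation)
    raise ValueError(f"side must be 'left' or 'right', not {side!r}")


print(label_for_side("BnO", {"rightName": "OBn"}, "right"))
```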

If you do not want to flip an abbreviation but want to make sure that the bond points to an atom symbol rather than to the middle of the string, you can still use the center specifier:

    c-C10H19	C1CCCCCCCCC1	1	center=AUTO
    c-C11H21	C1CCCCCCCCCC1	1	center=AUTO
    c-C12H23	C1CCCCCCCCCCC1	1	center=AUTO

images/download/thumbnails/48674807/abbrev_4.png

With center=AUTO, the bond points to the first character of the abbreviation that matches the atom symbol of the attachment atom. The center specifier also makes it possible to fine-tune the bond position so that it points to any character of the string.
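The AUTO rule amounts to a search for the attachment atom's symbol in the abbreviation. The sketch below takes the symbol as a parameter instead of reading it from the CXSMILES string. Two details are assumptions of this sketch, not documented Marvin behavior: the comparison is case-sensitive (so the "c" in "c-C10H19" is skipped), and a match that is only the start of a longer element symbol, such as "C" in "Cl", is ignored. auto_center_index is a hypothetical name.

```python
from typing import Optional


def auto_center_index(abbreviation: str, atom_symbol: str) -> Optional[int]:
    """Index of the first character matching the attachment atom's symbol, or None."""
    i = abbreviation.find(atom_symbol)  # case-sensitive search
    while i != -1:
        end = i + len(atom_symbol)
        # Assumption: skip matches followed by a lowercase letter ("C" inside "Cl").
        if end == len(abbreviation) or not abbreviation[end].islower():
            return i
        i = abbreviation.find(atom_symbol, i + 1)
    return None  # no match; presumably the bond keeps pointing at the middle


print(auto_center_index("c-C10H19", "C"))
```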

Properties of the atoms inside the abbreviated group are stored in the CXSMILES extension (the |...| part) of the fragment. Properties of the abbreviation atom itself cannot be stored there, so they are stored in a separate abbrevAtomProperties field, using the same syntax as the CXSMILES format. In the example below, the abbreviation atom of alanine (Ala) has two properties, with keys 'property 1' and 'property 2' and values 'value 1' and 'value 2', respectively. Properties are separated by ':', and each key is separated from its value by '.'. The characters '.' and ':' must be escaped when they occur in property keys or values.

    Ala	C[C@H](N)C=O |r|	3	4	abbrevAtomProperties=property 1.value 1:property 2.value 2

images/download/thumbnails/48674807/AlaWithProperty.png
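The abbrevAtomProperties value in the Ala example can be split into key/value pairs as sketched below. Because the escape syntax for '.' and ':' is not described in this section, the sketch does not handle escaped characters; parse_abbrev_atom_properties is a hypothetical name.

```python
from __future__ import annotations


def parse_abbrev_atom_properties(value: str) -> dict[str, str]:
    """Split an abbrevAtomProperties value into {key: value} (hypothetical sketch).

    Escaped '.' and ':' characters are NOT handled.
    """
    props: dict[str, str] = {}
    for item in value.split(":"):  # properties are separated by ':'
        if not item:
            continue
        key, sep, val = item.partition(".")  # key and value are separated by '.'
        if not sep:
            raise ValueError(f"property without a '.' separator: {item!r}")
        props[key] = val
    return props


print(parse_abbrev_atom_properties("property 1.value 1:property 2.value 2"))
```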

Extending the built-in abbreviated groups

Since the 5.10 release, the default built-in abbreviated groups can be extended.

Users can define their own abbreviated groups for MarvinSketch in a file called user.abbrevgroup. Place this file in the ChemAxon settings directory, which is the chemaxon or .chemaxon folder in the user's home directory.
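A script that manages user.abbrevgroup could locate the file as sketched below. It only checks the two folder names mentioned above; the order in which they are checked is an arbitrary choice of this sketch, and both function names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path
from typing import Optional


def user_abbrevgroup_candidates(home: Optional[Path] = None) -> list[Path]:
    """Possible locations of user.abbrevgroup in the ChemAxon settings directory."""
    home = Path.home() if home is None else Path(home)
    # Folder is "chemaxon" or ".chemaxon"; checking "chemaxon" first is arbitrary.
    return [home / folder / "user.abbrevgroup" for folder in ("chemaxon", ".chemaxon")]


def find_user_abbrevgroup(home: Optional[Path] = None) -> Optional[Path]:
    """Return the first candidate that exists as a file, or None."""
    for path in user_abbrevgroup_candidates(home):
        if path.is_file():
            return path
    return None


print(find_user_abbrevgroup())
```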

Developers who build on the MarvinBeans library or use the Marvin Applets package can extend the abbreviated group list with additional files, which are specified for MarvinSketch through the customAbbrevGroups parameter.