MDL MOLfiles, RGfiles, SDfiles, Rxnfiles, RDfiles formats

MOL V2000 files

  • Atom block:

    • x, y, z coordinates

    • atom type:

      • 1H, 2He, 3Li, ..., 103Lr,

      • atom list and exclusive list L,

      • "any" atoms A, Q, *,

      • lone pair LP

      • R-group: R or RN, where N is a positive integer. Before version 5.9, R without a number was written as R#.

    • charge

    • stereo care box

    • valence

    • atom-atom mapping (for reactions)

    • inversion/retention flag (for reactions)

Codename: mol

Extension: .mol

  • Bond block:

    • bond type: 1, 2, 3, aromatic, "any", "single or double", "single or aromatic", "double or aromatic", "hydrogen" or "coordinate" (import only)

    • bond stereo information: up or down

    • bond topology: ring or chain

  • Properties block:

    • M ALS - atom list and exclusive list

    • M APO - Rgroup attachment point

    • M CHG - charge

    • M RAD - radical

    • M ISO - isotope mass numbers

    • M RGP - Rgroup labels on root structure

    • M LOG - Rgroup logic

    • M LIN - link nodes

    • M SUB - substitution count query property (s)

    • M UNS - unsaturated atom query property (u)

    • M RBC - ring bond count query property (rb)

    • M STY - Sgroup type

    • M SST - Sgroup subtype

    • M SCN - Sgroup connectivity (head-to-head, head-to-tail or either/unknown)

    • M SAL - atoms that define the Sgroup

    • M SPA - multiple group parent atom list (paradigmatic repeating unit atoms)

    • M SBL - Sgroup's crossing bonds

    • M SMT - Sgroup label

    • M SPL - Sgroup parent list

    • M SDS EXP - Sgroup expansion

    • M SDT - Data sgroup field description

    • M SDD - Data sgroup display information

    • M SCD - Data sgroup data

    • M SED - Data sgroup data end of line

    • M SNC - Sgroup component numbers

    • M CRS - Sgroup correspondence

    • M SDI - display coordinates in each S-group bracket

    • M SBT - the displayed S-group bracket style

    • M SAP - the S-group attachment point information

    • M MRV SMA - SMARTS H, X, R, r, a, A properties (Marvin extension)

    • A - Atom alias

    • V - Atom value
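The blocks listed above sit at fixed positions in a V2000 molfile: three header lines, a counts line, the atom block, the bond block, and then the properties block ending with M  END. The following is a minimal Python sketch of reading them, assuming the standard fixed-width column layout. The function name `parse_v2000` and the sample molecule are illustrative only and are not part of Marvin. Only coordinates, atom symbols, bond types and M  CHG lines are read.

```python
def parse_v2000(molfile: str) -> dict:
    """Read atoms, bonds and M  CHG charges from a V2000 molfile (sketch)."""
    lines = molfile.splitlines()
    counts = lines[3]  # the counts line follows the three header lines
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])

    atoms = []
    for line in lines[4:4 + n_atoms]:
        atoms.append({
            "x": float(line[0:10]),
            "y": float(line[10:20]),
            "z": float(line[20:30]),
            "symbol": line[31:34].strip(),
        })

    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        bonds.append({
            "begin": int(line[0:3]),
            "end": int(line[3:6]),
            "type": int(line[6:9]),
        })

    charges = {}
    for line in lines[4 + n_atoms + n_bonds:]:
        if line.startswith("M  END"):
            break
        if line.startswith("M  CHG"):
            fields = line[6:].split()  # entry count, then (atom, charge) pairs
            for i in range(int(fields[0])):
                charges[int(fields[1 + 2 * i])] = int(fields[2 + 2 * i])
    return {"atoms": atoms, "bonds": bonds, "charges": charges}


# Hypothetical two-atom sample, built with format specs to keep columns exact.
SAMPLE = "\n".join([
    "methylammonium",
    "  sketch",
    "",
    f"{2:3d}{1:3d}  0  0  0  0  0  0  0  0999 V2000",
    f"{0.0:10.4f}{0.0:10.4f}{0.0:10.4f} {'C':<3} 0  0",
    f"{1.5:10.4f}{0.0:10.4f}{0.0:10.4f} {'N':<3} 0  0",
    f"{1:3d}{2:3d}{1:3d}  0",
    "M  CHG  1   2   1",
    "M  END",
])

result = parse_v2000(SAMPLE)
```

The sketch ignores every other property line (M  ISO, M  RAD, Sgroup lines and so on); a real reader would dispatch on the property code in the same way it does for M  CHG.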

Extended MOLfiles (V3000)

The extended format is used if the number of atoms or bonds exceeds 999, if a reaction contains Rgroups, or if the molecule has enhanced stereo. In an extended MOLfile, the following properties and features are supported:

  • Atom block:

    • x, y, z coordinates

    • atom type:

      • 1H, 2He, 3Li, ..., 103Lr,

      • "any" atoms A, Q, *,

      • lone pair LP

    • atom-atom mapping (for reactions)

    • inversion/retention flag (INVRET)

    • CHG - charge

    • RAD - radical

    • CFG - parity

    • VAL - valence

    • MASS - isotope mass number

    • HCOUNT - number of implicit hydrogens

    • STBOX - stereo care box

    • INVRET - inversion/retention flag

    • ATTCHPT - R-group attachment point

    • RGROUPS - R-groups that comprise this R# atom
      Restriction: only one R-group can comprise an atom in Marvin

    • SUBST - Substitution count query property (s)

    • UNSAT - Unsaturated atom query property (u)

    • RBCNT - Ring bond count query property (rb)

Codename: mol:V3

Extension: .mol

  • Bond block:

    • bond type: 1, 2, 3, aromatic, "any", "single or double", "single or aromatic", "double or aromatic"

    • CFG - bond stereo configuration: up or down

    • TOPO - bond topology: ring or chain

    • STBOX - stereo care box

  • LINKNODE - Link nodes.

  • Rgroup blocks with RLOGIC entries

  • Template block (import only)
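As a rough illustration of the points above, the Python sketch below shows the rule for choosing the extended format and how the KEY=value atom properties (CHG, MASS, CFG and so on) can be read from a single V3000 atom line. Both function names are hypothetical and are not Marvin API. The sketch assumes the usual "M  V30 index type x y z aamap" layout and does not handle parenthesized lists such as RGROUPS=(1 1) or continuation lines.

```python
def needs_v3000(n_atoms: int, n_bonds: int,
                enhanced_stereo: bool = False,
                rgroup_reaction: bool = False) -> bool:
    """Return True when the extended format is required (rule as stated above)."""
    return n_atoms > 999 or n_bonds > 999 or enhanced_stereo or rgroup_reaction


def parse_v30_atom(line: str) -> dict:
    """Parse one atom line of a V3000 atom block (simple KEY=value items only)."""
    prefix = "M  V30 "
    if not line.startswith(prefix):
        raise ValueError("not a V3000 line")
    fields = line[len(prefix):].split()
    atom = {
        "index": int(fields[0]),
        "symbol": fields[1],
        "x": float(fields[2]),
        "y": float(fields[3]),
        "z": float(fields[4]),
        "aamap": int(fields[5]),  # atom-atom mapping number, 0 if unmapped
        "props": {},
    }
    for item in fields[6:]:
        key, _, value = item.partition("=")
        atom["props"][key] = value  # values are kept as strings in this sketch
    return atom


atom = parse_v30_atom("M  V30 2 N 1.5 0 0 0 CHG=1 MASS=15")
```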

Reaction files (V2000)

A reaction file consists of a REACTANT block, a PRODUCT block, and (optionally) an AGENT block. Reaction files containing reaction agents are non-standard.

Codename: rxn

Extension: .rxn

Each block starts with a molecule or reaction identifier. A molecule identifier must have one of the following forms:

  • $MFMT $MIREG N

  • $MFMT $MEREG N

  • $MIREG N

  • $MEREG N

Here $MFMT means that the molecule is given in molfile format, $MIREG N is the internal registry number and $MEREG N is the external registry number of the molecule. Similarly, a reaction identifier must have one of the following forms:

  • $RFMT $RIREG N

  • $RFMT $REREG N

  • $RIREG N

  • $REREG N

Here $RFMT means that the reaction is given in rxnfile format, $RIREG N is the internal registry number and $REREG N is the external registry number of the reaction.

A reaction agent is a molecule structure that does not take part in the chemical reaction but is added to the reaction equation for informational purposes only. Agents are normally displayed above the reaction arrow, and they are written to the reaction file after the reactants and the products. If the number of agents is non-zero, it is written in the file header after the number of reactants and the number of products. Reaction files containing agents are non-standard.
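The agent count described above can be read from the counts line of a V2000 rxnfile. The Python sketch below assumes the usual layout ($RXN, three header lines, then a counts line with three-character fields) and treats a missing third field as zero agents. The function name `parse_rxn_counts` and the sample text are illustrative; the sample's molecule blocks are placeholders, not valid molfiles.

```python
def parse_rxn_counts(rxnfile: str) -> dict:
    """Read reactant, product and (optional) agent counts from a V2000 rxnfile."""
    lines = rxnfile.splitlines()
    if not lines or not lines[0].startswith("$RXN"):
        raise ValueError("not a V2000 rxnfile")
    counts = lines[4]  # counts line follows $RXN and three header lines
    agent_field = counts[6:9].strip()  # present only in the non-standard form
    return {
        "reactants": int(counts[0:3]),
        "products": int(counts[3:6]),
        "agents": int(agent_field) if agent_field else 0,
        # Each component molfile is introduced by a $MOL line.
        "molblocks": sum(1 for line in lines if line.strip() == "$MOL"),
    }


SAMPLE_RXN = "\n".join([
    "$RXN", "", "  sketch", "",
    "  1  1  1",
    "$MOL", "(reactant molfile)",
    "$MOL", "(product molfile)",
    "$MOL", "(agent molfile)",
])

rxn_counts = parse_rxn_counts(SAMPLE_RXN)
```

Comparing the sum of the three counts with the number of $MOL lines is a cheap consistency check before splitting the file into components.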

Extended reaction files (V3000)

This format is used automatically if a reaction includes Rgroups and/or the number of atoms or bonds exceeds 999. An extended reaction file consists of a REACTANT block, a PRODUCT block, (optionally) an AGENT block, and (optionally) RGROUP blocks.

Codename: rxn:V3

Extension: .rxn

SD Files

In SDfiles read by Marvin, the name field is special: it overrides the molecule name specified in the molfile part.

Incompatibility note: The MDL definition sets the maximum line length for molecule properties at 200 characters. Marvin ignores this limit.
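To illustrate reading property lines without a length limit, here is a minimal Python sketch of an SDfile reader. It assumes the common SDfile layout: a molfile part ending with M  END, data items introduced by a "> <FIELD>" header line and closed by a blank line, and records separated by $$$$. The function name `read_sdfile` is hypothetical, and the sketch does not implement the name-override behaviour.

```python
import re


def read_sdfile(text: str) -> list:
    """Split an SDfile into (molfile, data) pairs; data values may be any length."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":  # record separator
            records.append(_parse_record(current))
            current = []
        else:
            current.append(line)
    return records


def _parse_record(lines: list) -> tuple:
    # The molfile part ends at the first M  END line (whole record if absent).
    end = next((i for i, l in enumerate(lines) if l.startswith("M  END")),
               len(lines) - 1)
    molfile = "\n".join(lines[:end + 1])
    data, field = {}, None
    for line in lines[end + 1:]:
        if line.startswith(">"):
            match = re.search(r"<([^>]*)>", line)
            field = match.group(1) if match else None
            if field is not None:
                data[field] = []
        elif line.strip() == "":
            field = None  # a blank line closes the current data item
        elif field is not None:
            data[field].append(line)  # no 200-character check here
    return molfile, {k: "\n".join(v) for k, v in data.items()}


SAMPLE_SD = "\n".join([
    "mol-1", "  sketch", "",
    "  0  0  0  0  0  0  0  0  0  0999 V2000",
    "M  END",
    ">  <NAME>", "example", "",
    ">  <NOTE>", "x" * 300, "",
    "$$$$",
])

sd_records = read_sdfile(SAMPLE_SD)
```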

Codename: sdf

Extension: .sdf

RG Files

A special feature of Marvin RGfiles is that they can contain a reaction as the root structure. This feature is non-standard; such mixed RG/Rxnfiles can only be imported by Marvin.

Codename: rgf

Extension: .rgf