Molecule file conversion with Molconverter
Molconverter is a command-line program in Marvin Beans and JChem that converts between various file types.
Usage
molconvert [options] outformat[:exportoptions] [files...]
The outformat argument must be the codename of one of the supported formats. The codename is listed on the page of each file format. Some examples:
Format type |
Codename of the format |
Document formats |
|
Molecule file formats |
mol, rgf, sdf, rdf, csmol, csrgf, cssdf, csrdf, |
Graphics formats |
|
Compression and Encoding |
To query the automatically detected encodings of the specified molecule files, use:
molconvert [options] query-encoding [files...]
From files in doc, docx, ppt, pptx, xls, xlsx, odt, pdf, xml, html or txt format, Molconverter can recognize the names of compounds and convert them to any of the output formats mentioned above.
Options
-o file |
Write output to specified file instead of standard output |
-m |
Produce multiple output files |
-e charset |
Set the input character encoding. The encoding must be supported by Java. |
-e [in]..[out] |
Set the input (in) and/or output (out) character encodings. Examples: UTF-8, ASCII, Cp1250 (Windows Eastern European), Cp1252 (Windows Latin 1), ms932 (Windows Japanese). |
-s string |
Read molecule from the specified SMILES, SMARTS or peptide string (the format is recognized automatically) |
-s string{format:options} |
Read molecule from the string in the specified format, using the specified import options (both the format and the options can be omitted) |
-f <string> |
Specify the import format and options |
--smiles string |
Read molecule from specified SMILES string |
--smarts string |
Read molecule from specified SMARTS string |
--peptide string |
Read molecule from specified peptide string |
-g |
Continue with next molecule on error (default: exit on error) |
-Y |
Remove explicit H atoms |
-I <range> |
Process input molecules whose (1-based) molecule index falls into the specified range (e.g. 5-8,15 refers to molecules 5, 6, 7, 8 and 15) |
-U |
Fuse input molecules and output the union |
-R <file>[:<range>] |
Fuse fragments from the file to the input molecule(s), using the specified molecule index range. Range syntax: "-5,10-20,25,26,38-" (e.g. -R frags.mrv:20-) |
-R<i> <file>[:<range>] |
Fuse R<i> definition members from the file, in the specified index range, to the input molecule(s) (e.g. -R1 rdef1.mrv:5-8,19) |
-R<i>:<1|2> <file>[:<range>] |
Fuse R<i> definition members from the file, in the specified index range, to the input molecule(s), filtering for molecules having 1 (or 2, respectively) attachment points (e.g. -R1:2 rdef1.mrv:-3,8-10) |
-F |
Remove small fragments, keep the largest |
-c"f1 OP value&f2 OP value..." |
Filtering by the values of fields in the case of SDF import. |
--mol-fields-to-records |
Convert molecule type fields to separate records. |
-v |
Verbose |
-vv |
Very verbose (print stack trace at error) |
-2[:options][:F<i1>,<i2>,...,<iN>] |
Calculate 2D coordinates. The options control the coordinate calculation; F<i1>,<i2>,...,<iN> fixes the coordinates of the listed atoms. |
-3[:options] |
Calculate 3D coordinates |
-H3D |
Help on options for 3D calculations. For a detailed list, see Clean 3D Options. |
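The range syntax used by -I and -R can be illustrated with a small shell sketch. The function name expand_range and its second argument (the total number of molecules) are hypothetical helpers and not part of Molconverter. The reading of open-ended parts ("-5" as 1 to 5, "38-" as 38 to the last molecule) is an assumption based on the examples in the table above.

```shell
# expand_range RANGE MAX
# Prints the 1-based indices selected by RANGE, one per line.
# MAX (the number of molecules) is a hypothetical helper argument,
# needed only to resolve open-ended parts such as "38-".
expand_range() {
    range=$1
    max=$2
    old_ifs=$IFS
    IFS=,
    set -- $range          # split the range on commas
    IFS=$old_ifs
    for part in "$@"; do
        case $part in
            -*)  first=1;          last=${part#-} ;;   # "-5": from the start (assumed)
            *-)  first=${part%-};  last=$max ;;        # "38-": to the end (assumed)
            *-*) first=${part%-*}; last=${part#*-} ;;  # "10-20": closed interval
            *)   first=$part;      last=$part ;;       # "25": single index
        esac
        i=$first
        while [ "$i" -le "$last" ]; do
            echo "$i"
            i=$((i + 1))
        done
    done
}

# The documented example: 5-8,15 selects molecules 5, 6, 7, 8 and 15.
expand_range "5-8,15" 20
```

The sketch only mirrors the selection rule; Molconverter parses the range itself, so a range can be passed to -I or -R unchanged.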
Import options can be specified between braces, in one of the following forms:
filename{options} |
|
filename{MULTISET,options} |
to merge molecules into one that contains multiple atom sets |
filename{format:} |
to skip automatic format recognition |
filename{format:options} |
|
filename{format:MULTISET,options} |
|
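As a rough illustration of the brace forms listed above, the following shell sketch assembles an import specification from its parts. The helper name import_spec is hypothetical; only the resulting strings follow the forms in the table.

```shell
# import_spec FILE [FORMAT] [OPTIONS]
# Prints FILE followed by a brace-enclosed import specification.
# Hypothetical helper, shown only to make the string layout explicit.
import_spec() {
    file=$1
    format=$2
    options=$3
    if [ -n "$format" ]; then
        # filename{format:options}; empty options give filename{format:}
        printf '%s{%s:%s}\n' "$file" "$format" "$options"
    else
        # filename{options}
        printf '%s{%s}\n' "$file" "$options"
    fi
}

import_spec foo.xyz xyz ""                       # foo.xyz{xyz:}
import_spec foo.xyz "" f1.4C4                    # foo.xyz{f1.4C4}
import_spec foo.xyz.gz gzip:xyz MULTISET,f1.4C4  # foo.xyz.gz{gzip:xyz:MULTISET,f1.4C4}
```

When such a specification is typed directly on the command line, it should be enclosed in double quotes, because shells such as bash treat an unquoted brace group containing a comma as a brace expansion and would split it into several arguments.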
You can also pass options to the Java VM when you run the application from the command line.
Export options:
a, +a, +a_gen |
General aromatization. Example: "XXX:a" |
a_bas |
Basic aromatization. Example: "XXX:a_bas" |
a_loose |
Loose aromatization. Example: "XXX:a_loose" |
a_ambig |
Ambiguous aromatization. Example: "XXX:a_ambig" |
-a, -a_gen |
General dearomatization. Example: "XXX:-a" |
-a_huckel |
Hückel dearomatization. Example: "XXX:-a_huckel" |
-a_huckel_ex |
Hückel dearomatization, throwing an exception in case of failure. Example: "XXX:-a_huckel_ex" |
H or +H |
Add explicit hydrogen atoms. Example: "XXX:H" |
-H |
Remove explicit hydrogen atoms. Example: "XXX:-H" |
Here, XXX can be any molecule or image format, such as mrv, mol, smiles, cxsmiles, abbrevgroup, cml, jpeg, png or svg. Aromatization options have no effect on formats that do not store bond orders, such as cube, pdb and xyz.
mol:v2 |
Export position variation bonds to MDL mol V2000 |
Examples
-
Printing the SMILES string of a molecule in a molfile
molconvert smiles caffeine.mol
-
Dearomatizing an aromatic molecule:
molconvert smiles:-a -s "c1ccccc1"
-
Aromatizing a molecule:
molconvert smiles:a -s "C1=CC=CC=C1"
(The default general aromatization is used.)
-
Aromatizing a molecule using the basic algorithm:
molconvert smiles:a_bas -s "CN1C=NC2=C1C(=O)N(C)C(=O)N2C"
-
Converting a SMILES file to MDL Molfile
molconvert mol caffeine.smiles -o caffeine.mol
-
Making an SDF from molfiles:
molconvert sdf *.mol -o molecules.sdf
-
Printing the encodings of the SD files in the working directory:
molconvert query-encoding *.sdf
-
SMILES to Molfile with optimized 2D coordinate calculation, converting double bonds with unspecified cis/trans to "either":
molconvert -2:2e mol caffeine.smiles -o caffeine.mol
-
2D coordinate calculation with optimization and fixed atom coordinates for atoms 1, 5, 6:
molconvert -2:2:F1,5,6 mol caffeine.mol
-
Import a file as XYZ, do not try to recognize the file format:
molconvert smiles "foo.xyz{xyz:}"
Note: This is just an example. XYZ and other formats known by Marvin are always recognized (send us a bug report otherwise), so the specification of the input format is usually not needed. It is only relevant if a user-defined import module is used.
-
Import a file as XYZ, with bond-length cut-off = 1.4 and maximum number of carbon connections = 4, and export to SMILES:
molconvert smiles "foo.xyz{f1.4C4}"
-
Import a file as Gzipped XYZ, with the same import options as in the previous example:
molconvert smiles "foo.xyz.gz{gzip:xyz:f1.4C4}"
-
Like the previous example, but merge the molecules into one molecule that contains multiple atom sets and export an MDL molfile:
molconvert mol "foo.xyz.gz{gzip:xyz:MULTISET,f1.4C4}"
-
Import an SDF and export a table containing selected molecules with columns: SMILES, ID, and logP:
molconvert smiles -c "ID<=1000&logP>=-2&logP<=4" -T ID:logP foo.sdf
-
Fuse R2 definition from file, filter fragments with 1 attachment point:
molconvert mrv in.mrv -R2:1 rdef.mrv
-
Fuse fragments from file (note that the input molecule to which the fragments are fused must also be specified):
molconvert mrv in.mrv -R frags.mrv
-
Generate all common names for a structure:
molconvert "name:common,all" -s tylenol
-
Generate the most popular common name for a structure (this fails if none is known):
molconvert name:common -s viagra
-
Generate SMILES for the molecules whose names are mentioned in the file foo.html:
molconvert smiles foo.html
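For converting several molfiles into separate SMILES files, one possible approach is to generate one command per input file. The sketch below only prints the commands (a dry run) so they can be reviewed first. The function name print_convert_cmds and the file name aspirin.mol are hypothetical; the commands themselves use only the outformat, export option and -o forms shown in the examples above.

```shell
# print_convert_cmds FILE.mol...
# Prints, without running it, one molconvert command per molfile that
# would write an aromatized SMILES file next to the input.
# Hypothetical helper; file names containing spaces are not handled.
print_convert_cmds() {
    for f in "$@"; do
        base=${f%.mol}     # strip the .mol extension
        printf 'molconvert smiles:a %s -o %s\n' "$f" "$base.smiles"
    done
}

print_convert_cmds caffeine.mol aspirin.mol
```

Once the printed commands look right, they can be executed by piping the output to sh.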