ChemCurator User's Guide

ChemCurator Application

images/download/thumbnails/41129806/CUR.png ChemCurator is a standalone desktop application from ChemAxon for computer-aided chemical information extraction. To run it, download and run the ChemCurator installer. Short video tutorials demonstrating the main functionality are available here.

images/download/attachments/41129806/image2016-1-19_11_16_52.png

Main Menu and Toolbar

The main menu contains the "File", "Edit", "View", "Tools", "Windows" and "Help" menus.

images/download/attachments/41129806/image2016-1-19_11_19_47.png

  • File

    • New Project... images/download/attachments/41129806/image2014-8-12_13_58_8.png (Ctrl+Shift+N)

    • Open Project... images/download/thumbnails/41129707/open24.png (Ctrl+Shift+O)

    • Open Recent Project

    • Close Project

    • Import Project from ZIP...

    • Export Project to ZIP...

    • Restart

    • Exit

  • Edit

    • Undo images/download/thumbnails/41129806/image2014-11-5_16_42_26.png

    • Redo images/download/thumbnails/41129806/image2014-11-5_16_43_7.png

  • View

    • Link Project View

    • Show Only Editor (Ctrl+Shift+Enter)

    • Full Screen (Alt+Shift+Enter)

  • Tools

    • Plugins

    • Options... images/download/thumbnails/41129806/image2014-11-5_16_37_6.png

  • Windows

    • Projects

    • Checker View

    • Reset Windows

    • Close Window

    • Close All Documents

    • Close Other Documents

    • Documents...

  • Help

    • Help (F1) images/download/thumbnails/41129806/image2014-11-5_16_37_58.png

    • Licenses... images/download/thumbnails/41129806/image2014-8-12_14_8_45.png

    • Check for Updates...

    • About ChemCurator...

Panels and Views

Most of the panels and views in ChemCurator can be resized or moved to a different location or screen according to your preferences. The default layout can be restored with the Reset Windows function.

Project explorer panel

The Project explorer panel displays the open projects and represents each project's structure as a tree-like hierarchy. Every project represents one document, and you can add as many Markush structures and compound lists to it as you want. Every Markush structure automatically has an Exemplified structures list.

Document view

The Document view is the viewer component for annotated documents and the related selections. Recognized chemical entities are highlighted in gray. In structure selection mode images/download/thumbnails/41129806/image2014-12-3_12_28_8.png users can select recognized chemical structures by clicking on any highlighted component, or select a larger part of the document by pressing the left mouse button and dragging over the targeted part of the document. Selected structures are highlighted in red and displayed under the document in the selection panel. In text selection mode images/download/thumbnails/41129806/image2014-12-3_12_30_1.png users can select the document text directly. Document linking images/download/thumbnails/41129806/image2015-2-25_15_57_42.png turns on automatic scrolling of the document based on the structure selections in the editor views. The document's zoom level can be changed with images/download/thumbnails/41129806/image2014-12-3_12_34_47.png and images/download/thumbnails/41129806/image2014-12-3_12_35_20.png .

Compounds view

The Compounds view is the display component for compound lists. It handles not only chemical structures but also related additional information columns. Data can be edited by double-clicking any cell.

Markush Editor view

The Markush Editor view is the display component for Markush structures and the related exemplified structures. It is based on the same component as the Markush Editor desktop application; therefore, the details of editing Markush structures are available in the Markush Editor documentation. Compared with that application, the Markush Editor view contains an additional bottom row showing the exemplified structures related to the Markush structure. Exemplified structures are continuously validated against the Markush structure: examples matching the Markush structure are highlighted in green, and non-matching structures are highlighted in red.

Structure checker panel

The Structure checker panel displays the structure drawing errors and warnings related to the active editor component. In the case of an error, an exclamation mark appears in a red circle images/www.chemaxon.com/marvin-archive/6.0.0/marvin/help/structurechecker/images/msketch/statusbar-checker-error.png ; in the case of a warning, a yellow triangle appears images/www.chemaxon.com/marvin-archive/6.0.0/marvin/help/structurechecker/images/msketch/defaultfeaturecheck.png . By clicking on the checker items you can choose between the available automatic fixer options. You can fix the issues one by one with the Fix Selected button or all together with the Fix All button.

Create new project

In ChemCurator, every project represents a document, and the extracted chemical information belongs to this document. ChemCurator offers multiple project creation options based on different source formats. Regardless of the original format, every document is converted to annotated HTML that preserves the structure and layout of the original document. The duration of the annotation process strongly depends on the format, size, and content of the original document. The new project wizard is available from File>New Project... or from the main toolbar with the images/download/attachments/41129806/image2014-8-12_13_58_8.png icon.

images/download/attachments/41129806/image2015-5-27_13_44_31.png

Import document from file

A project can be created from a file stored on your local machine. ChemCurator can process PDF, HTML, XML, and TXT documents.

Import document from Google Patents

Patent documents can be imported directly from Google Patents by using the publication number of the document. The import wizard automatically tries to find the corresponding document in Google Patents and downloads the HTML version of the patent. For most non-English patents, a machine-translated English version is available in Google Patents. If you want to download the original version, select Original in the language preferences.

images/download/attachments/41129806/image2014-12-4_14_0_57.png
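Publication numbers such as US6756383B2 combine a two-letter country code, a serial number, and an optional kind code. The following Python sketch shows one way to tidy a number before typing it into the import wizard. It is not part of ChemCurator: the function name and the simplified pattern are assumptions made for illustration, and real publication numbers come in more variants than this pattern covers.

```python
import re
from typing import NamedTuple, Optional


class PublicationNumber(NamedTuple):
    country: str          # two-letter country or office code, e.g. "US"
    serial: str           # numeric part, e.g. "6756383"
    kind: Optional[str]   # kind code such as "B2", or None if absent


# Simplified pattern (an assumption): country code, digits, optional kind code.
_PATTERN = re.compile(r"^([A-Z]{2})(\d+)([A-Z]\d?)?$")


def parse_publication_number(text: str) -> Optional[PublicationNumber]:
    """Normalize and split a publication number; return None if it does not fit."""
    # Drop whitespace, hyphens, and slashes, then upper-case the rest.
    compact = re.sub(r"[\s\-/]", "", text).upper()
    match = _PATTERN.match(compact)
    if match is None:
        return None
    return PublicationNumber(match.group(1), match.group(2), match.group(3))
```

For example, parse_publication_number("us 6756383 b2") yields the country "US", the serial "6756383", and the kind code "B2".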

Import document from IFI Claims

If you have IFI Claims access, you can also import documents directly from IFI Claims. The import wizard automatically tries to find the corresponding document in IFI Claims and downloads the HTML version of the patent.

images/download/attachments/41129806/image2014-12-4_14_1_44.png

Create demo project

The Create demo project function creates an example project containing the annotated version of the US6756383B2 patent document from Google Patents and some curated data, including a Markush structure and a compound list.

images/download/attachments/41129806/image2015-5-27_13_51_54.png

Annotation configuration

With annotation configuration, you can fine-tune the annotation parameters according to your needs. The settings panel is available from Tools>Options... or from the main toolbar with the images/download/thumbnails/41129806/image2014-11-5_16_37_6.png icon.

images/download/attachments/41129806/image2016-1-19_11_35_17.png

Chemical data extraction

ChemCurator offers multiple functions to help recognize and extract the relevant chemical information from documents.

Create new Markush or Compounds list

ChemCurator supports two types of chemical information: Markush structures and compound lists. A Markush structure object is always created together with a linked special compound list, the Examples.

Manual structure extraction

Any annotated structure can be selected from the document. After selection, it can be moved using drag and drop from the selected structures view to editor components.

Compounds extraction wizard

The Compounds extraction wizard is available in the Compounds and Markush views. The wizard helps to automatically find and extract a large number of chemical structures from the documents. The first panel of the wizard offers some basic filter criteria.

images/download/attachments/41129806/image2016-1-19_11_37_58.png

The extraction process can be parametrized with some filter options.

images/download/attachments/41129806/image2016-1-19_11_40_29.png

Main options:

Filter duplicates: Ignore duplicates by extracting only the first occurrence of each compound from the document.

Minimum mass: Set a minimum molecular mass filter criterion.

Maximum mass: Set a maximum molecular mass filter criterion.
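The effect of the three main options can be illustrated with a short Python sketch. It is not ChemCurator code: the Compound record, the structure_key field, and the inclusive mass limits are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Compound:
    # Hypothetical record: a canonical structure key and a molecular mass.
    structure_key: str
    mass: float


def filter_compounds(
    compounds: Iterable[Compound],
    filter_duplicates: bool = True,
    minimum_mass: Optional[float] = None,
    maximum_mass: Optional[float] = None,
) -> List[Compound]:
    """Return the compounds, in document order, that pass the main options."""
    seen = set()
    kept = []
    for compound in compounds:
        # Mass limits are treated as inclusive here (an assumption).
        if minimum_mass is not None and compound.mass < minimum_mass:
            continue
        if maximum_mass is not None and compound.mass > maximum_mass:
            continue
        if filter_duplicates:
            # Keep only the first occurrence of each structure.
            if compound.structure_key in seen:
                continue
            seen.add(compound.structure_key)
        kept.append(compound)
    return kept
```

Because the input is processed in document order, the compound that survives duplicate filtering is the one that appears first in the document.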

Structure filtering options:

None: No structure filter is applied.

Substructure: A substructure filter criterion can be set after clicking on the Next button.

Similarity with threshold: A similarity filter criterion can be set after clicking on the Next button. An MCS-based similarity calculation is executed in the background, and the structures are filtered by their Tanimoto similarity.

If Substructure or Similarity with threshold is selected, clicking on the Next button takes you to the second tab of the extraction wizard. For Similarity with threshold, only exact compounds without any variability features can be used as a filter. For Substructure, any atom lists, bond lists, and query properties can be used.

images/download/attachments/41129806/image2014-8-12_14_38_43.png

After clicking on the Finish button, the extraction starts. For Similarity with threshold, an additional column containing the similarity value of each compound is added to the extracted compounds.
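As a rough illustration of the similarity value, the Tanimoto coefficient divides the size of the common part of two structures by the size of their union. The Python sketch below applies this general formula to plain size counts. How ChemCurator measures the size of the maximum common substructure (MCS) is not described in this guide, so the inputs and function names here are assumptions.

```python
def tanimoto(size_a: int, size_b: int, size_common: int) -> float:
    """Tanimoto coefficient from the sizes of two structures and their common part.

    size_common stands for the size of the maximum common substructure,
    counted in the same units as size_a and size_b (an assumption).
    """
    if size_common < 0 or size_common > min(size_a, size_b):
        raise ValueError("size_common must be between 0 and the smaller structure size")
    union = size_a + size_b - size_common
    return size_common / union if union else 0.0


def passes_threshold(similarity: float, threshold: float) -> bool:
    """A compound is kept when its similarity reaches the threshold."""
    return similarity >= threshold
```

Under these assumptions, two structures of size 10 and 12 that share a common substructure of size 8 have a similarity of 8 / (10 + 12 - 8) = 8/14, so they pass a threshold of 0.5 but not one of 0.6.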

Additional data extraction

The Compounds view can handle not only chemical structures but also the related assay data, properties, comments, etc. You can manually add this information to the compound lists using the Create new column function.

images/download/attachments/41129806/image2016-1-19_13_34_0.png

A simple dialog opens where the name and type of the new column can be selected. The newly created column can be edited by double-clicking it.

images/download/attachments/41129806/image2016-1-19_13_35_31.png

Add structures manually

Markush fragments and compounds can be added manually from the context menu of the fragment and compound lists, and with the Add new row menu item of the Compounds view.

images/download/attachments/41129806/image2016-1-19_13_30_32.png

images/download/attachments/41129806/image2016-1-19_13_29_6.png

images/download/attachments/41129806/image2016-1-19_13_22_26.png

A manually added compound can be linked to the corresponding part of the document. Right-click on any structure and select the Add reference to document... function to specify the corresponding part of the document. The document view then enters reverse linking mode, in which any part of the document can be selected. After you select the corresponding part of the document and click OK, the selected text is marked as a chemical entity and linked to the manually added compound. If the Add to local dictionary check box is selected, the selected text and the linked compound are added to the ChemCurator dictionary and will be recognized during the next annotation.

images/download/attachments/41129806/image2016-1-19_13_51_53.png

Import compounds

The Import compounds wizard can add a compounds file with molecule properties to the selected project as a new compound list and automatically associate the imported compounds with their first occurrence in the document.

images/download/thumbnails/41129806/image2017-3-22_13_9_8.png

Fixing annotation errors

The accuracy of structure recognition is not 100%, so annotated documents always contain some unrecognized or misrecognized structures.

Fixing misrecognized structures

Text-based and image-based misrecognized structures can be fixed by selecting the problematic structure in the document view, then either double-clicking it in the selection view or right-clicking it and choosing the edit option.

images/download/attachments/41129806/image2016-1-19_14_10_40.png

Fixing unrecognized structures

Unrecognized structures can be annotated with the Fix annotation menu item.

images/download/attachments/41129806/image2016-1-19_14_16_15.png

After you click on this item, the document view enters reverse linking mode and any part of the document can be selected.

images/download/attachments/41129806/image2016-1-19_14_24_56.png

After clicking on the OK button, an interactive text fixing dialog opens. If the modified text can be recognized, the recognized structure appears under the text input field. The structure is updated immediately after any modification of the text. Potentially problematic parts of the chemical names are underlined. After successful fixing, the recognized chemical structure can be added to the corresponding part of the document by clicking on the OK button.

images/download/attachments/41129806/image2016-1-19_14_28_2.png

Remove annotations

Any unwanted annotation can be removed by selecting the problematic structure in the document view, right-clicking it in the selection view, and choosing the Remove option.

images/download/attachments/41129806/image2016-1-19_14_11_40.png

Share and Export functions

ChemCurator offers multiple options for project sharing and export of the annotated data in various formats.

Share projects with ChemCurator integration server

ChemCurator Integration Server is the standard way to share your projects with your colleagues and store them in a central database. For server installation details, please check the Integration Server Administrator Guide. Additionally, you need to configure the server connection details in the ChemCurator desktop application by following the corresponding section of the Installation Guide.

images/download/attachments/41129806/image2014-11-7_14_3_5.png

After successful sharing, a new indicator icon appears next to the project, and you are able to upload your modifications or download the newer version of the project.

images/download/attachments/41129806/image2014-11-7_14_7_16.png

Export from Compounds and Markush view

The structure export function images/download/thumbnails/41129806/image2014-11-7_13_47_57.png is available in the Compounds and Markush views. The structures and related information from the view can be exported in various file formats.

Export project to ZIP file and import from ZIP file

A project can be exported to a ZIP file with the File>Export Project to ZIP... function. This way, the project can be easily shared by e-mail or any file sharing method.

images/download/attachments/41129806/image2015-5-27_13_59_44.png

The zipped project can be imported in a similar way with the File>Import Project from ZIP... function.
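Before sending or importing an exported archive, it can be useful to check its contents outside ChemCurator. The following Python sketch uses only the standard zipfile module. The function name is hypothetical, and nothing is assumed about the internal layout of the archive beyond it being a regular ZIP file.

```python
import zipfile
from pathlib import Path
from typing import List, Tuple


def inspect_project_archive(zip_path: Path) -> Tuple[List[str], bool]:
    """Return the sorted member names of a ZIP file and whether all CRC checks pass."""
    with zipfile.ZipFile(zip_path) as archive:
        names = sorted(archive.namelist())
        # testzip() returns the name of the first corrupt member, or None.
        intact = archive.testzip() is None
    return names, intact
```

A call such as inspect_project_archive(Path("MyProject.zip")), where the file name is only an example, returns the list of files in the archive together with True if no corrupt member was found.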

Using the project folder directly

All projects are stored in project directories. The default location of the projects is the C:\Users\<user name>\Documents\ChemCurator directory. The name of a project directory is the project name. Every project contains a project file (an XML file with some meta-information), the document HTML with its connected resources, and the extracted chemical information in SDF (compound lists) and MRV (Markush structures) formats.

images/download/attachments/41129806/image2014-8-12_14_28_49.png
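Because a project is an ordinary directory, its contents can be inspected with a small script. The Python sketch below groups the files of a project directory by type and counts the records in an SDF file. The file extensions (.xml, .html, .sdf, .mrv) and the function names are assumptions based on the formats listed above; the exact file names inside a project are not specified in this guide.

```python
from pathlib import Path
from typing import Dict, List


def summarize_project(project_dir: Path) -> Dict[str, List[str]]:
    """Group the files of a project directory by the formats named in the guide."""
    groups: Dict[str, List[str]] = {"xml": [], "html": [], "sdf": [], "mrv": []}
    for path in sorted(project_dir.rglob("*")):
        suffix = path.suffix.lower().lstrip(".")
        if path.is_file() and suffix in groups:
            # Store paths relative to the project directory, with forward slashes.
            groups[suffix].append(path.relative_to(project_dir).as_posix())
    return groups


def count_sdf_records(sdf_path: Path) -> int:
    """Count the records of an SDF file; each record ends with a '$$$$' line."""
    with sdf_path.open(encoding="utf-8", errors="replace") as handle:
        return sum(1 for line in handle if line.strip() == "$$$$")
```

For a project stored in the default location, project_dir would be the project's folder inside the ChemCurator directory mentioned above.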