Markush DARC format

Codename: vmn

Import from VMN format

Markush structure files complying with the Markush DARC format are processed by Marvin, though with some limitations. The interpretation of the main VMN features is listed below. R-groups with multiple (more than 2) attachments are temporarily represented by attached data, as described below. This representation will be changed to its final version in the next major release. Peptide connection bonds may not be processed correctly, and some peptides are not supported yet.

The AMN file is looked for in the directory of the VMN file, under the same name with the .amn extension (e.g. the AMN file of 46mrk001.vmn is 46mrk001.amn). The AMN attributes and some of the atom attributes are not processed by search and enumeration, but they are displayed in atom labels.
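
The AMN lookup rule can be sketched in a few lines of Python. This is only an illustration of the naming convention stated above, not Marvin's own code; the function name amn_path_for is hypothetical.

```python
from pathlib import Path


def amn_path_for(vmn_path):
    """Return the path where the companion AMN file is expected.

    Hypothetical helper: same directory and base name as the VMN file,
    with the extension replaced by ".amn".
    """
    return Path(vmn_path).with_suffix(".amn")


# The example from the text: 46mrk001.vmn -> 46mrk001.amn
print(amn_path_for("data/46mrk001.vmn").name)
```

Whether the file actually exists at that location still has to be checked by the caller, for example with Path.exists().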

Interpretation of VMN features

Structure shortcuts (abbreviated groups)

The following structure shortcuts (abbreviated groups) are supported:

ACE   BU    C2, C3, ..., C20   CN    CO1   CO2
COI   ET    IBU                IPR   MBE   NBU
NO2   NPR   OBE                PBE   PH    PO3
PO4   SBU   SO2                SO3   TBU
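
As a rough illustration, a membership check for the shortcuts listed above could look like the following Python sketch. The set contents follow this page's list (with the names split as read from the table), and the function name is_supported_shortcut is hypothetical; it is not part of Marvin's API.

```python
import re

# Shortcut names with a fixed spelling, taken from the list above.
FIXED_SHORTCUTS = {
    "ACE", "BU", "CN", "CO1", "CO2", "COI", "ET", "IBU", "IPR", "MBE",
    "NBU", "NO2", "NPR", "OBE", "PBE", "PH", "PO3", "PO4", "SBU", "SO2",
    "SO3", "TBU",
}


def is_supported_shortcut(name):
    """Return True if `name` is one of the listed structure shortcuts."""
    if name in FIXED_SHORTCUTS:
        return True
    # The C2, C3, ..., C20 family is handled as a numeric range.
    match = re.fullmatch(r"C(\d+)", name)
    return bool(match) and 2 <= int(match.group(1)) <= 20
```

For example, is_supported_shortcut("C12") and is_supported_shortcut("TBU") are true, while "C21" and "C1" fall outside the listed range.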

Amino acids (peptides)

The following amino acids (peptide abbreviated groups) are supported:

ALA   ARG   ASN   ASP   CYS   GLN
GLU   GLY   HIS   ILE   LEU   LYS
MET   PHE   PRO   SER   THR   TRY
TYR   VAL

For more information, refer to the Peptide import documentation.

Superatoms (homology pseudo atoms)

Superatoms representing homology groups are read in as pseudo atoms. The following homologies are interpreted by enumeration and search:

CHK   CHE   CHY   CYC   ARY   HET
HEA   HEF   UNK   MX    AMX   A35
TRM   LAN   ACT   HAL   ACY   PRT
XX

For a detailed description of the interpretation, refer to the Homology groups in Markush structures manual.

Temporary representation of multiple attachment R-groups

Note that the current representation is temporary and will be replaced by a final solution in the next major release.

The usual attachment point representation can be used for 1- and 2-attachment R-groups.

For R-groups with more than 2 attachments, a temporary representation based on attached data is used.

Attached data can also be set manually in MarvinSketch, as described in the MarvinSketch Chemical Features manual.

Both the attachment order numbers and the attachment point numbers are consecutive positive integers 1, 2, 3, ..., where attachment i is to be connected to the R-atom neighbor with order number i, for each i. Multiple attachment numbers on the same atom are separated by commas.

Attachment point numbers can be omitted for single atom R-group definition members. If the order numbers are missing for an R-atom, then the atom index order of the neighboring atoms is used to determine the order numbers.
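
The pairing rules above can be sketched as follows. This is a minimal Python illustration under assumed data structures, not Marvin's implementation: the dictionaries, the function names, and the error raised for partly specified order numbers are all hypothetical.

```python
def neighbor_orders(neighbors):
    """Map order number -> atom index of the R-atom neighbor.

    `neighbors` maps each neighbor's atom index to its order number,
    or to None when no order number was given.
    """
    if all(order is None for order in neighbors.values()):
        # Fallback from the text: use the atom index order of the neighbors.
        return {i: atom for i, atom in enumerate(sorted(neighbors), start=1)}
    if any(order is None for order in neighbors.values()):
        # Assumption: a partly numbered R-atom is treated as an error.
        raise ValueError("order numbers are only partly specified")
    return {order: atom for atom, order in neighbors.items()}


def attachment_points(member_atoms, labels, count):
    """Map attachment point number -> atom index in the definition member.

    `labels` maps an atom index to its attachment numbers as text,
    with several numbers on one atom separated by commas (e.g. "1,3").
    """
    if len(member_atoms) == 1 and not labels:
        # Numbers may be omitted for a single-atom member:
        # every attachment goes to that one atom.
        return {i: member_atoms[0] for i in range(1, count + 1)}
    points = {}
    for atom, text in labels.items():
        for number in text.split(","):
            points[int(number)] = atom
    return points


def pair_attachments(neighbors, member_atoms, labels):
    """Return (attachment number, member atom, R-atom neighbor) triples."""
    orders = neighbor_orders(neighbors)
    points = attachment_points(member_atoms, labels, len(orders))
    expected = list(range(1, len(orders) + 1))
    if sorted(orders) != expected or sorted(points) != expected:
        raise ValueError("numbers must be the consecutive integers 1, 2, 3, ...")
    # Attachment i connects to the neighbor with order number i.
    return [(i, points[i], orders[i]) for i in expected]


# Three attachments; member atom 20 carries attachments 1 and 3.
print(pair_attachments({7: 2, 4: 1, 9: 3}, [20, 21], {20: "1,3", 21: "2"}))
# → [(1, 20, 4), (2, 21, 7), (3, 20, 9)]
```

With no order numbers and a single-atom member, pair_attachments({7: None, 4: None, 9: None}, [30], {}) pairs atom 30 with neighbors 4, 7 and 9 in atom index order.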

Limitations

Repeating units with repetition ranges

The number of elements in a repetition range is limited to 10 (e.g. range M100=2,5- is interpreted as M100=2,5-13). Repeating units with more than 4 crossing bonds are not processed by search and enumeration.
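
The 10-element limit can be illustrated with a small Python sketch. The range syntax handled here (comma-separated items, each a single number, a closed range such as 5-13, or an open range such as 5-) is inferred from the single example above and is an assumption, as is the function name expand_repetition_range.

```python
def expand_repetition_range(text, limit=10):
    """Expand a repetition range such as "2,5-" into at most `limit` counts."""
    values = []
    for item in text.split(","):
        item = item.strip()
        if "-" in item:
            low_text, high_text = item.split("-", 1)
            low = int(low_text)
            # Open-ended range ("5-"): offer more candidates than the limit
            # can ever accept, then let the limit check below cut it off.
            high = int(high_text) if high_text else low + limit
            candidates = range(low, high + 1)
        else:
            candidates = [int(item)]
        for value in candidates:
            if len(values) == limit:
                return values  # limit reached: drop the remaining counts
            values.append(value)
    return values


print(expand_repetition_range("2,5-"))
# → [2, 5, 6, 7, 8, 9, 10, 11, 12, 13]
```

The result corresponds to the range 2,5-13 from the example: one element for 2 plus nine elements for 5 to 13.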

Peptides that are not supported

The following peptides are not supported by Peptide import (and therefore are not read in from VMN files):

ABU   aminobutyric acid
ASU   aminosuberic acid
GLP   pyroglutamic acid
HCY   homocysteine
HSE   homoserine
NLE   norleucine
NVA   norvaline
ORN   ornithine
SAR   sarcosine
STA   statine

Note that peptide connection bonds are currently not handled; therefore, peptide sequences may not be correct.

Superatoms (homologies) that are not supported

The following superatoms (homologies) are not supported by search and enumeration (but read in and displayed as pseudo-atoms):

POL   PEG   XX    DYE   PRT

Homology interpretation is described in detail in the Homology groups in Markush structures manual.

Atom attributes that are not processed by search and enumeration

The following atom attributes are not processed by search and enumeration (but displayed in atom labels):

CR   Carbon chain/ring attribute
DT   Deuterium and tritium count
MU   Multiplier attribute
NU   Numerotation attribute (link to AMN data)
PA   Polymer indicator
SP   Position indicator

Export to VMN format

VMN export is not available yet.

References

  1. The Markush DARC Format, T Ferns, Internal Technical Report, Thomson Scientific, 2002
  2. Derwent World Patents Index, Markush DARC User Manual, The Thomson Corporation, 1993, 2008