Markush Enumeration Plugin

A Markush structure is a description of a compound class by generic notations, primarily used in patent claims and the description of combinatorial libraries. The library of a Markush structure is the total set of specific molecules that are described by the Markush structure.

The Markush enumeration plugin can generate all or a subset of the library of a generic Markush structure. It can also calculate the total number of specific structures in a Markush library. The plugin is accessible from the Marvin GUI (Tools->Markush Enumeration), through the cxcalc command-line program (see the cxcalc documentation for detailed command-line usage), via the API, and through the Chemical Terms functions in JChem.


Markush features

Currently, the Markush enumeration plugin supports the following features that describe Markush structures in combinatorial libraries:

  1. R-groups

    R-groups (also referred to as "substituent variation") are the most widely known Markush generic features. The variable part of the structure is denoted by an R-atom (e.g. R1), and the definitions are given separately. In each definition the connection points must be defined to show where the bonds of the R-atom are linked. R-atoms can appear in both rings and chains and can have up to two attachment points. The same R-atom can appear multiple times, and the different occurrences are handled as different cases, so they can be substituted with different definitions. R-group nesting in R-group definitions is allowed to any depth, but without recursion: an R-group definition cannot use the R-atom it is defining, not even indirectly through other embedded R-atoms. R-groups up to number R32767 can be used.

    [Figure: example Markush structure and an example Markush library member]

    R-group drawing in Marvin Sketch is described in the Marvin Sketch User's Guide.

  2. Atom lists

    Atom lists are another example of substituent variation. They define lists of atom types at a given position. There is no restriction on the length of the list or on the number of bonds of an atom list. Atom list drawing in Marvin Sketch is described in the Marvin Sketch documentation.

    [Figure: example Markush structure and an example Markush library member]

  3. Bond lists

    The following bond lists (generic bond types) are supported by the plugin: single or double, any (single, double or triple), single or aromatic, double or aromatic. In Marvin Sketch, bond lists are available among the query bond types in the bonds pop-up menu.

    [Figure: example Markush structure and an example Markush library member]

  4. Link nodes

    Link nodes are atoms that may repeat between two of their designated bonds (called outer bonds, denoted by brackets). All other substituents (if any) repeat together with the atom. In the results, the new bonds between the repeating atoms have the bond type of the lower-order outer bond. Link nodes can be drawn in Marvin Sketch using the pop-up menu.

    [Figure: example Markush structure and an example Markush library member]

  5. Repeating units

    Repeating units represent structural parts that can be repeated several times. The repeating unit is enclosed in brackets with one or two head crossing bonds and the same number of tail crossing bonds. (Head crossing bonds go through the left bracket.) Two bond pairs represent ladder-type repeating units. The repetition range is a comma-separated list of possible repetition counts or repetition intervals, e.g. "1,3,5-9". The repetition pattern specifies how subsequent repeated units are linked together: it can be head-to-head (hh), head-to-tail (ht) or either/unknown (eu); the either/unknown case is not handled by the search software. For ladder-type polymers there is also a flip (f) option, which specifies that the top and bottom crossing bonds are flipped at each connection.

    Repeating unit drawing is described in the Marvin Sketch Help, and ladder-type bracket drawing is described in its polymer drawing section.

    [Figure: example Markush structure and an example Markush library member]

  6. Position variation bonds

    Position variation bonds are bonds attached to variable atoms at one or both end positions. The set of variable atoms is drawn as a multicenter group. A position variation bond connects one atom from one end position to one atom from the other end position. If the end position is a single atom, the bond is attached to this atom; if the end position is a multicenter group, the bond is attached to an arbitrary member of the group. Position variation drawing in Marvin Sketch is described in the Marvin Sketch Help.

    Limitations:

    If a link node is a member of a multicenter group and the original group contains no other atoms from the link fragment, the group also includes the repeated atoms. Otherwise, the position variation bond is part of the link fragment and is repeated together with the link node. Although an R-atom is not allowed to take part in position variation, it can be the single-atom end position of a position variation bond, in which case its attachment point is connected to the bond.

    [Figure: example Markush structure and example Markush library members]

  7. Homology groups

    Homology groups stand for sets of homologous molecular parts (e.g. functional groups). These are represented by pseudo atoms labelled with the common chemical annotation of the groups (alkyl, aryl, heterocycle etc.). See the detailed definition of these groups in a separate document. The pseudo atoms can be most easily drawn in Marvin Sketch using the Homology Groups template group.

    [Figure: example Markush structure and an example Markush library member]

    There are two major types of homology groups regarding their way of definition:

    1. Built-in groups are defined by specific structural properties of the group. These groups are not enumerated during searching, but the query structure is recognized as fulfilling the requirements for such a structure. The possible number of covered structures is usually infinite, unless the number of atoms is limited. Examples of built-in groups are alkyl, aryl, heterocycle, etc.
    2. User-defined groups are explicitly defined, and only the listed structures can match these homology groups. The definition is given in the form of an R-group definition, and any of the generic features discussed in this chapter can be used in it. These definitions can be customized by the user and may be context-specific. (For example, a protecting group definition depends on which functional group it protects.)

    Read more about homology groups.
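
The repetition range notation used by repeating units (item 5 above), such as "1,3,5-9", can be expanded into explicit repeat counts. The following is a minimal sketch of that expansion, not the plugin's own parser; the function name `parse_repetition_range` is hypothetical, and the sketch assumes non-negative integers and ascending intervals.

```python
def parse_repetition_range(text):
    """Expand a range such as "1,3,5-9" into a sorted list of repeat counts.

    Hypothetical helper for illustration only; it is not part of the plugin API.
    """
    counts = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            # An interval such as "5-9" covers both end points.
            low, high = (int(piece) for piece in part.split("-", 1))
            if low > high:
                raise ValueError(f"invalid interval: {part!r}")
            counts.update(range(low, high + 1))
        else:
            counts.add(int(part))
    return sorted(counts)


print(parse_repetition_range("1,3,5-9"))  # prints [1, 3, 5, 6, 7, 8, 9]
```

Each count in the result corresponds to one variant of the repeating unit, so a range of "1,3,5-9" contributes seven alternatives at that position.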

Functionality of the plugin

The plugin provides the following functionality. Examples are given using the Marvin GUI.

Sequential enumeration
Enumerates members of the Markush library in a sequential manner (by substituting the first definition of the first variable, and so on). The results are specific structures. The plugin user interface allows the enumeration of all library members, or of a specified number of them.
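
The idea of sequential enumeration can be sketched with a toy model in which the scaffold is a text template and each variable has a list of alternative definitions. This is an illustration under stated assumptions, not the plugin's implementation: the names `enumerate_sequentially`, `scaffold` and `variables` are hypothetical, the fragments are plain strings rather than chemical structures, and the iteration order (last variable changing fastest) is an assumption.

```python
from itertools import islice, product


def enumerate_sequentially(scaffold, variables, limit=None):
    """Yield library members in a fixed order, optionally stopping after `limit`.

    `scaffold` is a format string with one placeholder per variable, and
    `variables` maps each placeholder name to its list of definitions.
    """
    names = list(variables)
    # product() walks all combinations in a fixed, reproducible order.
    combinations = product(*(variables[name] for name in names))
    for combination in islice(combinations, limit):
        yield scaffold.format(**dict(zip(names, combination)))


variables = {"R1": ["F", "Cl"], "R2": ["CH3", "OH", "NH2"]}
for member in enumerate_sequentially("{R1}-C6H4-{R2}", variables, limit=2):
    print(member)
# prints:
# F-C6H4-CH3
# F-C6H4-OH
```

Because the members are produced lazily, asking for the first few structures does not require building the whole library.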

Random enumeration
This mode generates a random subset of the Markush library to give a quick sampling. It is especially helpful for huge libraries, where full enumeration is impossible. In random mode the variable parts are chosen randomly, and the substitution probability of each definition is proportional to the size of the fragment library that the definition generates. This ensures that the generated representatives are uniformly distributed over the Markush library space.
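
Why size-proportional probabilities give a uniform sample can be seen in a small toy model. In the sketch below, a node is either a plain string (one specific fragment) or a dictionary with a template and nested variable definitions; this data layout and the names `library_size` and `sample_member` are assumptions made for illustration and do not reflect the plugin's internals.

```python
import random
from math import prod


def library_size(node):
    """Number of specific structures a node generates (1 for a plain fragment)."""
    if isinstance(node, str):
        return 1
    # Alternatives of one variable add up; independent variables multiply.
    return prod(
        sum(library_size(definition) for definition in definitions)
        for definitions in node["variables"].values()
    )


def sample_member(node, rng):
    """Draw one library member, weighting each definition by its library size."""
    if isinstance(node, str):
        return node
    chosen = {}
    for name, definitions in node["variables"].items():
        weights = [library_size(definition) for definition in definitions]
        pick = rng.choices(definitions, weights=weights, k=1)[0]
        chosen[name] = sample_member(pick, rng)
    return node["template"].format(**chosen)


# R1 is either the fragment "A" (1 member) or a nested group with 3 members.
nested = {"template": "B{R2}", "variables": {"R2": ["1", "2", "3"]}}
library = {"template": "{R1}-X", "variables": {"R1": ["A", nested]}}

rng = random.Random(0)
print(library_size(library))  # prints 4
print(sample_member(library, rng))
```

With equal probabilities for the two R1 definitions, "A-X" would be drawn half of the time although it is only one of four members; weighting by 1:3 gives every member a probability of 1/4. Note that `random.choices` uses floating-point arithmetic internally, so an exact integer-based sampler would be needed for astronomically large libraries.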

Calculate library size
The size of the Markush library can be calculated with arbitrary precision. On the user interface, the exact value is displayed for numbers of up to 20 digits; above that, only the order of magnitude is shown (for example, 10^28). The calculated number is the size of the whole library and does not take the valence check filter into account. (See below.)

If the 'Enumerate homology groups' option is enabled, the number of enumerated molecules increases accordingly, multiplied by the number of built-in species.
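
The size calculation and the display rule described above can be sketched as follows. This is a simplified illustration that assumes independent variable positions, so that the library size is the product of the number of alternatives at each position; the function names are hypothetical, and the way the magnitude is derived (number of digits minus one, without rounding) is an assumption about the display, not documented behavior.

```python
from math import prod


def independent_library_size(alternative_counts):
    """Multiply the number of alternatives at each independent variable position.

    Python integers have arbitrary precision, so the product never overflows.
    """
    return prod(alternative_counts)


def format_library_size(size, max_digits=20):
    """Show the exact value up to `max_digits` digits, otherwise only the magnitude."""
    digits = len(str(size))
    if digits <= max_digits:
        return str(size)
    return f"10^{digits - 1}"


print(format_library_size(independent_library_size([12, 30, 7])))  # prints 2520
print(format_library_size(3 * 10**28))  # prints 10^28
```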

Selected part enumeration
If part of the Markush structure is selected, only the generic features in the selected part are considered for enumeration/calculation. This allows focusing on a particular area of the Markush structure. Enumeration of selected parts only may result in generating (more specific) Markush structures.

Valence filter
If the Markush structure is not properly (or too generally) formulated, it is possible that it describes structures with valence errors. In this case, the valence filter setting is useful to filter out the offending result structures. The default value is off (no filtering).

Homology group enumeration
Version 5.2 introduced the enumeration of homology groups. Homology groups are R-groups represented as pseudo atoms whose names cover a set of R-groups, either built-in or user-defined. Detailed information is available in the homology groups documentation.

Scaffold alignment and coloring
Coloring the scaffold (the part of the structure containing no Markush features) and/or the R-groups in enumerated structures can help visual recognition of parts of the molecules. Differentiation of the structures is aided by alignment of all structures to the original scaffold. These options are available in sequential and random enumeration.

Markush code generation
A special ID number can be generated for the library members: every structure gets its own unique tag (molecule property) named 'Markush Code', which can be saved in the structure file (in .mrv and .sdf formats). This ID is visible in the plugin result window as well. It gives the following information:
  1. Ri(n):x means that R-group number i (at atom number n) is the ligand containing the atom numbered x (which is the smallest number in that fragment but not necessarily the attachment point).
    Custom reagent codes: instead of atom index numbers, custom reagent codes (e.g. company identifiers) can also be used. Add attached data named 'reagent' to the R-group members. These reagent codes will appear in the enumerated structures both in the Markush code and in the generated molecule structures. (See example below)
  2. ID tag name is the name you specified in the options panel (in this example Test1). If a tag with this name is attached to the Markush molecule, its value will be used.
  3. Ln:x means that the link node on atom number n is in variation number x (in this example one or two methylene groups are inserted).
  4. Bn-m:x means that the bond between atoms n and m is number x in the bond list (referring to the bond type).
  5. PVn-m:x-y means that the position variation bond between n and m (multicenter numbered) occurred between atoms x and y.
  6. An:x means that atom number n is number x from the atom list.
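
The entry patterns listed above are regular enough to be read back programmatically. The sketch below parses a single entry with regular expressions derived from those patterns. It is an illustration only: the function and field names are hypothetical, the sample tokens are made up, and how the plugin separates entries within a complete Markush code is not covered here.

```python
import re

# One pattern per entry type, following the notation in the list above.
ENTRY_PATTERNS = [
    ("r_group", re.compile(r"R(?P<rgroup>\d+)\((?P<atom>\d+)\):(?P<choice>\S+)")),
    ("link_node", re.compile(r"L(?P<atom>\d+):(?P<variation>\d+)")),
    ("bond_list", re.compile(r"B(?P<atom1>\d+)-(?P<atom2>\d+):(?P<choice>\d+)")),
    ("position_variation",
     re.compile(r"PV(?P<atom1>\d+)-(?P<atom2>\d+):(?P<end1>\d+)-(?P<end2>\d+)")),
    ("atom_list", re.compile(r"A(?P<atom>\d+):(?P<choice>\d+)")),
]


def parse_entry(token):
    """Return the entry kind and its fields (as strings) for one code entry."""
    for kind, pattern in ENTRY_PATTERNS:
        match = pattern.fullmatch(token)
        if match:
            return {"kind": kind, **match.groupdict()}
    raise ValueError(f"unrecognized Markush code entry: {token!r}")


# Made-up sample tokens that follow the documented patterns.
for token in ["R1(5):12", "L3:2", "B4-7:1", "PV2-9:2-11", "A6:3"]:
    print(parse_entry(token))
```

The R-group choice is matched as arbitrary non-whitespace text rather than digits, so that custom reagent codes are accepted in place of atom index numbers.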