Peptide import and export

Codename: peptide

Peptide sequence format
Import options
Export options
Custom amino acids

Peptide sequence format

Peptides can be entered using one- or three-letter amino acid abbreviations. A text file of sequences must use only one notation: either all one-letter or all three-letter sequences, never both. Each sequence must be written on its own line as one continuous string without spaces. The abbreviations used are:

Ala (A)
Arg (R)
Asn (N)
Asp (D)
Cys (C)
Gln (Q)
Glu (E)
Gly (G)
His (H)
Ile (I)
Leu (L)
Lys (K)
Met (M)
Phe (F)
Pro (P)
Ser (S)
Thr (T)
Trp (W)
Tyr (Y)
Val (V)

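Valid files look like the two below. The first uses one-letter codes only (the lowercase line shows that one-letter codes are also accepted in lowercase); the second uses three-letter codes only:

	PPPALPPKKR
	aptmppplpp

	ProProProAlaLeuProProLysLysArg
	AlaProThrMetProProProLeuProPro

The following file is incorrect, because it mixes one-letter and three-letter sequences, and its last line mixes both notations within a single sequence:

	PPPALPPKKR
	AlaProThrMetProProProLeuProPro
	ProProProAlaLeuProProLysLysArg
	AlaProThrMetPPPLPP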
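A minimal Python sketch of these rules, useful for checking a file before passing it to molconvert. It is not ChemAxon's parser: the function names notations and file_notations are hypothetical, matching is assumed to be case-insensitive (the lowercase valid example suggests this for one-letter codes), and blank lines are assumed to be ignored. Each line is tested against both notations, and a file is valid only if at least one notation fits every line.

```python
# Standard one-letter and three-letter amino acid codes.
ONE_LETTER = set("ACDEFGHIKLMNPQRSTVWY")
THREE_LETTER = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}


def notations(seq: str) -> set:
    """Return the notations ("1" and/or "3") under which seq is a valid sequence."""
    s = seq.upper()  # assumption: matching is case-insensitive
    found = set()
    if s and all(ch in ONE_LETTER for ch in s):
        found.add("1")
    if s and len(s) % 3 == 0 and all(
        s[i:i + 3] in THREE_LETTER for i in range(0, len(s), 3)
    ):
        found.add("3")
    return found  # spaces or unknown codes leave the set empty


def file_notations(lines) -> set:
    """Return the notations consistent with every sequence line; empty means invalid."""
    common = {"1", "3"}
    seen_any = False
    for line in lines:
        seq = line.rstrip("\r\n")
        if not seq:
            continue  # assumption: blank lines are ignored
        seen_any = True
        common &= notations(seq)
    return common if seen_any else set()
```

A file in which every line fits both notations (for example a single line ALA) returns {"1", "3"}; how molconvert resolves such ambiguous input is not covered by this page.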

Import options

--peptide <string>  The string must be a valid one- or three-letter sequence. Example:

	molconvert --peptide FFKMLL mol -o peptide.mol

converts the one-letter sequence FFKMLL to a molfile (peptide.mol).
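When molconvert is called from a script, passing the arguments as a list avoids shell quoting problems (SMILES strings contain brackets and parentheses). The sketch below is a hypothetical Python wrapper that assumes molconvert is on the PATH; it uses only the options shown on this page (--peptide, an output format such as mol, and -o). The piped SMILES form used in the export examples can be reproduced by passing the SMILES as stdin_text.

```python
import shutil
import subprocess
from typing import List, Optional


def molconvert_command(out_format: str, peptide: Optional[str] = None,
                       output: Optional[str] = None) -> List[str]:
    """Build a molconvert argument list from the options documented on this page."""
    cmd = ["molconvert"]
    if peptide is not None:
        # The sequence must be one continuous string without spaces.
        if not peptide or any(ch.isspace() for ch in peptide):
            raise ValueError("peptide sequence must be non-empty and contain no spaces")
        cmd += ["--peptide", peptide]
    cmd.append(out_format)
    if output is not None:
        cmd += ["-o", output]
    return cmd


def run_molconvert(cmd: List[str], stdin_text: Optional[str] = None) -> str:
    """Run the command (molconvert must be on PATH) and return its standard output."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    result = subprocess.run(cmd, input=stdin_text, capture_output=True,
                            text=True, check=True)
    return result.stdout
```

For example, run_molconvert(molconvert_command("mol", peptide="FFKMLL", output="peptide.mol")) corresponds to the command above.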

Export options

peptide:3  The output is a three-letter sequence. Examples:

	echo "[H]NCC(=O)NC(C)C(=O)NCC(O)=O" | molconvert peptide:3

converts the SMILES string to a three-letter sequence.

	molconvert --peptide GAG peptide:3

converts the one-letter sequence GAG to a three-letter sequence.

peptide:1  The output is a one-letter sequence. Example:

	echo "[H]NCC(=O)NC(C)C(=O)NCC(O)=O" | molconvert peptide:1

converts the SMILES string to a one-letter sequence.
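The SMILES string in these examples, [H]NCC(=O)NC(C)C(=O)NCC(O)=O, is the tripeptide Gly-Ala-Gly, so its one-letter form is GAG, the same sequence used in the --peptide GAG example. At the sequence level, converting between the two notations is a residue-by-residue lookup, sketched below in Python. This only illustrates the notation and is not molconvert's implementation; the function names are hypothetical, and the exact text molconvert prints (letter case, separators, termini) is not specified here.

```python
# Standard one-letter -> three-letter mapping (Trp is tryptophan).
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}
# Reverse mapping keyed by upper-case three-letter code.
THREE_TO_ONE = {three.upper(): one for one, three in ONE_TO_THREE.items()}


def to_three_letter(seq: str) -> str:
    """Convert a one-letter sequence (any case) to three-letter codes."""
    return "".join(ONE_TO_THREE[ch] for ch in seq.upper())


def to_one_letter(seq: str) -> str:
    """Convert a three-letter sequence (any case) to one-letter codes."""
    if len(seq) % 3:
        raise ValueError("three-letter sequence length must be a multiple of 3")
    return "".join(THREE_TO_ONE[seq[i:i + 3].upper()] for i in range(0, len(seq), 3))
```

Applied to the sequences from the format section, to_three_letter("aptmppplpp") gives AlaProThrMetProProProLeuProPro, and to_one_letter reverses it.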

Custom amino acids

Apart from the standard amino acids, which are recognized by default, you can define custom amino acids with non-standard side chains or alternative protonation states. The format of the dictionary file is:

	Ala	A	[CX4H3][C@HX4H1]([NX3])C=O						3	4
	Arg	R	[N;X3][C@@H]([CH2][CH2][CH2][N;H1X3][C;X3]([N;H2X3])=N)C=O		1	10
	Asn	N	[#7;X3][C@@H]([CH2]C([N;H2X3])=O)[C;X3]=O				1	7
	Asp	D	[NX3][C@@HH1]([CH2]C([OX2H1])=O)C=O					1	7
    ...

where the columns are:

long name (three-letter code)
short name (one-letter code)
SMARTS representation of the amino acid fragment
1-based index of the backbone N atom in the SMARTS string (3 for Ala in the first line of the example: the third atom, [NX3])
1-based index of the backbone carbonyl C, i.e. the C bonded to the acyl oxygen (4 for Ala in the first line of the example: the fourth atom)
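The two index columns count atoms in the SMARTS string: bracket atoms such as [NX3] and bare atoms such as C and O, but not bonds, parentheses or ring-closure digits. The Python sketch below (hypothetical helper names) checks that the fourth column points at a nitrogen and the fifth at a carbon. It is only a sanity check under stated assumptions: it handles the subset of SMARTS used in these examples, not recursive SMARTS, and it cannot tell the carbonyl C from the alpha C; full validation needs a real cheminformatics toolkit.

```python
import re

# Atoms allowed outside brackets (organic subset, aromatic forms, wildcard).
# Two-letter symbols come first so "Cl" is not read as "C" followed by "l".
ORGANIC = ("Cl", "Br", "B", "C", "N", "O", "P", "S", "F", "I",
           "b", "c", "n", "o", "p", "s", "*")
ATOMIC_NUMBERS = {6: "C", 7: "N", 8: "O", 16: "S"}


def smarts_atoms(smarts: str) -> list:
    """Split a SMARTS string into atom tokens, in order (no recursive SMARTS)."""
    atoms = []
    i = 0
    while i < len(smarts):
        if smarts[i] == "[":
            end = smarts.index("]", i)
            atoms.append(smarts[i:end + 1])
            i = end + 1
            continue
        for sym in ORGANIC:
            if smarts.startswith(sym, i):
                atoms.append(sym)
                i += len(sym)
                break
        else:
            i += 1  # bond symbol, branch parenthesis or ring-closure digit
    return atoms


def element(atom: str) -> str:
    """Element of an atom token, e.g. "[N;X3]" -> "N", "[#7;X3]" -> "N"."""
    if not atom.startswith("["):
        return atom.capitalize()
    m = re.match(r"#(\d+)|([A-Z][a-z]?|[a-z])", atom[1:])
    if m is None:
        return "?"
    if m.group(1):
        return ATOMIC_NUMBERS.get(int(m.group(1)), "?")
    return m.group(2).capitalize()


def check_entry(line: str) -> bool:
    """Check that the two index columns point at an N and a C atom of the SMARTS."""
    _long, _short, smarts, n_col, c_col = line.split()  # tabs or spaces
    atoms = smarts_atoms(smarts)
    n, c = int(n_col), int(c_col)
    if not (1 <= n <= len(atoms) and 1 <= c <= len(atoms)):
        return False
    return element(atoms[n - 1]) == "N" and element(atoms[c - 1]) == "C"
```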

The name of a custom amino acid is expected to start with X, followed by an identifier in parentheses, for example X(Hcy). It is advised to use the same string for both the short and the long name of the custom amino acid. Valid lines are:

	X(Hcy)		X(Hcy)		[SX2H1][CH2][CH2][C@HH1]([NX3])C=O		5	6
	X(1-foo)	X(1-foo)	[SX2H1][CH2][C@HH1]([NX3])C=O			4	5
	X(b)		X(b)		[CH3][CH2][CH2][CH2][CH2][C@HH1]([NX3])C=O	7	8
	...
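The naming convention can be checked mechanically before a line is added to the dictionary. The Python sketch below is hypothetical: the pattern X\([^()\s]+\) is an assumption (an X, then one parenthesised identifier without spaces or nested parentheses), chosen because it accepts all three example names above.

```python
import re

# Assumed form of a custom amino acid name: X followed by "(identifier)".
CUSTOM_NAME = re.compile(r"X\([^()\s]+\)")


def custom_name_problems(long_name: str, short_name: str) -> list:
    """Return a list of naming problems; an empty list means the names look fine."""
    problems = []
    for label, name in (("long", long_name), ("short", short_name)):
        if CUSTOM_NAME.fullmatch(name) is None:
            problems.append(f"{label} name {name!r} is not of the form X(...)")
    if long_name != short_name:
        problems.append("short and long names differ; using the same string is advised")
    return problems
```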

Note that the SMARTS strings representing amino acid fragments specify hydrogen counts and, in some cases, connectivity (the X primitive) to avoid ambiguity. For example, if only C[C@H](N)C=O were used for alanine, it would also match many other amino acids, because they contain the alanine fragment as a substructure. Users can store their custom amino acids in the custom_aminoacids.dict file, located in the .chemaxon directory on UNIX or in the user's chemaxon directory on MS Windows.
