Saturday, November 24, 2012

Conventions for naming files

The structure files of native proteins draw most of their information from Protein Data Bank files and are expressed in the same format. It is therefore appropriate to name the native protein files by adding short prefixes or suffixes to the names of the original crystal structures supplied by the Protein Data Bank.

I have chosen to use the prefix "n", together with a numeral, to indicate a native structure. "n1_" indicates a first approximation to the native structure using subunit rearrangement only. "n2_" indicates a second approximation achieved by restoring the conformation of a small number of residues, usually less than 10% of the total sequence. "n3_" would be used for a full relaxation of the crystal structure into a native structure using a force field.

For α2β2 heterotetramers, e.g. hemoglobin, there are three possible quaternary isomers, and I have chosen to use suffixes to indicate which isomer the file describes. Where abbreviations for the isomers already exist, it is best to retain them. For hemoglobin, I use the suffix "_r" for the R state, "_t" for the principal T state and "_t2" for the alternate T state that might be significant in the absence of polyanions. In α2β2 heterotetramers of elongated proteins, such as the heavy chain of immunoglobulin, one isomer can be described as "cis" and the other two as "trans". A pair of immunoglobulin structures will be placed in the Gallery shortly, the cis form with the suffix "_cis" and the principal trans form with the suffix "_tr".
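As a rough illustration of the convention, the sketch below assembles a file name from a crystal structure name, an approximation level and an optional isomer. The function name `native_file_name`, the placeholder structure name "1xyz", the ".pdb" extension and the choice to put the suffix before the extension are all assumptions made for the example; only the prefixes and suffixes themselves come from the text above.

```python
from typing import Optional

# Prefixes and suffixes taken from the naming convention described above.
PREFIXES = {1: "n1_", 2: "n2_", 3: "n3_"}
SUFFIXES = {"r": "_r", "t": "_t", "t2": "_t2", "cis": "_cis", "tr": "_tr"}


def native_file_name(pdb_name: str, level: int,
                     isomer: Optional[str] = None,
                     ext: str = ".pdb") -> str:
    """Build a native-structure file name (hypothetical helper).

    pdb_name: name of the original crystal structure, used unchanged.
    level:    1, 2 or 3, the approximation level of the native structure.
    isomer:   optional key into SUFFIXES; None for proteins without isomers.
    ext:      file extension; ".pdb" is an assumption, not part of the convention.
    """
    if level not in PREFIXES:
        raise ValueError(f"unknown approximation level: {level!r}")
    if isomer is not None and isomer not in SUFFIXES:
        raise ValueError(f"unknown isomer: {isomer!r}")
    suffix = SUFFIXES[isomer] if isomer is not None else ""
    # Assumption: the isomer suffix is placed before the file extension.
    return f"{PREFIXES[level]}{pdb_name}{suffix}{ext}"


if __name__ == "__main__":
    # "1xyz" is a placeholder, not a reference to a specific entry.
    print(native_file_name("1xyz", 1, "r"))    # n1_1xyz_r.pdb
    print(native_file_name("1xyz", 2, "cis"))  # n2_1xyz_cis.pdb
    print(native_file_name("1xyz", 3))         # n3_1xyz.pdb
```

Keeping the prefixes and suffixes in two small tables means a new isomer abbreviation can be added in one place without touching the function.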
