The SOV service is designed to compare and evaluate secondary structure element assignments.
The similarity between the predicted (PSEQ) and observed (OSEQ) secondary structure sequences is evaluated separately for each conformational state (helix, strand, coil) and for all three states combined. The measures used are:

  Q3  - traditional per-residue prediction accuracy (Qindex)
  SOV - Segment OVerlap measure (definition by Zemla et al., PROTEINS: Structure, Function, and Genetics, 34, 1999, pp. 220-223)

Q3 measure

Qindex (Qhelix, Qstrand, Qcoil, Q3) gives the percentage of residues predicted correctly as helix, strand, coil, and all three conformational states combined. For a single conformational state:

  $$ Q_i = \frac{\text{number of residues correctly predicted in state } i}{\text{number of residues observed in state } i} \times 100 $$

where i is either helix, strand or coil. For all three states:

  $$ Q_3 = \frac{\text{number of residues correctly predicted}}{\text{number of all residues}} \times 100 $$

SOV measure

Segment OVerlap quantity measure for a single conformational state:

  $$ \mathrm{SOV}(i) = \frac{1}{N(i)} \sum_{S(i)} \frac{\mathrm{MINOV}(S_1;S_2) + \mathrm{DELTA}(S_1;S_2)}{\mathrm{MAXOV}(S_1;S_2)} \times \mathrm{LEN}(S_1) $$

where:

  S1 and S2 are the observed and predicted secondary structure segments in state i (H, E or C);
  LEN(S1) is the number of residues in segment S1;
  MINOV(S1;S2) is the length of the actual overlap of S1 and S2, i.e. the extent over which both segments have residues in state i (for example H);
  MAXOV(S1;S2) is the length of the total extent over which either of the segments S1 or S2 has a residue in state i;
  DELTA(S1;S2) is the integer value

  $$ \mathrm{DELTA}(S_1;S_2) = \min\{\mathrm{MAXOV}(S_1;S_2) - \mathrm{MINOV}(S_1;S_2);\ \mathrm{MINOV}(S_1;S_2);\ \mathrm{INT}(\mathrm{LEN}(S_1)/2);\ \mathrm{INT}(\mathrm{LEN}(S_2)/2)\} $$

  the sum is taken over S(i), the set of all pairs of segments {S1;S2} in which S1 and S2 have at least one residue in state i in common;
  N(i) is the number of residues in state i, defined as

  $$ N(i) = \sum_{S(i)} \mathrm{LEN}(S_1) + \sum_{S'(i)} \mathrm{LEN}(S_1) $$

  where the first sum is taken over S(i), defined above, and the second over S'(i), the set of segments S1 that do not produce any segment pair.

Segment OVerlap quantity measure for all three states:

  $$ \mathrm{SOV} = \frac{1}{N} \sum_{i} \sum_{S(i)} \frac{\mathrm{MINOV}(S_1;S_2) + \mathrm{DELTA}(S_1;S_2)}{\mathrm{MAXOV}(S_1;S_2)} \times \mathrm{LEN}(S_1) $$

where the normalization value N is the sum of N(i) over all three conformational states (i = HELIX, STRAND, COIL):

  $$ N = \sum_{i} N(i) $$

"SOV observed" means that S1 is the observed segment and S2 the predicted one. "SOV predicted" means that S1 is the predicted segment and S2 the observed one.

-------------------------------------------------------------------------------
Data format of prediction

The SSP (secondary structure prediction) data can be prepared in COLUMN format:

  First column:  protein sequence (AA) in one-letter code
  Second column: observed (OSEC) secondary structure
  Third column:  predicted (PSEC) secondary structure

Secondary structure conformational states can be helix (H), strand (E) or coil (C).

Note: alternatively, 'G' or 'I' can be used for helix, 'B' for strand, and 'L', 'T' or 'S' for coil.

Spaces should be used as delimiters to separate columns.
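A minimal sketch of reading the COLUMN format and reducing the alternative state letters to H, E or C, assuming Python 3.7+ and that any columns after the third can be ignored. STATE_MAP, reduce_state and parse_column_format are hypothetical names, not part of the SOV service.

```python
# Hypothetical helper names; this is not the SOV service's own parser.
from __future__ import annotations

# Alternative state letters from the note above, reduced to H, E or C.
STATE_MAP = {
    "H": "H", "G": "H", "I": "H",            # helix
    "E": "E", "B": "E",                      # strand
    "C": "C", "L": "C", "T": "C", "S": "C",  # coil
}


def reduce_state(symbol: str) -> str:
    """Map one secondary structure symbol onto H, E or C."""
    if symbol not in STATE_MAP:
        raise ValueError(f"unknown secondary structure symbol: {symbol!r}")
    return STATE_MAP[symbol]


def parse_column_format(text: str) -> tuple[str, str, str]:
    """Read AA/OSEC/PSEC columns and return (sequence, observed, predicted)."""
    aa, obs, pred = [], [], []
    for line in text.splitlines():
        fields = line.split()  # split on runs of whitespace
        # Skip blank lines and the optional "AA OSEC PSEC" header row.
        if not fields or fields[0] == "AA":
            continue
        if len(fields) < 3:
            raise ValueError(f"expected at least three columns: {line!r}")
        # Assumption: any extra trailing columns are ignored.
        aa.append(fields[0])
        obs.append(reduce_state(fields[1]))
        pred.append(reduce_state(fields[2]))
    return "".join(aa), "".join(obs), "".join(pred)


example = """AA OSEC PSEC
M C C
Q C C
T C H
R H H
S H H
I H H
G C C
V C C"""
print(parse_column_format(example))
# ('MQTRSIGV', 'CCCHHHCC', 'CCHHHHCC')
```

Reducing the letters at parse time means the scoring step only ever has to deal with the three states H, E and C.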
Example.1 of input data format:
*******************************
AA OSEC PSEC
M  C    C
Q  C    C
T  C    H
R  H    H
S  H    H
I  H    H
G  C    C
V  C    C
-------------------------------------------------------------------------------
Three other formats of the input data are also allowed:

Example.2 of input data format:
*******************************
AA OSEC PSEC NUM
M  C    C    1
Q  C    C    2
T  C    H    3
R  H    H    4
S  H    H    5
I  H    H    6
G  C    C    7
V  C    C    8

Example.3 of input data format:
*******************************
>OSEQ
CCCHHHCC
>PSEQ
CCHHHHCC
>AA
MQTRSIGV

Example.4 of input data format:
*******************************
SSP 1 M C C
SSP 2 Q C C
SSP 3 T C H
SSP 4 R H H
SSP 5 S H H
SSP 6 I H H
SSP 7 G C C
SSP 8 V C C
-------------------------------------------------------------------------------
Output:
*******
SECONDARY STRUCTURE PREDICTION
NUMBER OF RESIDUES PREDICTED: LENGTH = 8

AA OSEC PSEC NUM
M  C    C    1
Q  C    C    2
T  C    H    3
R  H    H    4
S  H    H    5
I  H    H    6
G  C    C    7
V  C    C    8
-----------------------
SECONDARY STRUCTURE PREDICTION ACCURACY EVALUATION. N_AA = 8

        ALL    HELIX  STRAND  COIL
Q3  :   87.5   100.0  100.0    80.0
SOV :  100.0   100.0  100.0   100.0
-----------------------
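The Q3 and SOV definitions above can be checked against this example output. The Python sketch below (Python 3.7+) computes Q_i, Q3, SOV(i) and SOV from observed and predicted strings already reduced to H, E and C. The names segments, q_score, sov_terms and sov are hypothetical, not the service's own code. One convention is an assumption rather than documented behaviour: a state with no residues in the reference string scores 100.0, which matches the STRAND column of the example.

```python
# Hypothetical function names; a sketch of the definitions above,
# not the SOV service's own code.
from __future__ import annotations

from itertools import groupby
from typing import Optional

STATES = "HEC"  # helix, strand, coil


def segments(ss: str, state: str) -> list[tuple[int, int]]:
    """Runs of `state` in `ss` as (start, end) pairs, end exclusive."""
    out, pos = [], 0
    for symbol, run in groupby(ss):
        length = len(list(run))
        if symbol == state:
            out.append((pos, pos + length))
        pos += length
    return out


def q_score(obs: str, pred: str, state: Optional[str] = None) -> float:
    """Q_i for one state, or Q3 over all residues when state is None."""
    if state is None:
        idx = list(range(len(obs)))
    else:
        idx = [k for k, s in enumerate(obs) if s == state]
    if not idx:
        return 100.0  # assumed convention for a state that is never observed
    correct = sum(obs[k] == pred[k] for k in idx)
    return 100.0 * correct / len(idx)


def sov_terms(s1_ss: str, s2_ss: str, state: str) -> tuple[float, int]:
    """Sum over S(i) of the weighted overlap ratio, and N(i)."""
    total, norm = 0.0, 0
    segs2 = segments(s2_ss, state)
    for a1, b1 in segments(s1_ss, state):
        len1 = b1 - a1
        paired = False
        for a2, b2 in segs2:
            minov = min(b1, b2) - max(a1, a2)  # MINOV: shared residues
            if minov <= 0:
                continue                       # no common residue: not in S(i)
            paired = True
            maxov = max(b1, b2) - min(a1, a2)  # MAXOV: union of both segments
            delta = min(maxov - minov, minov, len1 // 2, (b2 - a2) // 2)
            total += (minov + delta) / maxov * len1
            norm += len1                       # LEN(S1) counted once per pair
        if not paired:
            norm += len1                       # S1 belongs to S'(i)
    return total, norm


def sov(s1_ss: str, s2_ss: str, state: Optional[str] = None) -> float:
    """SOV(i) for one state, or SOV over all three states when state is None."""
    total, norm = 0.0, 0
    for st in ([state] if state else STATES):
        t, n = sov_terms(s1_ss, s2_ss, st)
        total += t
        norm += n
    if norm == 0:
        return 100.0  # assumed convention for a state absent from S1
    return 100.0 * total / norm


obs, pred = "CCCHHHCC", "CCHHHHCC"  # OSEC and PSEC from the example
print("Q3 :", q_score(obs, pred), *(q_score(obs, pred, s) for s in STATES))
print("SOV:", sov(obs, pred), *(sov(obs, pred, s) for s in STATES))
# Q3 : 87.5 100.0 100.0 80.0
# SOV: 100.0 100.0 100.0 100.0
```

Calling sov(obs, pred) gives "SOV observed" (S1 taken from the observed string); swapping the arguments, sov(pred, obs), gives "SOV predicted".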