SOV (Segment OVerlap) measure

(secondary structure element assignment evaluation)


The SOV service is designed to facilitate the comparison and evaluation of secondary structure element assignments.

The similarity between the predicted (PSEQ) and observed (OSEQ) secondary 
structure sequences is evaluated for each conformational state (helix, strand, 
coil) separately and for all conformational states combined. The measures used are:

  Q3  - traditional per-residue prediction accuracy (Qindex)
  SOV - Segment OVerlap measure (as defined by Zemla et al., PROTEINS:
        Structure, Function, and Genetics, 34, 1999, pp. 220-223)


                               Q3 measure

Qindex (Qhelix, Qstrand, Qcoil, Q3) gives the percentage of residues predicted 
correctly as helix, strand, and coil, and over all three conformational states 
combined. Qindex is defined as follows. 

For a single conformational state: 

                number of residues correctly predicted in state i
       Qi    =  ------------------------------------------------- * 100,
                     number of residues observed in state i


where i is either helix, strand or coil. 

For all three states: 

                number of residues correctly predicted
       Q3    =  -------------------------------------- * 100
                        number of all residues
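
The two Qindex formulas above can be sketched in Python as follows. This is 
only an illustration, not the service's own code: the function name q_index is 
hypothetical, the inputs are assumed to be already reduced to the letters H, E 
and C, and a state with no observed residues is assumed to score 100.0, as the 
STRAND column of the sample output in this document suggests.

```python
def q_index(observed, predicted, states="HEC"):
    """Return Qi for each state and the overall Q3, all as percentages."""
    if len(observed) != len(predicted):
        raise ValueError("sequences must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    result = {}
    for state in states:
        n_observed = sum(1 for o in observed if o == state)
        n_correct = sum(1 for o, p in zip(observed, predicted) if o == p == state)
        # Assumption: a state that is never observed scores 100.0,
        # matching the STRAND column of the sample output in this document.
        result[state] = 100.0 * n_correct / n_observed if n_observed else 100.0
    n_all_correct = sum(1 for o, p in zip(observed, predicted) if o == p)
    result["Q3"] = 100.0 * n_all_correct / len(observed)
    return result


# The eight-residue example used throughout this document.
print(q_index("CCCHHHCC", "CCHHHHCC"))
# {'H': 100.0, 'E': 100.0, 'C': 80.0, 'Q3': 87.5}
```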



                              SOV measure

Segment OVerlap quantity measure for a single conformational state: 


                       1     SUM   MINOV(S1;S2) + DELTA(S1;S2)
     SOV(i)  =  100 * ---    SUM   ---------------------------  * LEN(S1)
                      N(i)   SUM           MAXOV(S1;S2)
                             S(i)


S1 and S2       are the observed and predicted secondary structure segments 
                (in state i, which can be either H, E or C);
LEN(S1)         is the number of residues in the segment S1; 
MINOV(S1;S2)    is the length of actual overlap of S1 and S2, i.e. 
                the extent for which both segments have residues in state i, 
                for example H;
MAXOV(S1;S2)    is the length of the total extent for which either of 
                the segments S1 or S2 has a residue in state i;
DELTA(S1;S2)    is the integer value defined as 
                MIN{(MAXOV(S1;S2) - MINOV(S1;S2)); MINOV(S1;S2); 
                    INT(LEN(S1)/2); INT(LEN(S2)/2)};

THE SUM         is taken over S(i), the set of all pairs of segments {S1;S2}
                where S1 and S2 have at least one residue in state i 
                in common;

N(i)            is the normalization value for state i, defined as follows: 

                SUM             SUM 
       N(i)  =  SUM LEN(S1)  +  SUM LEN(S1)
                SUM             SUM
                S(i)           S'(i)

The two sums are taken over S(i) and S'(i), respectively, where

S(i)            is the set of all the pairs of segments {S1;S2}
                where S1 and S2 have at least one residue in state i 
                in common;

S'(i)           is the set of segments S1 that do not form
                any segment pair.
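
The definitions above can be followed step by step in a short Python sketch. 
It is not the service's own implementation: the names segments and sov_state 
are hypothetical, segments are held as (start, end) index pairs with an 
exclusive end, the result is scaled to a percentage as in the sample output of 
this document, and a state with no S1 segments is assumed to score 100.0.

```python
def segments(seq, state):
    """Return (start, end) pairs, end exclusive, for each run of `state`."""
    runs, start = [], None
    for pos, ch in enumerate(seq):
        if ch == state and start is None:
            start = pos
        elif ch != state and start is not None:
            runs.append((start, pos))
            start = None
    if start is not None:
        runs.append((start, len(seq)))
    return runs


def sov_state(observed, predicted, state):
    """SOV(i) in percent, with S1 from `observed` and S2 from `predicted`."""
    s2_segments = segments(predicted, state)
    total = 0.0  # sum over S(i) of the per-pair terms
    n_i = 0      # normalization value N(i)
    for a0, a1 in segments(observed, state):
        len1 = a1 - a0
        pairs = 0
        for b0, b1 in s2_segments:
            minov = min(a1, b1) - max(a0, b0)
            if minov <= 0:
                continue  # no residue in common, so not a pair in S(i)
            len2 = b1 - b0
            maxov = max(a1, b1) - min(a0, b0)
            delta = min(maxov - minov, minov, len1 // 2, len2 // 2)
            total += (minov + delta) / maxov * len1
            pairs += 1
        # LEN(S1) is counted once per pair in S(i), or once if S1 is in S'(i).
        n_i += len1 * max(pairs, 1)
    # Assumption: a state with no S1 segments scores 100.0, matching the
    # STRAND column of the sample output in this document.
    return 100.0 * total / n_i if n_i else 100.0


# Helix state of the eight-residue example: MINOV = 3, MAXOV = 4, DELTA = 1.
print(sov_state("CCCHHHCC", "CCHHHHCC", "H"))  # 100.0
```

DELTA is what lets the helix predicted one residue too long still score 100: 
the pair term is (3 + 1) / 4 * 3 = 3, which equals N(H) = 3.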


Segment OVerlap quantity measure for all three states: 


                       1   SUM   SUM   MINOV(S1;S2) + DELTA(S1;S2)
        SOV  =  100 * ---  SUM   SUM   ---------------------------  * LEN(S1)
                       N   SUM   SUM           MAXOV(S1;S2)
                            i    S(i)

where the normalization value N is a sum of N(i) over all three
conformational states (i = HELIX, STRAND, COIL):

                SUM 
          N  =  SUM  N(i)
                SUM
                 i
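
The combined measure can be sketched the same way, accumulating the pair terms 
and the normalization value N over all three states in one pass. The names 
runs and sov_all are hypothetical, the result is scaled to a percentage as in 
the sample output of this document, and an input with no segments at all is 
assumed to score 100.0; none of this is taken from the service's source code.

```python
from itertools import groupby


def runs(seq):
    """Yield (state, start, end) for each run of equal letters, end exclusive."""
    pos = 0
    for state, group in groupby(seq):
        length = len(list(group))
        yield state, pos, pos + length
        pos += length


def sov_all(s1_seq, s2_seq, states="HEC"):
    """SOV over all states in percent; S1 segments come from `s1_seq`."""
    s2_runs = list(runs(s2_seq))
    total = 0.0  # double sum over i and S(i)
    n = 0        # N, the sum of N(i) over all states
    for state, a0, a1 in runs(s1_seq):
        if state not in states:
            continue
        len1 = a1 - a0
        pairs = 0
        for other, b0, b1 in s2_runs:
            minov = min(a1, b1) - max(a0, b0)
            if other != state or minov <= 0:
                continue  # different state or no residue in common
            maxov = max(a1, b1) - min(a0, b0)
            delta = min(maxov - minov, minov, len1 // 2, (b1 - b0) // 2)
            total += (minov + delta) / maxov * len1
            pairs += 1
        # LEN(S1) once per pair, or once if S1 forms no pair at all.
        n += len1 * max(pairs, 1)
    return 100.0 * total / n if n else 100.0


observed, predicted = "CCCHHHCC", "CCHHHHCC"
print(sov_all(observed, predicted))  # S1 = observed segments: 100.0
print(sov_all(predicted, observed))  # S1 = predicted segments: 100.0
```

Swapping the two arguments swaps the roles of S1 and S2, so the same function 
covers both directions of the comparison.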


"SOV observed" means that S1 is the observed segment and S2 is the predicted one.
"SOV predicted" means that S1 is the predicted segment and S2 is the observed one. 


-------------------------------------------------------------------------------

                         Data format of prediction 

The SSP (secondary structure prediction) data can be prepared 
in COLUMN format:

    First column: protein sequence (AA) in one-letter code 
    Second column: observed (OSEC) secondary structure 
    Third column: predicted (PSEC) secondary structure 

Secondary structure conformational states can be either helix (H), strand (E) or coil (C). 
Note: Alternatively, for helix assignment 'G' or 'I' can be used instead, 
for strand assignment 'B' can be used instead, and
for coil assignment 'L', 'T' or 'S' can be used instead.
Spaces should be used as delimiters to separate columns. 
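
The alternative letters listed in the note can be reduced to the three states 
with a simple lookup table. The sketch below is only an illustration: the names 
TO_THREE_STATE and reduce_states are hypothetical, and both the upper-casing of 
the input and the rejection of unlisted letters are assumptions, not documented 
behaviour of the service.

```python
# Mapping taken from the note above.
TO_THREE_STATE = {
    "H": "H", "G": "H", "I": "H",            # helix
    "E": "E", "B": "E",                      # strand
    "C": "C", "L": "C", "T": "C", "S": "C",  # coil
}


def reduce_states(assignment):
    """Map every letter of an assignment string to H, E or C."""
    reduced = []
    for ch in assignment.upper():
        if ch not in TO_THREE_STATE:
            # Assumption: letters not listed in the note are an error.
            raise ValueError(f"unknown secondary structure code: {ch!r}")
        reduced.append(TO_THREE_STATE[ch])
    return "".join(reduced)


print(reduce_states("LLTGGGSS"))  # CCCHHHCC
```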


Example.1 of input data format: 
*******************************

AA  OSEC PSEC
M   C    C  
Q   C    C  
T   C    H  
R   H    H  
S   H    H  
I   H    H  
G   C    C  
V   C    C  
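
Reading this column format takes only a few lines of Python. The sketch below 
is an illustration only: the name parse_columns is hypothetical, the header row 
is assumed to read exactly AA OSEC PSEC as in the example, and any columns 
after the third are assumed to be ignorable.

```python
def parse_columns(text):
    """Parse whitespace-delimited AA / OSEC / PSEC rows into three strings."""
    aa, osec, psec = [], [], []
    for number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields or fields[:3] == ["AA", "OSEC", "PSEC"]:
            continue  # blank line, or the header row shown in the example
        if len(fields) < 3:
            raise ValueError(f"line {number}: expected at least 3 columns")
        # Assumption: columns beyond the third carry no needed data.
        aa.append(fields[0])
        osec.append(fields[1])
        psec.append(fields[2])
    return "".join(aa), "".join(osec), "".join(psec)


example = """
AA  OSEC PSEC
M   C    C
Q   C    C
T   C    H
R   H    H
S   H    H
I   H    H
G   C    C
V   C    C
"""
print(parse_columns(example))  # ('MQTRSIGV', 'CCCHHHCC', 'CCHHHHCC')
```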


-------------------------------------------------------------------------------

Three other formats of the input data are also allowed:

Example.2 of input data format: 
*******************************
 
 AA  OSEC  PSEC     NUM
  M   C     C         1
  Q   C     C         2
  T   C     H         3
  R   H     H         4
  S   H     H         5
  I   H     H         6
  G   C     C         7
  V   C     C         8


Example.3 of input data format: 
*******************************
 
>OSEQ
CCCHHHCC
>PSEQ
CCHHHHCC
>AA
MQTRSIGV
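
The tagged layout of Example.3 can be read into a dictionary keyed by the text 
after each '>' sign. The sketch below is an illustration only: the name 
parse_tagged is hypothetical, and allowing a sequence to continue over several 
lines is an assumption that the example itself does not confirm.

```python
def parse_tagged(text):
    """Parse '>TAG' lines followed by sequence lines into a dict."""
    records, tag = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            tag = line[1:].strip()
            records[tag] = ""
        elif tag is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            # Assumption: a sequence may continue over several lines.
            records[tag] += line
    return records


example = """
>OSEQ
CCCHHHCC
>PSEQ
CCHHHHCC
>AA
MQTRSIGV
"""
records = parse_tagged(example)
print(records["OSEQ"], records["PSEQ"], records["AA"])
# CCCHHHCC CCHHHHCC MQTRSIGV
```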


Example.4 of input data format: 
*******************************
 
SSP  1   M   C     C        
SSP  2   Q   C     C        
SSP  3   T   C     H        
SSP  4   R   H     H        
SSP  5   S   H     H        
SSP  6   I   H     H        
SSP  7   G   C     C        
SSP  8   V   C     C        

-------------------------------------------------------------------------------

Output: 
*******

 SECONDARY STRUCTURE PREDICTION
 NUMBER OF RESIDUES PREDICTED: LENGTH = 8
 AA  OSEC  PSEC     NUM
  M   C     C         1
  Q   C     C         2
  T   C     H         3
  R   H     H         4
  S   H     H         5
  I   H     H         6
  G   C     C         7
  V   C     C         8
 -----------------------

 SECONDARY STRUCTURE PREDICTION ACCURACY EVALUATION.  N_AA =    8

                                   ALL    HELIX   STRAND     COIL

 Q3                         :     87.5    100.0    100.0     80.0

 SOV                        :    100.0    100.0    100.0    100.0

 -----------------------