The AS2TS service is designed to facilitate the construction of 3D models for protein sequences.
For a given query sequence, 3D models are constructed as follows:
1. searching for the closest sequence homologs in the Protein Data Bank (PDB)
2. calculating sequence alignments between the query and the detected sequence homologs
3. assigning ATOM coordinates from the PDB structures of the detected homologs to the corresponding amino acids of the query sequence
4. reporting the list of closest homologs from PDB, the calculated alignments, and the corresponding 3D models (coordinates of main-chain atoms; side-chain atoms can optionally be calculated using the SCWRL program)
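Step 3 above can be illustrated with a minimal Python sketch. It is not the AS2TS implementation: the function name assign_coordinates, the gapped-string representation of the alignment, and the toy coordinates are all assumptions made for illustration. It only shows the idea of copying coordinates from template residues to the query residues aligned with them.

```python
def assign_coordinates(query_aln, template_aln, template_coords):
    """Copy template coordinates onto the query residues aligned with them.

    query_aln, template_aln: gapped alignment strings of equal length ('-' = gap).
    template_coords: one entry per template residue, in sequence order
                     (None where the structure has no ATOM record).
    Returns {1-based query position: (query residue, coordinates)}.
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    model = {}
    q_pos = 0  # query residues consumed so far
    t_pos = 0  # template residues consumed so far
    for q_res, t_res in zip(query_aln, template_aln):
        if q_res != "-":
            q_pos += 1
        if t_res != "-":
            t_pos += 1
        # Only columns where both sequences have a residue can receive coordinates.
        if q_res != "-" and t_res != "-":
            coords = template_coords[t_pos - 1]
            if coords is not None:
                model[q_pos] = (q_res, coords)
    return model


# Toy data (hypothetical): a 6-column alignment and 5 template residues,
# one of which has no coordinates.
template_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), None, (15.2, 0.0, 0.0)]
model = assign_coordinates("MKT-LV", "M-TALV", template_coords)
print(sorted(model))  # query positions that received coordinates
```

In this toy case, query positions aligned to a gap or to a template residue without coordinates are simply left out of the model.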
For AS2TS processing, the amino acid sequence should be entered in FASTA format.
A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line is distinguished from the sequence data by a greater-than (">") symbol in the first column (see an example below):

>Name
RKNGLNVKMDYTPNSGQLVRNLLNGKYNIAVAGIDNVIAYQEGQVKEPVVNPDMFAFYGV
KELKLDYELKPMDFSGIIPALQTKNVDLALAGITITDERKKAIDFSDGYYK

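The FASTA layout described above can be read with a few lines of Python. This is a minimal sketch for checking input before submission, not part of AS2TS; the function name parse_fasta is an assumption.

```python
def parse_fasta(text):
    """Return a list of (description, sequence) tuples from FASTA-formatted text."""
    records = []
    description = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if description is not None:
                records.append((description, "".join(chunks)))
            description = line[1:].strip()
            chunks = []
        elif description is None:
            raise ValueError("sequence data found before a '>' description line")
        else:
            chunks.append(line)
    if description is not None:
        records.append((description, "".join(chunks)))
    return records


example = """>Name
RKNGLNVKMDYTPNSGQLVRNLLNGKYNIAVAGIDNVIAYQEGQVKEPVVNPDMFAFYGV
KELKLDYELKPMDFSGIIPALQTKNVDLALAGITITDERKKAIDFSDGYYK
"""
records = parse_fasta(example)
name, sequence = records[0]
print(name, len(sequence))
```

The sequence lines are joined into a single string, so line breaks in the input do not affect the parsed sequence.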
For the process of model building several options are available:
- Number of generated models: this number allows a user to select how many of the closest homologs from PDB will be searched for model building.
- Mutation matrix: a user can choose from different "substitution matrices" (BLOSUM_45, PAM_250, BLOSUM_50, BLOSUM_62, BLOSUM_80, PAM_70, PAM_30) used by the alignment programs. The theory of amino acid substitution matrices is described in Altschul (1991). By changing substitution matrices the user can evaluate the calculated alignments and determine how stable the created 3D models are. The following default gap penalties are assigned to each selected substitution matrix:
    BLOSUM45  -G 15  -E 2
    PAM250    -G 14  -E 2
    BLOSUM50  -G 13  -E 2
    BLOSUM62  -G 11  -E 1
    BLOSUM80  -G 10  -E 1
    PAM70     -G 10  -E 1
    PAM30     -G 9   -E 1
  The substitution matrices are listed in the order which reflects the following recommendations of usage:
    long alignments with low similarity ---> short alignments with high similarity
- Pairwise sequence alignment search: pairwise sequence alignment searches are performed against PDB. Smith-Waterman, FASTA or BLAST can be selected as the alignment search program.
- Multiple sequence alignment search: by selecting PSI-BLAST a user can decide how many BLAST iterations have to be performed against sequences from the NR library (non-redundant sequences from NCBI). The final PSI-BLAST iteration is run against PDB.
- Side chains building procedure (SCWRL)
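The default gap penalties listed above can be kept in a small lookup table. The sketch below is illustrative only: the names DEFAULT_GAP_PENALTIES and gap_options are hypothetical, and it assumes that -G and -E denote the gap-opening and gap-extension penalties, as they do on the legacy BLAST command line.

```python
# Default gap penalties per substitution matrix, copied from the list above,
# in the recommended order (long/low-similarity -> short/high-similarity).
# Assumption: the values are (gap opening, gap extension), i.e. (-G, -E).
DEFAULT_GAP_PENALTIES = {
    "BLOSUM45": (15, 2),
    "PAM250": (14, 2),
    "BLOSUM50": (13, 2),
    "BLOSUM62": (11, 1),
    "BLOSUM80": (10, 1),
    "PAM70": (10, 1),
    "PAM30": (9, 1),
}


def gap_options(matrix):
    """Return the '-G n -E m' option string for a substitution matrix name.

    Accepts both spellings used on this page, e.g. 'BLOSUM_62' and 'BLOSUM62'.
    """
    name = matrix.upper().replace("_", "")
    try:
        gap_open, gap_extend = DEFAULT_GAP_PENALTIES[name]
    except KeyError:
        raise ValueError(f"no default gap penalties listed for {matrix!r}") from None
    return f"-G {gap_open} -E {gap_extend}"


print(gap_options("BLOSUM_62"))  # -G 11 -E 1
```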
Example of the reported results from AS2TS system:

Model  PDB     N_AA  SISC  E-val  Seq_ID  LAL      Overlap
M_00   1fil    139   5     2e-44  29.000  130:137  (2-131:1-137)
M_01   1awi_A  138   4     4e-44  28.000  129:136  (3-131:1-136)
M_02   1pne    140   2     4e-44  29.000  130:137  (2-131:2-138)
M_03   2btf_P  139   1     4e-44  29.000  130:137  (2-131:1-137)
M_04   1d1j_A  139   4     7e-44  23.000  130:137  (2-131:1-137)
M_05   1a0k    131   1     4e-42  14.000  124:135  (1-131:1-129)
M_06   1plm_A  130   1     5e-42  13.000  123:132  (4-131:2-128)
M_07   1g5u_A  131   2     5e-41  14.000  125:136  (1-132:1-130)
M_08   3nul    130   1     4e-40  13.000  123:132  (4-131:2-128)
M_09   1cqa    133   1     5e-40  13.000  121:138  (1-132:1-132)
M_10   1ypr_A  125   3     4e-31  12.000  114:132  (4-129:2-121)
M_11   1f2k_A  125   3     1e-25  9.000   116:136  (4-132:2-124)
M_12   1prq    125   2     3e-25  9.000   116:136  (4-132:2-124)
M_13   1acf    125   1     8e-25  9.000   116:136  (4-132:2-124)
M_14   1bhn_A  152   6     1.5    10.000  91:116   (36-132:4-113)
M_15   1pku_A  150   12    1.7    10.000  79:99    (48-132:19-111)
M_16   1fiq_C  763   1     2.2    10.000  66:70    (47-112:146-215)
M_17   1v97_A  1332  6     2.2    10.000  66:70    (47-112:715-784)
M_18   1n5x_A  1331  2     2.2    10.000  66:70    (47-112:714-783)
M_19   1ha7_B  172   12    2.4    13.000  40:43    (88-127:121-163)

where the following information is provided in columns:
- Model: link to the coordinates of calculated model
- PDB: link to the information about a PDB template used for model building
- N_AA: number of amino-acids in the template sequence
- SISC: number of different sets of coordinates of a protein template (link to PDB files)
- E-val: score (E-value) calculated by the selected alignment program (lower values indicate better alignments)
- Seq_ID: sequence identity calculated from the alignment
- LAL (N:M): N - number of amino-acids assigned from ATOM coordinates from the PDB template, M - length of the sequence alignment
- Overlap: the sequence ranges in the alignment between the query and PDB template

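A row of the results table above can be split into typed fields for further processing. This is a minimal sketch, not an AS2TS tool: the names ModelRow and parse_row are hypothetical, and it assumes that the first range in the Overlap column refers to the query and the second to the PDB template.

```python
import re
from typing import NamedTuple


class ModelRow(NamedTuple):
    model: str
    pdb: str
    n_aa: int
    sisc: int
    e_val: float
    seq_id: float
    assigned: int          # N in the LAL column
    aln_length: int        # M in the LAL column
    query_range: tuple     # assumed: first range in Overlap
    template_range: tuple  # assumed: second range in Overlap


# Matches an Overlap entry such as "(2-131:1-137)".
_OVERLAP = re.compile(r"\((\d+)-(\d+):(\d+)-(\d+)\)")


def parse_row(line):
    """Parse one whitespace-separated row of the results table."""
    model, pdb, n_aa, sisc, e_val, seq_id, lal, overlap = line.split()
    assigned, aln_length = (int(x) for x in lal.split(":"))
    match = _OVERLAP.fullmatch(overlap)
    if match is None:
        raise ValueError(f"unexpected Overlap field: {overlap!r}")
    q_start, q_end, t_start, t_end = (int(g) for g in match.groups())
    return ModelRow(model, pdb, int(n_aa), int(sisc), float(e_val), float(seq_id),
                    assigned, aln_length, (q_start, q_end), (t_start, t_end))


row = parse_row("M_00 1fil 139 5 2e-44 29.000 130:137 (2-131:1-137)")
print(row.pdb, row.assigned, row.aln_length, row.query_range, row.template_range)
```

Parsed rows can then be filtered, for example by keeping only templates whose E-value falls below a chosen threshold.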
References:
Altschul, S. F. (1991). Amino acid substitution matrices from an information theoretic perspective. J. Mol. Biol. 219, 555-565.
Smith, T. F., Waterman, M. S. (1981). Identification of common molecular subsequences. J. Mol. Biol. 147(1), 195-197.
Pearson, W. R. (1991). Searching protein sequence libraries: comparison of the sensitivity and selectivity of the Smith-Waterman and FASTA algorithms. Genomics 11, 635-650.
Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W., Lipman, D. J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25(17), 3389-3402.
Bower, M., Cohen, F. E., Dunbrack, R. L., Jr. (1997). Sidechain prediction from a backbone-dependent rotamer library: a new tool for homology modeling. J. Mol. Biol. 267, 1268-1282.