From phd@EMBL-Heidelberg.de Wed May 7 08:29:08 1997
Date: Wed, 07 May 1997 10:03:15 +0000 (GMT)
From: phd@EMBL-Heidelberg.de
To: dwf@polysci.umass.edu
Subject: Predict-Protein

The following information has been received by the server:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
________________________________________________________________________________

reference predict_h28028 (Wed May 7 11:38:16 MDT 1997)
from dwf@polysci.umass.edu
password(###)
resp HTML
orig HTML
prediction of:
-secondary structure-solvent accessibility-
return msf format
return column format
# Phosphotriesterase
MQTRRVVLKSAAAAGTLLGGLAGCASVAGSIGTGDRINTVRGPITISEAGFTLTHEHICG
SSAGFLRAWPEFFGSRKALAEKAVRGLRRARAAGVRTIVDVSTFDIGRDVSLLAEVSRAA
DVHIVAATGLWFDPPLSMRLRSVEELTQFFLREIQYGIEDTGIRAGIIKVATTGKATPFQ
ELVLKAAARASLATGVPVTTHTAASQRDGEQQAAIFESEGLSPSRVCIGHSDDTDDLSYL
TALAARGYLIGLDHIPHSAIGLEDNASASALLGIRSWQTRALLIKALIDQGYMKQILVSN
DWLFGFSSYVTNIMDVMDRVNPDGMAFIPLRVIPFLREKGVPQETLAGITVTNPARFLSP
TLRAS
________________________________________________________________________________

The sequence had been interpreted as being:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
________________________________________________________________________________

>P1; /home/phd/server/work/predict_h28028 (#) phosphotriesterase
MQTRRVVLKSAAAAGTLLGGLAGCASVAGSIGTGDRINTVRGPITISEAGFTLTHEHICG
SSAGFLRAWPEFFGSRKALAEKAVRGLRRARAAGVRTIVDVSTFDIGRDVSLLAEVSRAA
DVHIVAATGLWFDPPLSMRLRSVEELTQFFLREIQYGIEDTGIRAGIIKVATTGKATPFQ
ELVLKAAARASLATGVPVTTHTAASQRDGEQQAAIFESEGLSPSRVCIGHSDDTDDLSYL
TALAARGYLIGLDHIPHSAIGLEDNASASALLGIRSWQTRALLIKALIDQGYMKQILVSN
DWLFGFSSYVTNIMDVMDRVNPDGMAFIPLRVIPFLREKGVPQETLAGITVTNPARFLSP
TLRAS
________________________________________________________________________________

The alignment that has been used as input to the network is:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
________________________________________________________________________________

--- ------------------------------------------------------------
--- MAXHOM multiple sequence alignment
--- ------------------------------------------------------------
---
--- MAXHOM ALIGNMENT HEADER: ABBREVIATIONS FOR SUMMARY
--- ID     : identifier of aligned (homologous) protein
--- STRID  : PDB identifier (only for known structures)
--- PIDE   : percentage of pairwise sequence identity
--- WSIM   : percentage of weighted similarity
--- LALI   : number of residues aligned
--- NGAP   : number of insertions and deletions (indels)
--- LGAP   : number of residues in all indels
--- LSEQ2  : length of aligned sequence
--- ACCNUM : SwissProt accession number
--- NAME   : one-line description of aligned protein
---
--- MAXHOM ALIGNMENT HEADER: SUMMARY
ID          STRID  PIDE WSIM LALI NGAP LGAP LSEQ2 ACCNUM NAME
opd_flasp   1PTA    100  100  365    0    0   365 P16648 PARATHION HYDROLASE PRECU
yhfv_ecoli           30   45  288    6   32   292 P45548 HYPOTHETICAL 32.9 KD PROT
---
--- MAXHOM ALIGNMENT: IN MSF FORMAT
MSF of: /home/phd/server/work/predict_h28028_601.hssp from: 1 to: 365
/home/phd/server/work/predict_h28028_601.ret_msf  MSF: 365  Type: P  7-May-97 12:00:0  Check: 7048 ..

 Name: predict_h280  Len: 365  Check: 1961  Weight: 1.00
 Name: opd_flasp     Len: 365  Check: 1961  Weight: 1.00
 Name: yhfv_ecoli    Len: 365  Check: 3126  Weight: 1.00

//

               1                                                   50
predict_h280   MQTRRVVLKS AAAAGTLLGG LAGCASVAGS IGTGDRINTV RGPITISEAG
opd_flasp      MQTRRVVLKS AAAAGTLLGG LAGCASVAGS IGTGDRINTV RGPITISEAG
yhfv_ecoli     .......... .......... .......... .......... ...MSFDPTG

               51                                               100
predict_h280   FTLTHEHICG SSAGFLRAWP EFFGSRKALA EKAVRGLRRA RAAGVRTIVD
opd_flasp      FTLTHEHICG SSAGFLRAWP EFFGSRKALA EKAVRGLRRA RAAGVRTIVD
yhfv_ecoli     YTLAHEHLHI DLSGFKNNVD CRLDQYAFIC QEMNDLMTR. ...GVRNVIE

               101                                              150
predict_h280   VSTFDIGRDV SLLAEVSRAA DVHIVAATGL WFDPPLSMRL RSVEELTQFF
opd_flasp      VSTFDIGRDV SLLAEVSRAA DVHIVAATGL WFDPPLSMRL RSVEELTQFF
yhfv_ecoli     MTNRYMGRNA QFMLDVMRET GINVVACTGY YQDAFfhVAT RSVQELAQEM

               151                                              200
predict_h280   LREIQYGIED TGIRAGIIKV ATTGKATPFQ ELVLKAAARA SLATGVPVTT
opd_flasp      LREIQYGIED TGIRAGIIKV ATTGKATPFQ ELVLKAAARA SLATGVPVTT
yhfv_ecoli     VDEIEQGIDG TELKAGIIAE IGtgKITPLE EKVFIAAALA HNQTGRPIST

               201                                              250
predict_h280   HTAASQRDGE QQAAIFESEG LSPSRVCIGH SDDTDDLSYL TALAARGYLI
opd_flasp      HTAASQRDGE QQAAIFESEG LSPSRVCIGH SDDTDDLSYL TALAARGYLI
yhfv_ecoli     HTSFST.MGL EQLALLQAHG VDLSRVTVGH CDLKDNLDNI LKMIDLGAYV

               251                                              300
predict_h280   GLDHIPHSAI GLEDNASASA LLGIRSWQTR ALLIKALIDQ GYMKQILVSN
opd_flasp      GLDHIPHSAI GLEDNASASA LLGIRSWQTR ALLIKALIDQ GYMKQILVSN
yhfv_ecoli     QFDTIGKNSY YPDEK..... .........R IAMLHALRDR GLLNRVMLSM

               301                                              350
predict_h280   DWLFGFSSYV TNIMDVMDRV NPDGMAFIPL RVIPFLREKG VPQETLAGIT
opd_flasp      DWLFGFSSYV TNIMDVMDRV NPDGMAFIPL RVIPFLREKG VPQETLAGIT
yhfv_ecoli     DITRR..... ....SHLKAN GGYGYDYLLT TFIPQLRQSG FSQADVDVML

               351          365
predict_h280   VTNPARFLSP TLRAS
opd_flasp      VTNPARFLSP TLRAS
yhfv_ecoli     RENPSQFFQ. .....

Note: Your protein has a homologue of known structure in PDB!
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The PHD prediction is clearly inferior to a prediction by homology if a
sequence with known tertiary structure exists in PDB. For the sequence you
sent, there is a known homologue in PDB. The alignment of your sequence with
several other sequences, among them the PDB entry, is appended above.
Predicting the 3D structure of your sequence is then a straightforward task
with a program such as WHATIF (for further information see
http://www.sander.embl-heidelberg.de/whatif/ or contact Gerrit Vriend,
VRIEND@EMBL-HEIDELBERG.DE).

If you submitted a sequence of known structure to evaluate the PHD
prediction, please note that the method is expected to perform better on
proteins that were used to train the networks.
The list of proteins used for training is (four letter PDB identifier + chain):
256b_A, 2aat, 8abp, 6acn, 1acx, 8adh, 3ait, 1ak3_A, 2alp, 9api_A, 9api_B,
8atc_A, 8atc_B, 1azu, 3b5c, 1bbp_A, 1bds, 3blm, 1bmv_1, 1bmv_2, 4bp2, 2cab,
7cat_A, 1cbh, 1cc5, 2ccy_A, 1cd4, 1cdt_A, 3cla, 3cln, 4cms, 4cpa_I, 6cpa,
6cpp, 4cpv, 1crn, 1cse_I, 6cts, 2cyp, 5cyt_R, 3dfr, 6dfr, 3ebx, 1eca, 5er2_E,
1etu, 1fc2_C, 1fc2_D, 1fdl_H, 1fdx, 1fkf, 2fnr, 2fxb, 1fxi_A, 4fxn, 3gap_A,
2gbp, 2gcr, 1gd1_O, 2gls_A, 2gn5, 1gox, 1gp1_A, 4gr1, 1hds_B, 1hip, 6hir,
2hla_A, 3hla_B, 3hmg_A, 3hmg_B, 2hmz_A, 5hvp_A, 2i1b, 3icb, 7icd, 1il8_A,
9ins_B, 1l58, 1lap, 2lbp, 5ldh, 2lh4, 2lhb, 1lrd_3, 2ltn_A, 2ltn_B, 5lyz,
1mcp_L, 4mdh_A, 2mev_1, 2mev_3, 2mev_4, 2mhu, 1mrt, 2or1_L, 1ovo_A, 2pab_A,
1paz, 9pap, 2pcy, 4pfk, 3pgm, 2phh, 2pka_A, 2pka_B, 1pmb_A, 1ppt, 1prc_C,
1prc_H, 1prc_L, 1prc_M, 1pyp, 1r09_2, 1rbp, 1rhd, 4rhv_1, 4rhv_3, 4rhv_4,
1rnh, 3rnt, 7rsa, 2rsp_A, 2rus_A, 4rxn, 1s01, 4sbv_A, 1sdh_A, 4sgb_I, 1sgt,
1sh1, 2sns, 2sod_B, 2stv, 2taa_A, 2tbv_A, 2tgp_I, 1tgs_I, 3tim_A, 6tmn_E,
2tmv_P, 1tnf_A, 4ts1_A, 2tsc_A, 1ubq, 2utg_A, 9wga_A, 2wrp_R, 1wsy_A,
1wsy_B, 4xia_A

For personal messages or questions to the PHD authors, send email to
Predict-Help@EMBL-Heidelberg.DE

Burkhard Rost
EMBL, 69120 Heidelberg, Europe
________________________________________________________________________________

Prediction of:
 - secondary structure, by PHDsec
 - solvent accessibility, by PHDacc

PHD: Profile fed neural network systems from HeiDelberg
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Author: Burkhard Rost
        EMBL, Heidelberg, FRG
        Meyerhofstrasse 1, 69 117 Heidelberg
        Internet: Predict-Help@EMBL-Heidelberg.DE
All rights reserved.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Secondary structure prediction by PHDsec:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Author: Burkhard Rost
        EMBL, Heidelberg, FRG
        Meyerhofstrasse 1, 69 117 Heidelberg
        Internet: Rost@EMBL-Heidelberg.DE
All rights reserved.

About the network method
~~~~~~~~~~~~~~~~~~~~~~~
The network procedure is described in detail in:
1) Rost, Burkhard; Sander, Chris: Prediction of protein structure at better
   than 70% accuracy. J. Mol. Biol., 1993, 232, 584-599.
A brief description is given in:
   Rost, Burkhard; Sander, Chris: Improved prediction of protein secondary
   structure by use of sequence profiles and neural networks.
   Proc. Natl. Acad. Sci. U.S.A., 1993, 90, 7558-7562.
The PHD mail server is described in:
2) Rost, Burkhard; Sander, Chris; Schneider, Reinhard: PHD - an automatic
   mail server for protein secondary structure prediction.
   CABIOS, 1994, 10, 53-60.
The latest improvement steps (up to 72%) are explained in:
3) Rost, Burkhard; Sander, Chris: Combining evolutionary information and
   neural networks to predict protein secondary structure.
   Proteins, 1994, 19, 55-72.

To be cited in publications using PHD output: papers 1-3 for the
prediction of secondary structure and the prediction server.

About the input to the network
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The prediction is performed by a system of neural networks. The input is a
multiple sequence alignment, taken from an HSSP file produced by the program
MaxHom (Sander, Chris & Schneider, Reinhard: Database of Homology-Derived
Structures and the Structural Meaning of Sequence Alignment. Proteins, 1991,
9, 56-68).
For optimal results the alignment should contain sequences with varying
degrees of sequence similarity to the input protein. The following is an
ideal situation:

+-----------------+----------------------+
| sequence:       | sequence identity    |
+-----------------+----------------------+
| target sequence | 100 %                |
| aligned seq. 1  |  90 %                |
| aligned seq. 2  |  80 %                |
| ...             | ...                  |
| aligned seq. 7  |  30 %                |
+-----------------+----------------------+

Estimated Accuracy of Prediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A careful cross-validation test on some 250 protein chains (in total about
55,000 residues) with less than 25% pairwise sequence identity gave the
following results:

++================++-----------------------------------------+
|| Qtotal = 72.1% || ("overall three state accuracy")        |
++================++-----------------------------------------+

+----------------------------+-----------------------------+
| Qhelix (% of observed)=70% | Qhelix (% of predicted)=77% |
| Qstrand(% of observed)=62% | Qstrand(% of predicted)=64% |
| Qloop  (% of observed)=79% | Qloop  (% of predicted)=72% |
+----------------------------+-----------------------------+

..........................................................................

These percentages are defined by:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

$$Q_{\mathrm{total}} = 100 \cdot \frac{\text{number of correctly predicted residues}}{\text{number of all residues}}$$

$$Q_{\mathrm{helix}}(\%\ \text{of obs}) = 100 \cdot \frac{\text{no. of residues correctly predicted to be in helix}}{\text{no. of all residues observed to be in helix}}$$

$$Q_{\mathrm{helix}}(\%\ \text{of pred}) = 100 \cdot \frac{\text{no. of residues correctly predicted to be in helix}}{\text{no. of all residues predicted to be in helix}}$$

(and analogously for strand and loop).

..........................................................................

Averaging over single chains
~~~~~~~~~~~~~~~~~~~~~~~~~~~
The most appropriate measure of overall accuracy is the percentage of
correctly predicted residues quoted above. However, since users are mainly
interested in the expected performance for a particular protein, the mean
over protein chains may be helpful as well.
Computing the three-state accuracy for each protein chain first, and then
averaging over all 250 chains, yields:

+-------------------------------====--+
| Qtotal/averaged over chains = 72.2% |
+-------------------------------====--+
| standard deviation          =  9.3% |
+-------------------------------------+

..........................................................................

Further measures of performance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Matthews correlation coefficient:

+---------------------------------------------+
| Chelix = 0.63, Cstrand = 0.53, Cloop = 0.52 |
+---------------------------------------------+

..........................................................................

Average length of predicted secondary structure segments:

            +------------+----------+
            | predicted  | observed |
+-----------+------------+----------+
| Lhelix =  |    10.3    |    9.3   |
| Lstrand = |     5.0    |    5.3   |
| Lloop =   |     7.2    |    5.9   |
+-----------+------------+----------+

..........................................................................

The accuracy matrix in detail:

+---------------------------------------+
|    number of residues with H, E, L    |
+---------+------+------+------+--------+
|         |net H |net E |net L |sum obs |
+---------+------+------+------+--------+
| obs H   |12447 | 1255 | 3990 | 17692  |
| obs E   |  949 | 7493 | 3750 | 12192  |
| obs L   | 2604 | 2875 |19962 | 25441  |
+---------+------+------+------+--------+
| sum Net |16000 |11623 |27702 | 55325  |
+---------+------+------+------+--------+

Note: This table is to be read as follows: of all residues predicted to be
in helix, 12447 were observed in helix, 949 in strands and 2604 in loop
regions. The term "observed" refers to the DSSP assignment of secondary
structure calculated from the 3D coordinates of experimentally determined
structures (Dictionary of Secondary Structure of Proteins: Kabsch & Sander
(1983) Biopolymers, 22, 2577-2637).
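The three-state scores and the Matthews coefficients can be recomputed from the accuracy matrix. The following Python sketch is a minimal illustration, not PHD code. The names (`MATRIX`, `three_state_scores`) are hypothetical. It assumes the usual one-against-the-rest definition of the Matthews correlation coefficient for each state. Run on the matrix above, it gives Qtotal of about 72.1% and C of about 0.63, 0.53 and 0.52 for helix, strand and loop, which agrees with the values quoted earlier to two decimals.

```python
import math

# Confusion matrix of the PHDsec cross-validation test quoted above:
# rows = observed (DSSP) state, columns = state predicted by the network.
STATES = ("H", "E", "L")
MATRIX = [
    [12447, 1255, 3990],   # observed helix
    [949, 7493, 3750],     # observed strand
    [2604, 2875, 19962],   # observed loop
]


def three_state_scores(m):
    """Return Qtotal, per-state Q(% of obs), Q(% of pred) and Matthews C."""
    total = sum(sum(row) for row in m)
    correct = sum(m[i][i] for i in range(len(m)))
    scores = {"Qtotal": 100.0 * correct / total}
    for i, s in enumerate(STATES):
        tp = m[i][i]                        # observed and predicted in s
        n_obs = sum(m[i])                   # all residues observed in s
        n_pred = sum(row[i] for row in m)   # all residues predicted in s
        fn = n_obs - tp                     # observed in s, predicted otherwise
        fp = n_pred - tp                    # predicted in s, observed otherwise
        tn = total - tp - fn - fp           # neither observed nor predicted in s
        scores[f"Q{s}_obs"] = 100.0 * tp / n_obs
        scores[f"Q{s}_pred"] = 100.0 * tp / n_pred
        denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        scores[f"C{s}"] = (tp * tn - fp * fn) / denom
    return scores


if __name__ == "__main__":
    for name, value in three_state_scores(MATRIX).items():
        print(f"{name:8s} {value:7.3f}")
```

The per-state values use the same ratios as the definitions of Q(% of observed) and Q(% of predicted). Only the row or column sum in the denominator changes.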
Position-specific reliability index
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The network predicts the three secondary structure states as real-valued
outputs, one per output unit. The predicted state is the unit with the largest
output ("winner takes all"). The real values carry additional information,
however: e.g., the difference between the largest and the second-largest
output can be used to derive a "reliability index". This index is given for
each residue along with the prediction and is scaled to values between 0
(lowest reliability) and 9 (highest). The table below gives the accuracy
(Qtot) expected for residues whose index is at least a given value, together
with the fraction of such residues (%res):

+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| index|  0  |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  |
| %res |100.0| 99.2| 90.4| 80.9| 71.6| 62.5| 52.8| 42.3| 29.8| 14.1|
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
|      |     |     |     |     |     |     |     |     |     |     |
| Qtot | 72.1| 72.3| 74.8| 77.7| 80.3| 82.9| 85.7| 88.5| 91.1| 94.2|
|      |     |     |     |     |     |     |     |     |     |     |
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| H%obs| 70.4| 70.6| 73.7| 77.1| 80.1| 83.1| 86.0| 89.3| 92.5| 96.4|
| E%obs| 61.5| 61.7| 63.7| 66.6| 69.1| 71.7| 74.6| 77.0| 77.8| 68.1|
|      |     |     |     |     |     |     |     |     |     |     |
| H%prd| 77.8| 78.0| 80.0| 82.6| 84.7| 86.9| 89.2| 91.3| 93.1| 95.4|
| E%prd| 64.5| 64.7| 67.8| 71.0| 74.2| 77.6| 81.4| 85.1| 89.8| 93.5|
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+

The table is cumulative: e.g., 62.5% of all residues have a reliability index
of at least 5. For this subset of almost two thirds of all residues, the
overall three-state accuracy is 82.9%; 83.1% of the observed helix residues
are correctly predicted, and 86.9% of all residues predicted to be in helix
are correct.

..........................................................................
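The per-index values can be derived from the cumulative ones by taking differences. The share of residues with index exactly i is the share with index >= i minus the share with index >= i+1. Their accuracy is the difference in correctly predicted residues between the two subsets, divided by that share. The Python sketch below shows this, with hypothetical names and the %res and Qtot rows copied from the table above. It recovers the per-index %res values exactly. It recovers the per-index Qtot values to within about half a percentage point; the remaining difference comes from the cumulative numbers being published rounded to one decimal.

```python
# Cumulative PHDsec reliability table quoted above: entry i refers to all
# residues whose reliability index is at least i.
CUM_RES = [100.0, 99.2, 90.4, 80.9, 71.6, 62.5, 52.8, 42.3, 29.8, 14.1]   # %res
CUM_QTOT = [72.1, 72.3, 74.8, 77.7, 80.3, 82.9, 85.7, 88.5, 91.1, 94.2]  # Qtot


def per_index(cum_res, cum_acc):
    """Turn cumulative (%res, accuracy) values into values per single index.

    Residues with index exactly i = residues with index >= i minus those with
    index >= i + 1. Their accuracy is the number of correct residues in the
    first subset minus that in the second, divided by their share.
    """
    res, acc = [], []
    for i in range(len(cum_res)):
        last = i + 1 == len(cum_res)
        next_res = 0.0 if last else cum_res[i + 1]
        next_acc = 0.0 if last else cum_acc[i + 1]
        share = cum_res[i] - next_res
        correct = cum_res[i] * cum_acc[i] - next_res * next_acc
        res.append(round(share, 1))
        acc.append(round(correct / share, 1) if share > 0 else float("nan"))
    return res, acc


if __name__ == "__main__":
    res, acc = per_index(CUM_RES, CUM_QTOT)
    for i, (r, q) in enumerate(zip(res, acc)):
        print(f"index {i}: %res = {r:5.1f}  Qtot = {q:5.1f}")
```

The same function applies unchanged to the H%obs, E%obs, H%prd and E%prd rows. For the %prd rows, however, the weights should strictly be the numbers of residues predicted in that state rather than %res, so those results are only approximate.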
The following table gives the non-cumulative quantities, i.e., the values for
each individual reliability index. These numbers answer the question: how
reliable is the prediction for the residues labeled with a particular index i?

+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| index|  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  |
| %res |  8.8|  9.5|  9.3|  9.1|  9.7| 10.5| 12.5| 15.7| 14.1|
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
|      |     |     |     |     |     |     |     |     |     |
| Qtot | 46.6| 50.6| 57.7| 62.6| 67.9| 74.2| 82.2| 88.3| 94.2|
|      |     |     |     |     |     |     |     |     |     |
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| H%obs| 36.8| 42.3| 49.5| 55.2| 61.7| 69.9| 78.8| 87.4| 96.4|
| E%obs| 44.7| 44.5| 52.1| 55.4| 60.9| 68.0| 75.9| 81.0| 68.1|
|      |     |     |     |     |     |     |     |     |     |
| H%prd| 49.9| 52.5| 60.3| 64.2| 69.2| 77.5| 85.4| 89.9| 95.4|
| E%prd| 41.7| 47.1| 53.6| 57.0| 64.0| 71.6| 78.8| 88.8| 93.5|
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+

For example, of all residues with reliability index 5 that are predicted to be
in beta-strand, 64% are correct.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Solvent accessibility prediction by PHDacc:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Author: Burkhard Rost
        EMBL, Heidelberg, FRG
        Meyerhofstrasse 1, 69 117 Heidelberg
        Internet: Rost@EMBL-Heidelberg.DE
All rights reserved.

About the network method
~~~~~~~~~~~~~~~~~~~~~~~
The network for prediction of secondary structure is described in detail in:
   Rost, Burkhard; Sander, Chris: Prediction of protein structure at better
   than 70% accuracy. J. Mol. Biol., 1993, 232, 584-599.
The analysis of the prediction of solvent exposure is given in:
   Rost, Burkhard; Sander, Chris: Conservation and prediction of solvent
   accessibility in protein families. Proteins, 1994, 20, 216-226.

To be cited in publications using PHD exposure predictions: both papers
quoted above.
Definition of accessibility
~~~~~~~~~~~~~~~~~~~~~~~~~~
For training, the residue solvent accessibilities were taken from the
accessible surface area values of DSSP (Dictionary of Secondary Structure of
Proteins; Kabsch & Sander (1983) Biopolymers, 22, 2577-2637). The prediction
provides values for the relative solvent accessibility, normalised as
follows:

$$\mathrm{RELATIVE\_ACCESSIBILITY} = 100 \cdot \frac{\mathrm{ACCESSIBILITY}\ (\text{from DSSP, in } \text{\AA}^2)}{\mathrm{MAXIMAL\_ACC}(i)}$$

where MAXIMAL_ACC(i) is the maximal accessibility of amino acid type i. The
maximal values are:

+----+----+----+----+----+----+----+----+----+----+----+----+
| A  | B  | C  | D  | E  | F  | G  | H  | I  | K  | L  | M  |
| 106| 160| 135| 163| 194| 197|  84| 184| 169| 205| 164| 188|
+----+----+----+----+----+----+----+----+----+----+----+----+
| N  | P  | Q  | R  | S  | T  | V  | W  | X  | Y  | Z  |
| 157| 136| 198| 248| 130| 142| 142| 227| 180| 222| 196|
+----+----+----+----+----+----+----+----+----+----+----+

Notation: one-letter code for amino acids; B stands for D or N, Z stands for
E or Q, and X stands for undetermined.

The (absolute) accessibility can be used to estimate the number of water
molecules W in contact with the residue: W = ACCESSIBILITY / 10.

The prediction is given in 10 states for relative accessibility, with
RELATIVE_ACCESSIBILITY = PREDICTED_ACC * PREDICTED_ACC (in %), where
PREDICTED_ACC = 0, ..., 9.

Estimated Accuracy of Prediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A careful cross-validation test on some 238 protein chains (in total about
62,000 residues) with less than 25% pairwise sequence identity gave the
following results:

Correlation
...........
The correlation between observed and predicted solvent accessibility is:

      -----------
      corr = 0.53
      -----------

This value should be compared with the worst- and best-case prediction
scenarios: random prediction (corr = 0.0) and homology modelling
(corr = 0.66).
(Note: homology modelling yields a relatively accurate 3D prediction if, and
only if, a sequence with significant sequence identity to the target has a
known 3D structure.)

3-state accuracy
................
Often the relative accessibility is projected onto a few states, e.g., 3:
   b = buried       (here defined as rel. acc. < 9%),
   i = intermediate (9% <= rel. acc. < 36%),
   e = exposed      (rel. acc. >= 36%).
A projection onto 3 states, or onto 2 states (buried/exposed), allows a 3- or
2-state prediction accuracy to be compiled. PHD reaches an overall 3-state
accuracy of:
   Q3 = 57.5%
(compared with 35% for random prediction and 70% for homology modelling).
In detail:

+-----------------------------------+-------------------------+
| Qburied (% of observed)=77%       | Qb (% of predicted)=60% |
| Qintermediate (% of observed)= 9% | Qi (% of predicted)=44% |
| Qexposed (% of observed)=78%      | Qe (% of predicted)=56% |
+-----------------------------------+-------------------------+

10-state accuracy
.................
The network predicts relative solvent accessibility in 10 states, with state
i (i = 0-9) corresponding to a relative solvent accessibility of i*i %. The
10-state accuracy of the network is:
   Q10 = 24.5%

..........................................................................

These percentages are defined by:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

$$Q_3 = 100 \cdot \frac{\text{number of correctly predicted residues}}{\text{number of all residues}}$$

$$Q_{\mathrm{buried}}(\%\ \text{of obs}) = 100 \cdot \frac{\text{no. of residues correctly predicted to be buried}}{\text{no. of all residues observed to be buried}}$$

$$Q_{\mathrm{buried}}(\%\ \text{of pred}) = 100 \cdot \frac{\text{no. of residues correctly predicted to be buried}}{\text{no. of all residues predicted to be buried}}$$

(and analogously for intermediate and exposed residues).

..........................................................................
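The normalisation and the state definitions above can be sketched in Python. This is a minimal illustration, not PHD's own code. The maximal accessibilities are copied from the table above, and the function names are hypothetical. One part is an assumption: a relative accessibility is binned into one of the 10 states so that state n covers n*n <= rel. acc. < (n+1)*(n+1), capped at state 9. This is inferred from the i*i % definition and is not stated explicitly by the server.

```python
import math

# Maximal accessibility (in A^2) per residue type, from the table above.
MAX_ACC = {
    "A": 106, "B": 160, "C": 135, "D": 163, "E": 194, "F": 197, "G": 84,
    "H": 184, "I": 169, "K": 205, "L": 164, "M": 188, "N": 157, "P": 136,
    "Q": 198, "R": 248, "S": 130, "T": 142, "V": 142, "W": 227, "X": 180,
    "Y": 222, "Z": 196,
}


def relative_acc(aa, acc):
    """Relative accessibility (%) from a DSSP accessibility value in A^2."""
    return 100.0 * acc / MAX_ACC[aa]


def water_count(acc):
    """Rough number of water molecules in contact: W = ACCESSIBILITY / 10."""
    return acc / 10.0


def ten_state(rel):
    """Largest n in 0..9 with n*n <= rel (assumed binning of the 10 states)."""
    # floor(sqrt(rel)) == isqrt(floor(rel)) for rel >= 0; values above 81%
    # (DSSP can exceed 100%) all fall into the top state 9.
    return min(9, math.isqrt(int(rel)))


def state_to_rel(n):
    """Relative accessibility (%) represented by predicted state n."""
    return n * n


def three_state(rel):
    """Project relative accessibility (%) onto buried / intermediate / exposed."""
    if rel < 9:
        return "b"
    if rel < 36:
        return "i"
    return "e"
```

For example, an alanine with a DSSP accessibility of 53 Å² has `relative_acc("A", 53) == 50.0`. That falls into state 7 (49%) and counts as exposed ("e"), and `water_count(53)` estimates about 5.3 water molecules in contact.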
Averaging over single chains
~~~~~~~~~~~~~~~~~~~~~~~~~~~
The most appropriate measure of overall accuracy is the percentage of
correctly predicted residues quoted above. However, since users are mainly
interested in the expected performance for a particular protein, the mean
over protein chains may be helpful as well. Computing the correlation between
observed and predicted accessibility for each protein chain first, and then
averaging over all 238 chains, yields:

+-------------------------------====--+
| corr/averaged over chains = 0.53    |
+-------------------------------====--+
| standard deviation        = 0.11    |
+-------------------------------------+

..........................................................................

Further details of performance accuracy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The accuracy matrix in detail:
..............................

-------+-------------------------------------------------------------+------------
 \ PHD |     0     1     2     3     4     5     6     7     8     9 |   SUM  %obs
-------+-------------------------------------------------------------+------------
 OBS 0 |  8611   140     8    44    82   169   772   334    27     0 | 10187  16.6
 OBS 1 |  4367   164     0    50   106   231   738   346    44     3 |  6049   9.8
 OBS 2 |  3194   168     1    68   125   303   951   513    42     7 |  5372   8.7
 OBS 3 |  2760   159     8    80   136   327  1246   746    58    19 |  5539   9.0
 OBS 4 |  2312   144     2    72   166   396  1615  1245   124    19 |  6095   9.9
 OBS 5 |  1873    96     3    84   138   425  1979  1834   187    27 |  6646  10.8
 OBS 6 |  1387    67     1    60    80   278  2237  2627   231    51 |  7019  11.4
 OBS 7 |  1082    35     0    32    56   225  1871  3107   302    60 |  6770  11.0
 OBS 8 |   660    25     0    27    43   136  1206  2374   325    87 |  4883   7.9
 OBS 9 |   325    20     2    27    29    74   648  1159   366   214 |  2864   4.7
-------+-------------------------------------------------------------+------------
  SUM  | 26571  1018    25   544   961  2564 13263 14285  1706   487 |
 %pred |  43.3   1.7   0.0   0.9   1.6   4.2  21.6  23.3   2.8   0.8 |
-------+-------------------------------------------------------------+------------

Note: This table is to be read as follows:
of all residues predicted to have 0% relative accessibility, 8611 were also
observed with 0% relative accessibility. However, 325 of the residues
predicted to have 0% are observed as fully exposed (obs = 9, i.e., rel. acc.
>= 81%). The term "observed" refers to the solvent-accessible surface area
compiled by DSSP from the 3D coordinates of experimentally determined
structures (Dictionary of Secondary Structure of Proteins: Kabsch & Sander
(1983) Biopolymers, 22, 2577-2637).

Accuracy for each amino acid:
.............................
+---+------------------------------+-----+-------+------+
|AA | Q3   b%o b%p i%o i%p e%o e%p | Q10 | corr  |  N   |
+---+------------------------------+-----+-------+------+
| A | 59.0  87  60   2  38  66  57 |  31 | 0.530 | 5054 |
| C | 62.0  91  67   5  39  25  21 |  34 | 0.244 |  893 |
| D | 56.5  21  45   6  49  94  57 |  20 | 0.321 | 3536 |
| E | 60.8   9  40   3  41  98  61 |  21 | 0.347 | 3743 |
| F | 63.3  94  67   9  46  29  37 |  27 | 0.366 | 2436 |
| G | 52.1  75  51   1  31  67  53 |  22 | 0.405 | 4787 |
| H | 50.9  63  53  23  45  71  50 |  18 | 0.442 | 1366 |
| I | 64.9  95  68   6  41  30  38 |  34 | 0.360 | 3437 |
| K | 66.6   2  11   2  37  98  67 |  23 | 0.267 | 3652 |
| L | 61.6  93  65   8  44  31  40 |  31 | 0.368 | 5016 |
| M | 60.1  92  64   5  39  45  44 |  29 | 0.452 | 1371 |
| N | 55.5  45  45   8  38  87  59 |  17 | 0.410 | 2923 |
| P | 53.0  48  48   9  39  83  56 |  18 | 0.364 | 2920 |
| Q | 54.3  27  44   7  44  92  56 |  20 | 0.344 | 2225 |
| R | 49.9  15  47  36  47  76  51 |  18 | 0.372 | 2765 |
| S | 55.6  69  53   3  51  81  56 |  22 | 0.464 | 3981 |
| T | 51.8  61  51   8  38  78  53 |  21 | 0.432 | 3740 |
| V | 61.1  93  65   5  40  39  42 |  34 | 0.418 | 4156 |
| W | 56.2  85  62  20  49  29  27 |  21 | 0.318 |  891 |
| Y | 49.7  73  52  33  49  36  38 |  19 | 0.359 | 2301 |
+---+------------------------------+-----+-------+------+

Abbreviations:
AA:            amino acid in one-letter code
b%o, i%o, e%o: = Qburied, Qintermediate, Qexposed (% of observed), i.e.
               percentage of correct prediction in each state, see above
b%p, i%p, e%p: = Qburied, Qintermediate, Qexposed (% of predicted), i.e.
               probability of correct prediction in each state, see above
Q10:           percentage of correctly predicted residues in each of the 10
               states of predicted relative accessibility
corr:          correlation between predicted and observed rel. acc.
N:             number of residues in data set

Accuracy for different secondary structure types:
.................................................
+--------+------------------------------+----+-------+-------+
| type   | Q3   b%o b%p i%o i%p e%o e%p |Q10 | corr  |   N   |
+--------+------------------------------+----+-------+-------+
| helix  | 59.5  79  64   8  44  80  56 | 27 | 0.574 | 20100 |
| strand | 61.3  84  73   9  46  69  37 | 35 | 0.524 | 13356 |
| loop   | 54.4  64  43  11  44  78  61 | 18 | 0.442 | 27968 |
+--------+------------------------------+----+-------+-------+
Abbreviations as before.

Position-specific reliability index
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The network predicts the 10 states for relative accessibility as real-valued
outputs, one per output unit. The predicted state is the unit with the largest
output ("winner takes all"). The real values carry additional information,
however: e.g., the difference between the largest and the second-largest
output can be used to derive a "reliability index"; here the second-largest
output is taken only among the units at least 2 positions away from the
maximal unit. This index is given for each residue along with the prediction
and is scaled to values between 0 (lowest reliability) and 9 (highest). The
accuracies (Q3, corr, etc.)
to be expected for residues whose reliability index is at least a given value
are listed below, together with the fraction of such residues (%res):

+---+------------------------------+----+-------+-------+
|RI | Q3   b%o b%p i%o i%p e%o e%p |Q10 | corr  | %res  |
+---+------------------------------+----+-------+-------+
| 0 | 57.5  77  60   9  44  78  56 | 24 | 0.535 | 100.0 |
| 1 | 59.1  76  63   9  45  82  57 | 25 | 0.560 |  91.2 |
| 2 | 61.7  79  66   4  47  87  58 | 27 | 0.594 |  77.1 |
| 3 | 66.6  87  70   1  51  89  63 | 30 | 0.650 |  57.1 |
| 4 | 70.0  89  72   0  83  91  67 | 32 | 0.686 |  45.8 |
| 5 | 72.9  92  75   0   0  93  70 | 34 | 0.722 |  35.6 |
| 6 | 76.3  95  77   0   0  93  75 | 36 | 0.769 |  24.7 |
| 7 | 79.0  97  79   0   0  93  78 | 39 | 0.803 |  16.0 |
| 8 | 80.9  98  80   0   0  91  81 | 43 | 0.824 |   9.6 |
| 9 | 81.2  99  80   0   0  88  83 | 45 | 0.828 |   5.9 |
+---+------------------------------+----+-------+-------+
Abbreviations as before.

The above table gives cumulative results: e.g., 45.8% of all residues have a
reliability index of at least 4. For this most reliably predicted half of the
residues the correlation is 0.686, i.e., a value comparable to what could be
expected if homology modelling were possible. For this subset of 45.8% of all
residues, 89% of the buried residues are correctly predicted, and 72% of all
residues predicted to be buried are correct.

..........................................................................

The following table gives the non-cumulative quantities, i.e., the values for
each individual reliability index. These numbers answer the question: how
reliable is the prediction for the residues labeled with a particular index i?
+---+------------------------------+----+-------+-------+
|RI | Q3   b%o b%p i%o i%p e%o e%p |Q10 | corr  | %res  |
+---+------------------------------+----+-------+-------+
| 0 | 40.9  79  40  16  41  21  40 | 14 | 0.175 |   8.8 |
| 1 | 45.4  61  46  28  44  48  44 | 17 | 0.278 |  14.1 |
| 2 | 47.4  53  52  10  46  80  44 | 19 | 0.343 |  19.9 |
| 3 | 52.9  75  59   4  50  77  47 | 23 | 0.439 |  11.4 |
| 4 | 60.0  81  63   0  83  84  56 | 25 | 0.547 |  10.1 |
| 5 | 65.2  82  70   0   0  93  62 | 28 | 0.607 |  10.9 |
| 6 | 71.3  90  72   0   0  94  70 | 31 | 0.692 |   8.8 |
| 7 | 76.0  94  76   0   0  95  75 | 34 | 0.762 |   6.3 |
| 8 | 80.5  97  81   0   0  94  79 | 39 | 0.808 |   3.8 |
| 9 | 81.2  99  80   0   0  88  83 | 45 | 0.828 |   5.9 |
+---+------------------------------+----+-------+-------+

For example, of all residues with RI = 4 that are predicted to be
intermediate, 83% are correct.

The resulting network (PHD) prediction is:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
________________________________________________________________________________

PHD: Profile fed neural network systems from HeiDelberg
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Prediction of:
   secondary structure, by PHDsec
   solvent accessibility, by PHDacc
   and helical transmembrane regions, by PHDhtm

Author: Burkhard Rost
        EMBL, 69012 Heidelberg, Germany
        Internet: Rost@EMBL-Heidelberg.DE
All rights reserved.

The network systems are described in:
PHDsec: B Rost & C Sander: JMB, 1993, 232, 584-599.
        B Rost & C Sander: Proteins, 1994, 19, 55-72.
PHDacc: B Rost & C Sander: Proteins, 1994, 20, 216-226.
PHDhtm: B Rost et al.: Prot. Science, 1995, 4, 521-533.
Some statistics
~~~~~~~~~~~~~~
Percentage of amino acids:
+--------------+--------+--------+--------+--------+--------+
| AA:          |   A    |   L    |   G    |   S    |   R    |
| % of AA:     |  12.6  |  10.1  |   9.0  |   7.7  |   7.7  |
+--------------+--------+--------+--------+--------+--------+
| AA:          |   T    |   I    |   V    |   D    |   E    |
| % of AA:     |   7.4  |   7.1  |   6.8  |   5.2  |   4.4  |
+--------------+--------+--------+--------+--------+--------+
| AA:          |   F    |   P    |   Q    |   K    |   H    |
| % of AA:     |   4.1  |   3.8  |   3.0  |   2.5  |   1.9  |
+--------------+--------+--------+--------+--------+--------+
| AA:          |   N    |   M    |   Y    |   W    |   C    |
| % of AA:     |   1.6  |   1.6  |   1.4  |   1.1  |   0.8  |
+--------------+--------+--------+--------+--------+--------+

Percentage of secondary structure predicted:
+--------------+--------+--------+--------+
| SecStr:      |   H    |   E    |   L    |
| % Predicted: |  52.9  |  15.6  |  31.5  |
+--------------+--------+--------+--------+

According to the following classes:
   all-alpha:  %H > 45 and %E <  5
   all-beta:   %H <  5 and %E > 45
   alpha-beta: %H > 30 and %E > 20
   mixed:      rest
the predicted class is: mixed class

PHD output for your protein
~~~~~~~~~~~~~~~~~~~~~~~~~~

Wed May 7 12:01:49 1997
Jury on: 10 different architectures (version 5.94_317 ).
Note: differently trained architectures (i.e., different versions) can
result in different predictions.

About the protein
~~~~~~~~~~~~~~~~
HEADER    /home/phd/server/work/predict_h28028_601
COMPND
SOURCE
AUTHOR
SEQLENGTH 365
NCHAIN    1 chain(s) in predict_h28028_601 data set
NALIGN    2 (=number of aligned sequences in HSSP file)

WARNING
~~~~~~
Expected accuracy is about 72% if, and only if, the alignment contains
sufficient information. For your sequence, only few homologues were detected
in the current version of Swissprot. This implies that the expected accuracy
is some percentage points lower!
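The class rules quoted above can be written as a small function. This is a minimal Python sketch with hypothetical names, not part of the PHD output. The tests are applied in the order listed, and any residue that is neither H nor E is counted as loop, matching the blank used for loop in the PHD output. With the percentages predicted for this protein (52.9% H, 15.6% E) it returns "mixed". The reason: %E is too high for all-alpha, and too low for alpha-beta.

```python
def composition(ss):
    """Percentages of H, E and loop (anything else) in a predicted HEL string."""
    n = len(ss)
    h = 100.0 * ss.count("H") / n
    e = 100.0 * ss.count("E") / n
    return h, e, 100.0 - h - e


def structural_class(h, e):
    """Structural class from %H and %E, following the rules quoted above."""
    if h > 45 and e < 5:
        return "all-alpha"
    if h < 5 and e > 45:
        return "all-beta"
    if h > 30 and e > 20:
        return "alpha-beta"
    return "mixed"


if __name__ == "__main__":
    print(structural_class(52.9, 15.6))   # percentages predicted for this protein
```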
Abbreviations: PHDsec
~~~~~~~~~~~~~~~~~~~~
  sequence:
    AA      : amino acid sequence
  secondary structure:
    HEL     : H=helix, E=extended (sheet), blank=other (loop)
    PHD     : Profile network prediction HeiDelberg
    Rel     : Reliability index of prediction (0-9)
  detail:
    prH     : 'probability' for assigning helix
    prE     : 'probability' for assigning strand
    prL     : 'probability' for assigning loop
    note: the 'probabilities' are scaled to the interval 0-9, e.g., prH=5
          means that the output of the first (helix) node is 0.5-0.6
  subset:
    SUB     : a subset of the prediction, for all residues with an expected
              average accuracy > 82% (tables in header)
    note: for this subset the following symbols are used:
      L   : is loop (for which above " " is used)
      "." : means that no prediction is made for this residue, as the
            reliability is: Rel < 5

Abbreviations: PHDacc
~~~~~~~~~~~~~~~~~~~~
  SS        : secondary structure
    HEL     : H=helix, E=extended (sheet), blank=other (loop)
  solvent accessibility:
    3st     : relative solvent accessibility (acc) in 3 states:
              b = 0-9%, i = 9-36%, e = 36-100%.
    PHD     : Profile network prediction HeiDelberg
    Rel     : Reliability index of prediction (0-9)
    O_3     : observed relative acc. in 3 states: B, I, E
              note: for convenience a blank is used for intermediate (i).
    P_3     : predicted relative accessibility in 3 states
    10st    : relative accessibility in 10 states: = n corresponds to a relative acc.
of n*n % subset: SUB: a subset of the prediction, for all residues with an expected average correlation > 0.69 (tables in header) note: for this subset the following symbols are used: "I": is intermediate (for which above " " is used) ".": means that no prediction is made for this residue, as the reliability is: Rel < 4 protein: predict length 365 ....,....1....,....2....,....3....,....4....,....5....,....6 AA |MQTRRVVLKSAAAAGTLLGGLAGCASVAGSIGTGDRINTVRGPITISEAGFTLTHEHICG| PHD sec | HHHHHHHHHHHHHHHHHHHHEEEEEE EE E EEE EE HHH | Rel sec |984226799999999997333441213233377531221128523246985231551132| detail: prH sec |001347889999999997665554332222100111100000000001000013764322| prE sec |002210000000000000000114444444211234554441256521012552001122| prL sec |986431100000000001333221122222577654334458643467987424224454| subset: SUB sec |LL...HHHHHHHHHHHHH.............LLL.......LL....LLLL...HH....| accessibility 3st: P_3 acc |eee ebbbebbbbbbbbbbbbbbbbbbbbbbe ee bbbbebebebeebbbbbbbebbbb| 10st: PHD acc |997560007000000000000000000000095975000060607067000000060000| Rel acc |642136664442353033101424634623021342200412103324144275514601| subset: SUB acc |ee...bbbebb..b.......b.bb.bb......e....b.......e.bb.bbb.bb..| ....,....7....,....8....,....9....,....10...,....11...,....12 AA |SSAGFLRAWPEFFGSRKALAEKAVRGLRRARAAGVRTIVDVSTFDIGRDVSLLAEVSRAA| PHD sec | HHHHHHHHHHHHHHHHHHHHHHHHHHHHH EEEEE HHHHHHHHHHHH | Rel sec |131213364897617999999999999999764553589833112462599999999972| detail: prH sec |254345676888758999999999999999876222000012443215799999999873| prE sec |311000000000000000000000000000000002788852110000000000000000| prL sec |424543323101241000000000000000123775210035445673200000000015| subset: SUB sec |.......H.HHHH.HHHHHHHHHHHHHHHHHH.LL.EEEE......L.HHHHHHHHHHH.| accessibility 3st: P_3 acc |ebebbeeebee bee ebbbeeb eebeebebebbbbbbebbe bb ebebbbebbeeb| 10st: PHD acc |607006760774077570007705760660607000000600655004606000600680| Rel acc 
|103131411530254131545530616214203470498242111231142674274242| subset: SUB acc |......e..e...ee...bbee..e.b..b...bb.bbb.b........b.bbb.bb.e.| ....,....13...,....14...,....15...,....16...,....17...,....18 AA |DVHIVAATGLWFDPPLSMRLRSVEELTQFFLREIQYGIEDTGIRAGIIKVATTGKATPFQ| PHD sec | EEEEEEEEEE EEEHHHHHHHHHHHHHHHHH HHHHHHHHHHH HHH| Rel sec |938999832210584113215699999999999997434422456565346469888379| detail: prH sec |000000000110002321246799999999999998332255667777567320111689| prE sec |038998855444210135430100000000000000000011210012331000000000| prL sec |961000133334786432222100000000000001666633111200001679888310| subset: SUB sec |L.EEEEE.....LL......HHHHHHHHHHHHHHHH.......HHHHH..H.LLLLL.HH| accessibility 3st: P_3 acc |bbbbbbbbb bbbbebebeb bbeebbeebbeebeebbeebeb bbbbebbeeeebbbbe| 10st: PHD acc |000000000500006060604006600760066067007606050000600777700006| Rel acc |141999846100000012111572293424431824455121318599201364410031| subset: SUB acc |.b.bbbbbb............bb..b.e.bb..b.ebbe.....bbbb....eee.....| ....,....19...,....20...,....21...,....22...,....23...,....24 AA |ELVLKAAARASLATGVPVTTHTAASQRDGEQQAAIFESEGLSPSRVCIGHSDDTDDLSYL| PHD sec |HHHHHHHHHHHHH EEE HHHHHHHHHHHHHHHH EEEEE HHHHHHH| Rel sec |999999999999279841424624898899999999992699762798737986999999| detail: prH sec |999999999999510000000036888899999999995100000000000017899999| prE sec |000000000000000135653100000000000000000000115898731000000000| prL sec |000000000000489864246753101100000000004799874100168882000000| subset: SUB sec |HHHHHHHHHHHH.LLL.....L..HHHHHHHHHHHHHH.LLLLL.EEEE.LLLHHHHHHH| accessibility 3st: P_3 acc |eebbebbbebeeebb bbbbbbbbbeeebeebbbbbebeb eeb bbbbbb beeebeeb| 10st: PHD acc |660060006067700500000000077707600000607037604000000507760760| Rel acc |119829992813412126023211445423246748214215171849411105314616| subset: SUB acc |..bb.bbb.b..e....b......beee...bbbbb..e..e.b.bbbb....e..be.b| ....,....25...,....26...,....27...,....28...,....29...,....30 AA 
|TALAARGYLIGLDHIPHSAIGLEDNASASALLGIRSWQTRALLIKALIDQGYMKQILVSN| PHD sec |HHHHH EEEEE HHHHHHH HHHHHHHHHHHHHHH EEEEEE| Rel sec |999962883799217778758772111589982128999999999995342211378976| detail: prH sec |999974100000000011100014455788984558999999999986323333310001| prE sec |000000005888541100021100000000000000000000000000000122578877| prL sec |000025883100457888868785544210015441000000000002565443100011| subset: SUB sec |HHHHH.LL.EEE..LLLLLLLLL....HHHHH...HHHHHHHHHHHHH.......EEEEE| accessibility 3st: P_3 acc |bebbb bbbbbbe beeebeebeeebebebbbebeb ebbbbbbebb beb beebbbbb| 10st: PHD acc |070005000000650677076089707070008060570000006005060406600000| Rel acc |337201531636105153122044604244323220142237972881013152278670| subset: SUB acc |..b...b..b.b..b.e.....eee.e.eb.......e...bbb.bb.....b..bbbb.| ....,....31...,....32...,....33...,....34...,....35...,....36 AA |DWLFGFSSYVTNIMDVMDRVNPDGMAFIPLRVIPFLREKGVPQETLAGITVTNPARFLSP| PHD sec |EHHH HHHHHHHHHH EEEEEHHHHHHHH HHHHHHHHHH HHHH | Rel sec |211213512799999999615898534653531799977898466898794585683122| detail: prH sec |234543244899999999742100011012653899981100677888786702786544| prE sec |431100000000000000000000256764111000000000000000100000000000| prL sec |233345645100000000257898622112134100018898322000002287213455| subset: SUB sec |......L..HHHHHHHHHH.LLLLL..EE.H..HHHHHLLLL.HHHHHHH.HLLHH....| accessibility 3st: P_3 acc |ebb ebee bbbbbebbeeeeeebbbbbbbebbbeb eeb eeeebeebbeebbeebbee| 10st: PHD acc |600570775000007007876970000000600060477057676076007700760066| Rel acc |113141431601624275421544003522256227146014121341403511326421| subset: SUB acc |....e.e..b..b.e.bee..eeb...b...bb..b.ee..e....e.b..e....bb..| ....,....37...,....38...,....39...,....40...,....41...,....42 AA |TLRAS| PHD sec | | Rel sec |12689| detail: prH sec |43100| prE sec |00100| prL sec |45789| subset: SUB sec |..LLL| accessibility 3st: P_3 acc |ebeee| 10st: PHD acc |60699| Rel acc |11236| subset: SUB acc |....e| 
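The SUB rows and the accessibility codes follow simple rules that can be checked against the first block (residues 1-60). The sketch below is illustrative only: `sub_sec`, `rel_acc_from_10st` and `three_state` are hypothetical helper names, and the 3-state boundaries (9% counted as i, 36% as e) are inferred from the PREL/Pbie pairs in the column output (e.g., residue 221 has PREL 9 and Pbie i), not stated explicitly by PHD.

```python
def sub_sec(phd_sec: str, rel_sec: str, min_rel: int = 5) -> str:
    """'SUB sec' row: '.' where Rel < min_rel; loop (blank) is shown as 'L'."""
    out = []
    for state, rel in zip(phd_sec, rel_sec):
        if int(rel) < min_rel:
            out.append(".")
        elif state == " ":
            out.append("L")
        else:
            out.append(state)
    return "".join(out)


def rel_acc_from_10st(digit: str) -> int:
    """A 10-state value n corresponds to a relative accessibility of n*n %."""
    n = int(digit)
    return n * n


def three_state(rel_acc: int) -> str:
    """Map relative accessibility (%) to b / i / e."""
    if rel_acc < 9:
        return "b"
    if rel_acc < 36:
        return "i"
    return "e"


# Residues 1-60: 'PHD sec' rebuilt run by run, 'Rel sec' copied in groups of 10.
phd_sec = (" " * 4 + "H" * 20 + "E" * 6 + " " * 6 + "EE" + " " + "E"
           + " " * 3 + "EEE" + " " * 5 + "EE" + " " + "HHH" + " " * 3)
rel_sec = ("9842267999" "9999999733" "3441213233"
           "3775312211" "2852324698" "5231551132")

print(sub_sec(phd_sec, rel_sec))
print("".join(three_state(rel_acc_from_10st(d)) for d in "997560007"))  # eeeiebbbe
```

The first line printed should match the SUB sec row of the first block; the second reproduces the start of its P_3 row, where the single i is shown as a blank.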
________________________________________________________________________________

Since you set the keyword "return col" in the header, a file is appended
below that can be used as input for prediction-based threading.

The PHD predictions in COLUMN format:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

________________________________________________________________________________

---
--- PHD PREDICTION COLUMN FORMAT HEADER: ABBREVIATIONS
--- AA    : one-letter code for amino acid sequence
--- PSEC  : secondary structure prediction in 3 states:
---       : H=helix, E=extended (sheet), L=rest (loop)
--- RI_S  : reliability of secondary structure prediction
---       : scaled from 0 (low) to 9 (high)
--- pH    : 'probability' for assigning helix
--- pE    : 'probability' for assigning strand
--- pL    : 'probability' for assigning rest
---       : Note: the 'probabilities' are scaled onto 0-9,
---       : e.g., pH=5 means that the value of the
---       : first output unit is 0.5-0.6
--- PACC  : predicted solvent accessibility in square Angstrom
--- PREL  : relative solvent accessibility in percent
--- RI_A  : reliability of accessibility prediction (0-9)
--- Pbie  : predicted relative accessibility in 3 states:
---       : b=0-9%, i=9-36%, e=36-100%
---
--- PHD PREDICTION COLUMN FORMAT
 No AA PSEC RI_S pH pE pL PACC PREL RI_A Pbie
  1  M    L    9  0  0  9  152   81    6    e
  2  Q    L    8  0  0  8  160   81    4    e
  3  T    L    4  1  1  6   69   49    2    e
  4  R    L    2  3  2  4   62   25    1    i
  5  R    H    2  4  1  3   89   36    3    e
  6  V    H    6  7  0  1    0    0    6    b
  7  V    H    7  8  0  1    0    0    6    b
  8  L    H    9  8  0  0    0    0    6    b
  9  K    H    9  9  0  0  100   49    4    e
 10  S    H    9  9  0  0    0    0    4    b
 11  A    H    9  9  0  0    0    0    4    b
 12  A    H    9  9  0  0    0    0    2    b
 13  A    H    9  9  0  0    0    0    3    b
 14  A    H    9  9  0  0    0    0    5    b
 15  G    H    9  9  0  0    0    0    3    b
 16  T    H    9  9  0  0    0    0    0    b
 17  L    H    9  9  0  0    0    0    3    b
 18  L    H    7  8  0  1    0    0    3    b
 19  G    H    3  6  0  3    0    0    1    b
 20  G    H    3  6  0  3    0    0    0    b
 21  L    H    3  5  0  3    0    0    1    b
 22  A    H    4  5  1  2    0    0    4    b
 23  G    H    4  5  1  2    0    0    2    b
 24  C    H    1  4  4  1    0    0    4    b
 25  A    E    2  3  4  1    0    0    6    b
 26  S    E    1  3  4  1    0    0    3    b
 27  V    E    3  2  4  2    0    0    4    b
 28  A    E    2  2  4  2    0    0    6    b
 29  G    E    3  2  5  2    0    0    2    b
 30  S    E    3  2  4  2    0    0    3    b
 31  I    L    3  1  2  5    0    0    0    b
 32  G    L    7  0  1  7   68   81    2    e
 33  T    L    7  0  1  7   35   25    1    i
 34  G    L    5  1  2  6   68   81    3    e
 35  D    L    3  1  3  5   79   49    4    e
 36  R    L    1  1  4  4   62   25    2    i
 37  I    E    2  1  5  3    0    0    2    b
 38  N    E    2  0  5  3    0    0    0    b
 39  T    L    1  0  4  4    0    0    0    b
 40  V    E    1  0  4  4    0    0    4    b
 41  R    L    2  0  4  5   89   36    1    e
 42  G    L    8  0  1  8    0    0    2    b
 43  P    L    5  0  2  7   48   36    1    e
 44  I    E    2  0  5  4    0    0    0    b
 45  T    E    3  0  6  3   69   49    3    e
 46  I    E    2  0  5  4    0    0    3    b
 47  S    L    4  0  2  6   46   36    2    e
 48  E    L    6  1  1  7   95   49    4    e
 49  A    L    9  0  0  9    0    0    1    b
 50  G    L    8  0  1  8    0    0    4    b
 51  F    L    5  0  2  7    0    0    4    b
 52  T    E    2  0  5  3    0    0    2    b
 53  L    E    3  1  6  2    0    0    7    b
 54  T    L    1  3  2  4    0    0    5    b
 55  H    H    5  7  0  2    0    0    5    b
 56  E    H    5  6  0  2   69   36    1    e
 57  H    H    1  4  1  4    0    0    4    b
 58  I    L    1  3  1  4    0    0    6    b
 59  C    L    3  2  2  5    0    0    0    b
 60  G    L    2  2  2  4    0    0    1    b
 61  S    L    1  2  3  4   46   36    1    e
 62  S    L    3  5  1  2    0    0    0    b
 63  A    L    1  4  1  4   51   49    3    e
 64  G    L    2  3  0  5    0    0    1    b
 65  F    H    1  5  0  4    0    0    3    b
 66  L    H    3  5  0  3   59   36    1    e
 67  R    H    3  6  0  3  121   49    4    e
 68  A    H    6  7  0  2   38   36    1    e
 69  W    H    4  6  0  3    0    0    1    b
 70  P    H    8  8  0  1   66   49    5    e
 71  E    H    9  8  0  0   95   49    3    e
 72  F    H    7  8  0  1   31   16    0    i
 73  F    H    6  7  0  2    0    0    2    b
 74  G    H    1  5  0  4   41   49    5    e
 75  S    H    7  8  0  1   63   49    4    e
 76  R    H    9  9  0  0   62   25    1    i
 77  K    H    9  9  0  0  100   49    3    e
 78  A    H    9  9  0  0    0    0    1    b
 79  L    H    9  9  0  0    0    0    5    b
 80  A    H    9  9  0  0    0    0    4    b
 81  E    H    9  9  0  0   95   49    5    e
 82  K    H    9  9  0  0  100   49    5    e
 83  A    H    9  9  0  0    0    0    3    b
 84  V    H    9  9  0  0   35   25    0    i
 85  R    H    9  9  0  0  121   49    6    e
 86  G    H    9  9  0  0   30   36    1    e
 87  L    H    9  9  0  0    0    0    6    b
 88  R    H    9  9  0  0   89   36    2    e
 89  R    H    9  9  0  0   89   36    1    e
 90  A    H    9  9  0  0    0    0    4    b
 91  R    H    7  8  0  1   89   36    2    e
 92  A    H    6  7  0  2    0    0    0    b
 93  A    H    4  6  0  3   51   49    3    e
 94  G    L    5  2  0  7    0    0    4    b
 95  V    L    5  2  0  7    0    0    7    b
 96  R    L    3  2  2  5    0    0    0    b
 97  T    E    5  0  7  2    0    0    4    b
 98  I    E    8  0  8  1    0    0    9    b
 99  V    E    9  0  9  0    0    0    8    b
100  D    E    8  0  8  0   58   36    2    e
101  V    E    3  1  5  3    0    0    4    b
102  S    L    3  2  2  5    0    0    2    b
103  T    L    1  4  1  4   51   36    1    e
104  F    L    1  4  1  4   49   25    1    i
105  D    L    2  3  0  5   40   25    1    i
106  I    L    4  2  0  6    0    0    2    b
107  G    L    6  1  0  7    0    0    3    b
108  R    H    2  5  0  3   39   16    1    i
109  D    H    5  7  0  2   58   36    1    e
110  V    H    9  9  0  0    0    0    4    b
111  S    H    9  9  0  0   46   36    2    e
112  L    H    9  9  0  0    0    0    6    b
113  L    H    9  9  0  0    0    0    7    b
114  A    H    9  9  0  0    0    0    4    b
115  E    H    9  9  0  0   69   36    2    e
116  V    H    9  9  0  0    0    0    7    b
117  S    H    9  9  0  0    0    0    4    b
118  R    H    9  8  0  0   89   36    2    e
119  A    H    7  7  0  1   67   64    4    e
120  A    L    2  3  0  5    0    0    2    b
121  D    L    9  0  0  9    0    0    1    b
122  V    L    3  0  3  6    0    0    4    b
123  H    E    8  0  8  1    0    0    1    b
124  I    E    9  0  9  0    0    0    9    b
125  V    E    9  0  9  0    0    0    9    b
126  A    E    9  0  8  0    0    0    9    b
127  A    E    8  0  8  1    0    0    8    b
128  T    E    3  0  5  3    0    0    4    b
129  G    E    2  0  5  3    0    0    6    b
130  L    E    2  1  5  3   41   25    1    i
131  W    E    1  1  4  4    0    0    0    b
132  F    E    0  0  4  4    0    0    0    b
133  D    L    5  0  2  7    0    0    0    b
134  P    L    8  0  1  8    0    0    0    b
135  P    L    4  2  0  6   48   36    0    e
136  L    L    1  3  1  4    0    0    0    b
137  S    E    1  2  3  3   46   36    1    e
138  M    E    3  1  5  2    0    0    2    b
139  R    E    2  2  5  2   89   36    1    e
140  L    H    1  4  3  2    0    0    1    b
141  R    H    5  6  0  2   39   16    1    i
142  S    H    6  7  0  1    0    0    5    b
143  V    H    9  9  0  0    0    0    7    b
144  E    H    9  9  0  0   69   36    2    e
145  E    H    9  9  0  0   69   36    2    e
146  L    H    9  9  0  0    0    0    9    b
147  T    H    9  9  0  0    0    0    3    b
148  Q    H    9  9  0  0   97   49    4    e
149  F    H    9  9  0  0   70   36    2    e
150  F    H    9  9  0  0    0    0    4    b
151  L    H    9  9  0  0    0    0    4    b
152  R    H    9  9  0  0   89   36    3    e
153  E    H    9  9  0  0   69   36    1    e
154  I    H    9  9  0  0    0    0    8    b
155  Q    H    9  9  0  0   71   36    2    e
156  Y    H    7  8  0  1  108   49    4    e
157  G    L    4  3  0  6    0    0    4    b
158  I    L    3  3  0  6    0    0    5    b
159  E    L    4  2  0  6   95   49    5    e
160  D    L    4  2  0  6   58   36    1    e
161  T    H    2  5  1  3    0    0    2    b
162  G    H    2  5  1  3   30   36    1    e
163  I    H    4  6  2  1    0    0    3    b
164  R    H    5  6  1  1   62   25    1    i
165  A    H    6  7  0  1    0    0    8    b
166  G    H    5  7  0  2    0    0    5    b
167  I    H    6  7  1  0    0    0    9    b
168  I    H    5  7  2  0    0    0    9    b
169  K    H    3  5  3  0   73   36    2    e
170  V    H    4  6  3  0    0    0    0    b
171  A    H    6  7  1  1    0    0    1    b
172  T    L    4  3  0  6   69   49    3    e
173  T    L    6  2  0  7   69   49    6    e
174  G    L    9  0  0  9   41   49    4    e
175  K    L    8  1  0  8  100   49    4    e
176  A    L    8  1  0  8    0    0    1    b
177  T    L    8  1  0  8    0    0    0    b
178  P    H    3  6  0  3    0    0    0    b
179  F    H    7  8  0  1    0    0    3    b
180  Q    H    9  9  0  0   71   36    1    e
181  E    H    9  9  0  0   69   36    1    e
182  L    H    9  9  0  0   59   36    1    e
183  V    H    9  9  0  0    0    0    9    b
184  L    H    9  9  0  0    0    0    8    b
185  K    H    9  9  0  0   73   36    2    e
186  A    H    9  9  0  0    0    0    9    b
187  A    H    9  9  0  0    0    0    9    b
188  A    H    9  9  0  0    0    0    9    b
189  R    H    9  9  0  0   89   36    2    e
190  A    H    9  9  0  0    0    0    8    b
191  S    H    9  9  0  0   46   36    1    e
192  L    H    9  9  0  0   80   49    3    e
193  A    H    2  5  0  4   51   49    4    e
194  T    L    7  1  0  8    0    0    1    b
195  G    L    9  0  0  9    0    0    2    b
196  V    L    8  0  1  8   35   25    1    i
197  P    L    4  0  3  6    0    0    2    b
198  V    E    1  0  5  4    0    0    6    b
199  T    E    4  0  6  2    0    0    0    b
200  T    E    2  0  5  4    0    0    2    b
201  H    L    4  0  2  6    0    0    3    b
202  T    L    6  0  1  7    0    0    2    b
203  A    L    2  3  0  5    0    0    1    b
204  A    H    4  6  0  3    0    0    1    b
205  S    H    8  8  0  1    0    0    4    b
206  Q    H    9  9  0  0   97   49    4    e
207  R    H    8  8  0  1  121   49    5    e
208  D    H    8  8  0  1   79   49    4    e
209  G    H    9  9  0  0    0    0    2    b
210  E    H    9  9  0  0   95   49    3    e
211  Q    H    9  9  0  0   71   36    2    e
212  Q    H    9  9  0  0    0    0    4    b
213  A    H    9  9  0  0    0    0    6    b
214  A    H    9  9  0  0    0    0    7    b
215  I    H    9  9  0  0    0    0    4    b
216  F    H    9  9  0  0    0    0    8    b
217  E    H    9  9  0  0   69   36    2    e
218  S    H    9  9  0  0    0    0    1    b
219  E    H    2  5  0  4   95   49    4    e
220  G    L    6  1  0  7    0    0    2    b
221  L    L    9  0  0  9   14    9    1    i
222  S    L    9  0  0  9   63   49    5    e
223  P    L    7  0  1  8   48   36    1    e
224  S    L    6  0  1  7    0    0    7    b
225  R    E    2  0  5  4   39   16    1    i
226  V    E    7  0  8  1    0    0    8    b
227  C    E    9  0  9  0    0    0    4    b
228  I    E    8  0  8  0    0    0    9    b
229  G    E    7  0  8  1    0    0    4    b
230  H    L    3  0  3  6    0    0    1    b
231  S    L    7  0  1  8    0    0    1    b
232  D    L    9  0  0  8   40   25    1    i
233  D    L    8  1  0  8    0    0    0    b
234  T    H    6  7  0  2   69   49    5    e
235  D    H    9  9  0  0   79   49    3    e
236  D    H    9  9  0  0   58   36    1    e
237  L    H    9  9  0  0    0    0    4    b
238  S    H    9  9  0  0   63   49    6    e
239  Y    H    9  9  0  0   79   36    1    e
240  L    H    9  9  0  0    0    0    6    b
241  T    H    9  9  0  0    0    0    3    b
242  A    H    9  9  0  0   51   49    3    e
243  L    H    9  9  0  0    0    0    7    b
244  A    H    9  9  0  0    0    0    2    b
245  A    H    6  7  0  2    0    0    0    b
246  R    L    2  4  0  5   62   25    1    i
247  G    L    8  1  0  8    0    0    5    b
248  Y    L    8  0  0  8    0    0    3    b
249  L    E    3  0  5  3    0    0    1    b
250  I    E    7  0  8  1    0    0    6    b
251  G    E    9  0  8  0    0    0    3    b
252  L    E    9  0  8  0    0    0    6    b
253  D    E    2  0  5  4   58   36    1    e
254  H    L    1  0  4  5   46   25    0    i
255  I    L    7  0  1  7    0    0    5    b
256  P    L    7  0  1  8   48   36    1    e
257  H    L    7  1  0  8   90   49    5    e
258  S    L    8  1  0  8   63   49    3    e
259  A    L    7  1  0  8    0    0    1    b
260  I    L    5  0  2  7   82   49    2    e
261  G    L    8  0  1  8   30   36    2    e
262  L    L    7  0  0  8    0    0    0    b
263  E    L    7  1  0  8  124   64    4    e
264  D    L    2  4  0  5  132   81    4    e
265  N    L    1  4  0  5   76   49    6    e
266  A    H    1  5  0  4    0    0    0    b
267  S    H    1  5  0  4   63   49    4    e
268  A    H    5  7  0  2    0    0    2    b
269  S    H    8  8  0  1   63   49    4    e
270  A    H    9  9  0  0    0    0    4    b
271  L    H    9  9  0  0    0    0    3    b
272  L    H    8  8  0  1    0    0    2    b
273  G    L    2  4  0  5   53   64    3    e
274  I    H    1  5  0  4    0    0    2    b
275  R    H    2  5  0  4   89   36    2    e
276  S    H    8  8  0  1    0    0    0    b
277  W    H    9  9  0  0   56   25    1    i
278  Q    H    9  9  0  0   97   49    4    e
279  T    H    9  9  0  0    0    0    2    b
280  R    H    9  9  0  0    0    0    2    b
281  A    H    9  9  0  0    0    0    3    b
282  L    H    9  9  0  0    0    0    7    b
283  L    H    9  9  0  0    0    0    9    b
284  I    H    9  9  0  0    0    0    7    b
285  K    H    9  9  0  0   73   36    2    e
286  A    H    9  9  0  0    0    0    8    b
287  L    H    9  9  0  0    0    0    8    b
288  I    H    5  7  0  2   42   25    1    i
289  D    L    3  3  0  6    0    0    0    b
290  Q    L    4  2  0  6   71   36    1    e
291  G    L    2  3  0  5    0    0    3    b
292  Y    L    2  3  1  4   35   16    1    i
293  M    L    1  3  2  4    0    0    5    b
294  K    L    1  3  2  3   73   36    2    e
295  Q    E    3  3  5  1   71   36    2    e
296  I    E    7  1  7  0    0    0    7    b
297  L    E    8  0  8  0    0    0    8    b
298  V    E    9  0  8  0    0    0    6    b
299  S    E    7  0  8  1    0    0    7    b
300  N    E    6  1  7  1    0    0    0    b
301  D    E    2  2  4  2   58   36    1    e
302  W    H    1  3  3  3    0    0    1    b
303  L    H    1  4  1  3    0    0    3    b
304  F    H    2  5  0  3   49   25    1    i
305  G    L    1  4  0  5   41   49    4    e
306  F    L    3  3  0  5    0    0    1    b
307  S    L    5  2  0  6   63   49    4    e
308  S    L    1  4  0  4   63   49    3    e
309  Y    L    2  4  0  5   55   25    1    i
310  V    H    7  8  0  1    0    0    6    b
311  T    H    9  9  0  0    0    0    0    b
312  N    H    9  9  0  0    0    0    1    b
313  I    H    9  9  0  0    0    0    6    b
314  M    H    9  9  0  0    0    0    2    b
315  D    H    9  9  0  0   79   49    4    e
316  V    H    9  9  0  0    0    0    2    b
317  M    H    9  9  0  0    0    0    7    b
318  D    H    9  9  0  0   79   49    5    e
319  R    H    6  7  0  2  158   64    4    e
320  V    L    1  4  0  5   69   49    2    e
321  N    L    5  2  0  7   56   36    1    e
322  P    L    8  1  0  8  110   81    5    e
323  D    L    9  0  0  9   79   49    4    e
324  G    L    8  0  0  8    0    0    4    b
325  M    L    5  0  2  6    0    0    0    b
326  A    E    3  1  5  2    0    0    0    b
327  F    E    4  1  6  2    0    0    3    b
328  I    E    6  0  7  1    0    0    5    b
329  P    E    5  1  6  1    0    0    2    b
330  L    E    3  2  4  2    0    0    2    b
331  R    H    5  6  1  1   89   36    2    e
332  V    H    3  5  1  2    0    0    5    b
333  I    H    1  3  1  4    0    0    6    b
334  P    H    7  8  0  1    0    0    2    b
335  F    H    9  9  0  0   70   36    2    e
336  L    H    9  9  0  0    0    0    7    b
337  R    H    9  9  0  0   39   16    1    i
338  E    H    7  8  0  1   95   49    4    e
339  K    L    7  1  0  8  100   49    6    e
340  G    L    8  1  0  8    0    0    0    b
341  V    L    9  0  0  9   35   25    1    i
342  P    L    8  0  0  8   66   49    4    e
343  Q    H    4  6  0  3   71   36    1    e
344  E    H    6  7  0  2   95   49    2    e
345  T    H    6  7  0  2   51   36    1    e
346  L    H    8  8  0  0    0    0    3    b
347  A    H    9  8  0  0   51   49    4    e
348  G    H    8  8  0  0   30   36    1    e
349  I    H    7  8  1  0    0    0    4    b
350  T    H    9  9  0  0    0    0    0    b
351  V    H    4  6  0  2   69   49    3    e
352  T    H    5  7  0  2   69   49    5    e
353  N    L    8  0  0  8    0    0    1    b
354  P    L    5  2  0  7    0    0    1    b
355  A    H    6  7  0  2   51   49    3    e
356  R    H    8  8  0  1   89   36    2    e
357  F    H    3  6  0  3    0    0    6    b
358  L    H    1  5  0  4    0    0    4    b
359  S    L    2  4  0  5   46   36    2    e
360  P    L    2  4  0  5   48   36    1    e
361  T    L    1  4  0  4   51   36    1    e
362  L    L    2  3  0  5    0    0    1    b
363  R    L    6  1  1  7   89   36    2    e
364  A    L    8  0  0  8   85   81    3    e
365  S    L    9  0  0  9  105   81    6    e
---
--- PHD PREDICTION COLUMN FORMAT END
---
________________________________________________________________________________

-----------------------------------------------------------------------------
--- PredictProtein: NEWS from January, 1997                               ---
---                                                                       ---
--- Dear user,                                                            ---
---                                                                       ---
--- as of January 1, 1997, EMBL has effectively decided not to support    ---
--- the PredictProtein service with personnel resources. I maintain the   ---
--- program, so to speak, in my private time; my contract, however,       ---
--- obliges me to do science instead. Unfortunately, the computer         ---
--- environment at EMBL is at the same time becoming increasingly         ---
--- unstable. As a consequence of these two recent developments, the      ---
--- PredictProtein service is not as stable as it was.                    ---
---                                                                       ---
--- I apologise for the problems this may cause. In particular, I         ---
--- apologise for my inability to reply to the 20-30 personal mails I     ---
--- receive daily, and suggest re-submitting requests after 24 hours!     ---
---                                                                       ---
--- Hoping that I shall find a more convenient solution for the           ---
--- future of PredictProtein, I remain with my best regards,              ---
---                                                                       ---
--- Burkhard Rost                                                         ---
---                                                                       ---
-----------------------------------------------------------------------------
--- PredictProtein: NEWS from November, 1996                              ---
---                                                                       ---
--- You can now query the minimal waiting time before you may obtain      ---
--- a result from PredictProtein:                                         ---
--- http://www.embl-heidelberg.de/predictprotein/PPstatus.log             ---
---                                                                       ---
--- Note: in general, weekends are relatively empty and Fridays           ---
--- relatively busy.                                                      ---
-----------------------------------------------------------------------------
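To use the PHD PREDICTION COLUMN FORMAT table from this reply programmatically (for example as threading input), each data row can simply be split on whitespace. This is a minimal sketch, assuming rows laid out as in that table; `parse_rows` and `sec_percentages` are hypothetical names, and the sample rows are copied from residues 1, 5, 25 and 221.

```python
from collections import Counter

FIELDS = ("No", "AA", "PSEC", "RI_S", "pH", "pE", "pL",
          "PACC", "PREL", "RI_A", "Pbie")
INT_FIELDS = ("No", "RI_S", "pH", "pE", "pL", "PACC", "PREL", "RI_A")


def parse_rows(text: str) -> list[dict]:
    """Return one dict per data row; header and marker lines are skipped."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows have exactly 11 fields and start with the residue number.
        if len(parts) == len(FIELDS) and parts[0].isdigit():
            row = dict(zip(FIELDS, parts))
            for key in INT_FIELDS:
                row[key] = int(row[key])
            rows.append(row)
    return rows


def sec_percentages(rows: list[dict]) -> dict[str, float]:
    """Percentage of residues predicted as H, E and L (one decimal)."""
    counts = Counter(r["PSEC"] for r in rows)
    return {s: round(100.0 * counts[s] / len(rows), 1) for s in "HEL"}


SAMPLE = """\
 No AA PSEC RI_S pH pE pL PACC PREL RI_A Pbie
  1  M    L    9  0  0  9  152   81    6    e
  5  R    H    2  4  1  3   89   36    3    e
 25  A    E    2  3  4  1    0    0    6    b
221  L    L    9  0  0  9   14    9    1    i
"""

rows = parse_rows(SAMPLE)
print(sec_percentages(rows))  # {'H': 25.0, 'E': 25.0, 'L': 50.0}
```

Run over all 365 rows, the PSEC column contains 193 H, 57 E and 115 L, which reproduces the 52.9 / 15.6 / 31.5 split in the secondary structure statistics.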