Note: You will be expected to turn in the final version of your algorithm for assessment.
0. Make your own copies of the following files:
ChouFasman.py, which includes preliminary code for this assignment.
read_fasta.py, the procedure we explored previously for reading FASTA data.
NP_005408-fasta.txt, the FASTA file for human SRC.
CF_test.txt, the test data provided by our authors.
1. Your first goal is to understand what parts of the Chou-Fasman algorithm are already implemented, and how you can use them.
What does the for aa in aa_names loop at the start of the code do? (Make sure to look at the body.)
You may want to call each of the procedures to better understand its purpose.
For example, you can call ChouFasman on the sequence you get from
2. Develop a set of test sequences that you think will be useful. Your set should include.
3. The code to extend a potential alpha helix (step 1b on p. 218) is not yet written. Write that code.
Note: As our book notes, one difficulty with extending a region is that you may hit the beginning or end of the sequence. Be careful about those situations.
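To make the boundary concern concrete, here is a minimal sketch of one way to extend a region without running off either end of the sequence. It assumes the commonly stated form of the extension rule (keep extending until four contiguous residues average below 100); check it against step 1b in the book, which may differ. The name extend_region, the list-of-propensities layout, and the choice to shrink the window near the edges are all assumptions for illustration, not part of ChouFasman.py.

```python
def extend_region(props, start, end, window=4, threshold=100):
    """Extend the half-open region [start, end) outward in both directions.

    props holds one helix propensity per residue (hypothetical layout).
    Extension in a direction stops when the residues just outside the
    region (up to `window` of them) average below `threshold`, or when
    the region reaches the edge of the sequence.
    """
    n = len(props)
    # Grow leftward; max(0, ...) keeps the window inside the sequence.
    while start > 0:
        outside = props[max(0, start - window):start]
        if sum(outside) / len(outside) < threshold:
            break
        start -= 1
    # Grow rightward; slicing past the end simply yields a shorter window.
    while end < n:
        outside = props[end:end + window]
        if sum(outside) / len(outside) < threshold:
            break
        end += 1
    return start, end
```

Another reasonable choice is to stop extending as soon as a full four-residue window no longer fits. Whichever you pick, include test sequences whose helices touch the first and last positions.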
4. The code for checking whether a range is likely to be an alpha helix (step 1c on p. 218) is incomplete. Complete that code.
5. As written, CF_find_alpha does some unnecessary checking and therefore finds duplicate regions. In particular, once it has identified a potential alpha helix in the range (X,Y), it starts again near X+1. However, it need not look for the next alpha helix before position Y+1. Update your code so that the search is more efficient.
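The skip-ahead idea can be sketched independently of the helix rules. In the sketch below, region_at is a hypothetical callback standing in for "try to nucleate and extend a region at position i"; it returns an inclusive (X, Y) pair or None. Neither name comes from ChouFasman.py.

```python
def scan_regions(n, region_at):
    """Collect regions in a sequence of length n without rediscovering them.

    region_at(i) is a hypothetical callback: it returns an inclusive
    (x, y) region when one is found starting the search at i, else None.
    """
    regions = []
    i = 0
    while i < n:
        hit = region_at(i)
        if hit is None:
            i += 1              # nothing here; try the next position
        else:
            x, y = hit
            regions.append((x, y))
            # Resume after the region; max() guarantees forward progress.
            i = max(i + 1, y + 1)
    return regions
```

Using a while loop rather than for i in range(n) is what makes the jump possible, because reassigning the loop variable inside a for loop does not affect the next iteration.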
6. There is not yet a procedure to find beta strands. Implement that procedure. (You will find that it is very similar to the one for finding alpha helices.)
7. There is not yet a procedure to find beta turns. Implement that procedure. (You will find that this procedure is a bit different, because it does not expand the region, because the contribution of an amino acid to turn probability depends on its position in the region, and because turns are just one unit long.)
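As a rough illustration of how position-dependent contributions can be handled, the sketch below tests a single four-residue window. It assumes the commonly stated Chou-Fasman turn conditions (the product of the four positional frequencies exceeds 0.000075, the average turn propensity exceeds 100, and that average also exceeds the average helix and strand propensities); confirm these against the book before relying on them. The function name and the table layouts (f_pos maps an amino acid to a tuple of four positional frequencies; the other tables map an amino acid to a propensity) are assumptions.

```python
def is_turn(seq, i, f_pos, p_turn, p_alpha, p_beta, cutoff=0.000075):
    """Return True if the four residues starting at i look like a turn.

    f_pos[aa][k] is the (hypothetical) frequency of aa at position k
    of a turn, for k in 0..3. The other tables map aa to a propensity.
    """
    if i < 0 or i + 4 > len(seq):
        return False            # the window would run off the sequence
    window = seq[i:i + 4]

    # Position-dependent part: each residue uses the frequency for
    # its own slot in the window.
    p_t = 1.0
    for k, aa in enumerate(window):
        p_t *= f_pos[aa][k]

    def average(table):
        return sum(table[aa] for aa in window) / 4

    turn = average(p_turn)
    return (p_t > cutoff
            and turn > 100
            and turn > average(p_alpha)
            and turn > average(p_beta))
```

Because the window has a fixed width and is never extended, finding all turns amounts to calling a test like this at each position and recording where it succeeds.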
8. The ChouFasman procedure currently fails to do step 4 of the algorithm (finding and handling overlaps, p. 219). Implement that portion of the algorithm.
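One possible starting point for overlap handling is sketched below. It assumes the usual rule that, within the overlapping stretch, whichever structure has the larger total propensity wins; the book's step 4 may state the rule differently, so treat this only as a sketch. The function name, the inclusive (start, end) region format, the "H"/"E" labels, and the dictionary tables are assumptions.

```python
def overlap_winner(seq, helix, strand, p_alpha, p_beta):
    """Decide which structure keeps the overlap of two inclusive regions.

    Returns (lo, hi, label) for the overlapping stretch, where label is
    "H" (helix) or "E" (strand), or None if the regions do not overlap.
    """
    lo = max(helix[0], strand[0])
    hi = min(helix[1], strand[1])
    if lo > hi:
        return None             # the regions are disjoint
    stretch = seq[lo:hi + 1]
    alpha_total = sum(p_alpha[aa] for aa in stretch)
    beta_total = sum(p_beta[aa] for aa in stretch)
    # Ties go to the strand here; that is an arbitrary choice.
    return (lo, hi, "H" if alpha_total > beta_total else "E")
```

You will still need to decide what to do with the losing region: trim it to the part outside the overlap, or discard it if what remains is too short.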
9. Implement any other pieces you consider necessary for the full algorithm.
Pick three proteins for which there is a known structure. For each protein, run your version of the Chou-Fasman algorithm and analyze how well (or poorly) it predicted the protein's secondary structure.
For example, here are some basic analyses related to alpha helices.
You might also explore how well your algorithm did as compared to PSIPRED.
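If you record both the predicted and the known structure as strings with one character per residue, simple per-residue comparisons are easy to compute. The sketch below is one way to do that; the encoding ("H" for helix, "E" for strand, "-" for anything else) and the function name are assumptions, so adapt them to whatever your code and your reference data actually use.

```python
def structure_stats(predicted, known, label="H"):
    """Compare two equal-length per-residue structure strings.

    Returns the fraction of known `label` residues that were predicted,
    the fraction of predicted `label` residues that are correct, and
    the overall fraction of positions where the two strings agree.
    """
    if len(predicted) != len(known):
        raise ValueError("strings must have the same length")
    pairs = list(zip(predicted, known))
    both = sum(1 for p, k in pairs if p == label and k == label)
    n_pred = predicted.count(label)
    n_known = known.count(label)
    agree = sum(1 for p, k in pairs if p == k)
    return {
        "found": both / n_known if n_known else None,
        "correct": both / n_pred if n_pred else None,
        "agreement": agree / len(known) if known else None,
    }
```

The same function can be applied to PSIPRED output once it is converted to the same encoding, which gives a like-for-like comparison.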
Consult the literature about Chou-Fasman and describe three possible improvements to the algorithm. (You need not implement these improvements. However, you should describe them at a level that your colleagues in the class could understand.)
Optionally: Implement one of these improvements and analyze the effect it has on the algorithm (does it really make it better?).
This page was generated by Siteweaver on Thu Oct 27 12:49:06 2011.
The source to the page was last modified on Thu Oct 27 12:48:55 2011.
Samuel A. Rebelsky