Project Portal: Briefly tell us about your project.
Thomas Rattei: SIMAP is a database of protein similarities. It contains nearly all currently published protein sequences and is continuously updated.
Protein similarities are computed using the FASTA algorithm, which provides optimal speed and sensitivity. To our knowledge, SIMAP is the only project that combines comprehensive coverage of all known proteins with incremental update capabilities.
PP: Why did distributed (or grid) computing get brought up in the course of your research? How did you become aware of the technology?
TR: The computational cost of calculating the similarity data grows with the square of the number of sequences in the database, so the effort needed to keep the matrix up to date is constantly increasing. Our internal resources, which have performed calculations for SIMAP for years, are no longer sufficient to keep up with all the new sequences. That is why we implemented a SIMAP client for the BOINC platform (Berkeley Open Infrastructure for Network Computing); the client uses the FASTA algorithm to detect sequence similarities. We became aware of the BOINC technology through the very famous SETI@home project.
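The quadratic growth mentioned in this answer can be illustrated with a small counting sketch. It assumes, purely for illustration, that each unordered pair of sequences is compared exactly once; the interview does not say how SIMAP actually schedules its comparisons, and the function names and sequence counts below are hypothetical.

```python
def all_vs_all_comparisons(n: int) -> int:
    """Unordered pairs among n sequences (self-comparisons excluded)."""
    return n * (n - 1) // 2


def incremental_comparisons(existing: int, added: int) -> int:
    """Extra pairs needed when `added` new sequences join `existing` ones
    whose mutual similarities are already stored (assumed update model)."""
    new_vs_old = added * existing
    new_vs_new = added * (added - 1) // 2
    return new_vs_old + new_vs_new


if __name__ == "__main__":
    # Hypothetical database sizes, not SIMAP's real figures.
    for n in (1_000, 2_000, 4_000):
        print(n, all_vs_all_comparisons(n))
    # Cost of adding the same batch of 100 sequences to a growing database.
    for n in (1_000, 2_000, 4_000):
        print(n, incremental_comparisons(n, 100))
```

Under this assumption, doubling the number of sequences roughly quadruples the size of the full matrix, and even an incremental update of fixed size becomes more expensive as the database grows, because every new sequence must be compared against everything already stored.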
PP: There are a lot of worthy projects that distributed computing has brought to us. Why should we choose yours?
TR: SIMAP supports many different projects in life sciences. Choosing SIMAP accelerates not only our project but many research projects all over the world.