BSN*: What performance gain was achieved after you completed the optimization work on the code?
Gipsel: It depends on what you are comparing. The original source code of Milkyway@Home was grossly inefficient; it simply wasted a lot of time. The first things to do were not optimizations in the usual sense; one had to clean up the algorithm Milkyway@Home is using. In the meantime, most of my suggested improvements have been implemented in the sources maintained by Milkyway@Home. That brought the calculation times for a workunit (WU) down massively.
Using my CPU-optimized code, a 65nm Core 2 or a Phenom running at 3GHz takes just slightly over four minutes to crunch one of today's short WUs. The stock applications distributed by the project are a bit slower; they take roughly 10-18 minutes. In November 2008, the same WUs would have taken a full day on the same CPUs (MW uses longer WUs now). Taking my optimizations into account, Milkyway@Home experienced a speedup of about a factor of 100 on the CPU alone.
But I think readers are mostly interested in the GPU application. An ATI Radeon HD4870 completes the same WUs in only nine seconds. Since a quad-core CPU calculates four WUs at once, a 3GHz quad will effectively complete four WUs in about four minutes with the fastest CPU application. In the same time, ATI's Radeon HD4870 will complete about 25 WUs - six times the throughput for about the same price. Even a last-generation Radeon HD3800 will complete 8-10 WUs in four minutes, still more than double what a fast quad-core CPU can do. If you add up all the improvements, you see that a single HD4870 is now doing more science than the whole project did a couple of months ago! If you compare the beginning of the project with today's situation, you could claim a gain from one WU a day on a single Core 2 processor at 3GHz to almost 10,000 WUs a day with an HD4870 [this is a living testament to what code optimization can achieve - imagine if every application had such a dedicated code optimizer - Ed.].
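[As a back-of-envelope check of the throughput figures quoted above, the following short Python sketch (illustrative only, not part of the interview; all timing numbers are taken from Gipsel's statements) reproduces the roughly six-fold GPU advantage over a four-minute window. - Ed.]

    # Rough throughput comparison based on the figures quoted in the interview.
    # The numbers (9 s per WU on an HD4870, ~4 min per WU per CPU core,
    # four cores working concurrently) come from the text; the script is illustrative.

    wu_time_gpu_s = 9            # HD4870: one WU in about nine seconds
    wu_time_cpu_core_s = 4 * 60  # optimized CPU code: one WU per core in about four minutes
    cpu_cores = 4                # a quad-core crunches four WUs at once

    window_s = 4 * 60            # compare throughput over a four-minute window
    wus_cpu = cpu_cores * window_s // wu_time_cpu_core_s   # -> 4 WUs
    wus_gpu = window_s // wu_time_gpu_s                     # -> ~26 WUs (quoted as ~25)

    print(f"Quad-core CPU: {wus_cpu} WUs per {window_s // 60} min")
    print(f"HD4870 GPU:    {wus_gpu} WUs per {window_s // 60} min "
          f"(~{wus_gpu / wus_cpu:.1f}x the CPU throughput)")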