Friday 26 February 2010

Putting Haskell Between C++ and Java (Language Shootout Exercise)

Note: These speedups apparently require GHC 6.12, so the entire speedup won't be seen on the Shootout until/if they switch to GHC 6.12 at some point (possibly after the next Ubuntu stable release?). However, there was still about a 3 second speedup to hold us over until then ;).

Lately I've been sort of poking at the language shootout, to see what's there. The previous two posts have centered on this theme. However, unlike those posts, this one doesn't want to show a Haskell program could be made faster, bur rather how it can be made much faster.

Looking to reduce one of the greatest disparities, I looked to the fasta benchmark. In this case, the fastest submission is the C implementation, with a great time of 1.73 seconds. Avoiding success in the most complete way possible, the Haskell implementation scores in at about 20th place, being 9x slower than the fastest C. However, the entry doesn't have terrible company, at least it is bordered by a couple (JIT) compiled languages, F# and Go.

Ok, so what can be done?

I don't know. First we have to find out where the problem is! For this we can look to the ghc profiling output.
For this I just compile with the ghc options "-prof -auto-all", in addition to the other options. When we use this, and then additionally run the binary with "+RTS -p -xc -RTS" we will get a file called (for example) fastaold.prof. This will report, first:


look                           Main                  38.7   71.7
choose                         Main                  26.6    0.0
rand                           Main                  14.6   20.7

and additionally:

 look     Main       282   403333334  38.7   71.7    79.8   92.4
   rand   Main       285   200000000  14.6   20.7    14.6   20.7
   choose Main       283   623445679  26.6    0.0    26.6    0.0

The top output basically tells us the main offenders (by time of execution) in our program. The bottom is an extract from the more detailed information. clearly these functions are called a lot. Therefore, it makes sense to try to optimize the lookup function for a symbol.


Okay, so clearly the most commonly used functions need optimization... (this is really revolutionary advice here...)

Briefly, these are the optimizations I have made. Not all of these may be really useful, but it is a true listing of what I did:


  • Replace the pair of symbol/probability with its own datatype
  • Ditto for probability/random-seed pairs
  • Changed there representation from PDF to CDF, saves some subtraction.
  • Pushed look into the body of the fold function (this actually saves quite a bit of time, I guess it allows better specialization).
  • Introduced a "cached" version of the list, which essentially pushed the first 4 entries up. I believe this makes it have more-or-less constant access time for these elements, without having to "uncons" everything, as with a regular list.

These seemed to pay off! I believe they even don't violate the spirit of the language shootout, as they certainly do less of a modification than the top C version. The top C version essentially seems to expand the CDF into a larger space, so that it basically becomes a constant-time lookup for the correct symbol: my version is still quite linear.

In the end I was able to make the following improvement, going from:
0m13.305s

to:

0m6.219s

This is a 2.14x improvement! The implementation is still much slower than the 1st place one. Its a little hard to say how this would affect the Shootout results themselves, given I have different hardware, and a different version of GHC (they have 6.10, I have 6.12). However, if one wants to perform the dubious step of applying my speedup to the current version listed on the Shootout page, it should move the Haskell entry from 20th place to 8th, and put it right between a C++ and a Java entry. That's pretty nice company to be around!

(Double check my work! Files hosted on GitHub)

3 comments:

  1. Why don't you submit your improvements to the shootout?

    ReplyDelete
  2. Why don't you contribute your improved programs to the benchmarks game?

    ReplyDelete
  3. dons, Isaac: There's no reason not to. In fact, I will, when I find some spare moment to figure out the procedure :)

    ReplyDelete