My previous blog articles describe approaches to calculating kth moments in a single pass – see compute kth central moments in one pass and variance of a sequence in one pass.

I decided to make a rough performance comparison between my approach and the Boost Accumulators API. A reasonable test is to procedurally generate 10^7 random numbers in the range [0.0, 1.0] as the input data. We compare three algorithms:

- No moments accumulated – baseline, just generates the input
- Accumulate all 12th order moments using Boost Accumulators
- Accumulate all 12th order moments using my vfuncs cpp code
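To make the setup concrete, here is a minimal sketch of the kind of single-pass accumulation being timed. It is not the vfuncs or Boost code: `PowerSumAccumulator` and `accumulate_uniform` are hypothetical names made up for illustration, the generator and seed are arbitrary, and it assumes a C++11 compiler. It keeps plain running power sums (raw moments), which is the simplest one-pass scheme and not necessarily the numerically best one.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <random>

// Hypothetical stand-in for the accumulators being benchmarked: keeps running
// sums of x^k for k = 1..MaxOrder, so the k-th raw moment is sums_[k] / n.
template <std::size_t MaxOrder>
class PowerSumAccumulator {
public:
    // Fold one sample into every running power sum (single pass, O(MaxOrder)).
    void add(double x) {
        double p = 1.0;
        for (std::size_t k = 1; k <= MaxOrder; ++k) {
            p *= x;          // p now holds x^k
            sums_[k] += p;
        }
        ++count_;
    }

    // k-th raw moment E[x^k]; the 0th moment is 1 by definition.
    double raw_moment(std::size_t k) const {
        assert(k <= MaxOrder && count_ > 0);
        return k == 0 ? 1.0 : sums_[k] / static_cast<double>(count_);
    }

    std::size_t count() const { return count_; }

private:
    std::array<double, MaxOrder + 1> sums_{};  // sums_[0] is unused
    std::size_t count_ = 0;
};

// Generate n uniform samples in [0, 1) and accumulate moments up to order 12.
// The seed and generator are arbitrary choices for this sketch.
inline PowerSumAccumulator<12> accumulate_uniform(std::size_t n, unsigned seed = 42) {
    std::mt19937_64 rng(seed);
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    PowerSumAccumulator<12> acc;
    for (std::size_t i = 0; i < n; ++i) acc.add(unit(rng));
    return acc;
}
```

Calling `accumulate_uniform(10000000)` mirrors the 10^7-sample test. For uniform input the k-th raw moment should come out close to 1/(k+1), which makes a handy sanity check.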

We run on Linux and use the time command to get rough timings. The raw results are:

- Baseline: **1.16s**
- Boost Acc: **23.16s**
- Vfunc Acc: **2.44s**

If we subtract the baseline we get a rough comparison of **Boost ~ 22s** and **Vfuncs cpp ~1.3s** when the cost of generating the input is factored out.

So **the vfuncs impl is roughly an order of magnitude faster than the current v1.37.0 boost accumulate stats implementation** (both are single pass).
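The baseline subtraction behind that comparison can be written out as a tiny helper. This is just the arithmetic from the timings above; `net_speedup` is a hypothetical name used for illustration.

```cpp
#include <cassert>

// Speedup of the faster run over the slower one after subtracting the shared
// baseline (input generation) from both wall-clock timings. Times in seconds.
inline double net_speedup(double baseline, double slow, double fast) {
    assert(fast > baseline && slow > baseline);  // net times must be positive
    return (slow - baseline) / (fast - baseline);
}
```

With the raw figures above, `net_speedup(1.16, 23.16, 2.44)` is 22.0 / 1.28, or about 17.2.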

I don’t think the Boost algorithm is logically different; it’s probably more a case of template code complexity having an impact. The 10s compile times might also indicate this. Executable size reflects it too, differing by an order of magnitude.

(Note: using gcc 4.2.4. I haven’t tried this with other compilers or build settings; they could have a profound effect. Let me know if you see different results on e.g. Intel or VC compilers.)

### Download

Download code and project – kth_moments.tgz – from the vfuncs Google Code downloads page. [BSD license, feel free to use]

## 1 comment

February 8, 2009 at 23:26

quantblog

Update:

The above figures were compiled with default g++ settings, namely unoptimized.

Running the tests with optimization turned on [ g++ … -O3 ] we get much closer results, viz.

Baseline create input – 0.3s

vfuncs StatsRecorder – 0.7s

boost accumulate stats – 1.92s

Which gives a ratio of (1.92-0.3)/(0.7-0.3)=4.05

So with optimization vfuncs is ~4x faster than boost accumulators;

without optimization vfuncs is ~17x faster, since (23.16-1.16)/(2.44-1.16)=17.2.

Optimization really does bring the Boost executable back down to a comparable size, presumably by stripping out template expansion code that’s never used [90% of it!]

Well… over 20 million data points analyzed per second per core, and at 4x better than Boost. I’d say that gives a competitive advantage to any trading engine.
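The throughput figure follows from the optimized timings in the same way. A minimal sketch of the arithmetic, with `net_throughput` as a hypothetical helper name:

```cpp
#include <cassert>

// Samples processed per second once the shared baseline (input generation)
// has been subtracted from the total wall-clock time. Times are in seconds.
inline double net_throughput(double samples, double total, double baseline) {
    assert(total > baseline);  // net time must be positive
    return samples / (total - baseline);
}
```

For the optimized vfuncs run, `net_throughput(1e7, 0.7, 0.3)` is 10^7 samples over roughly 0.4s, i.e. about 25 million per second, which is where "over 20 million" comes from.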