I decided to make a rough performance comparison between my approach and the Boost Accumulators API. A reasonable test is to procedurally generate 10^7 random numbers in the range [0.0..1.0] as the input data. We compare three cases:
- No moments accumulated – baseline, just generates the input
- Accumulate all 12th order moments using Boost Accumulators
- Accumulate all 12th order moments using my vfuncs cpp code
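To make the setup concrete, here is a minimal sketch of what a single-pass moment accumulation over generated input might look like. It is not the vfuncs code and not the Boost implementation. The names `MomentAccumulator` and `accumulate_uniform` are made up for this illustration, and it assumes "all 12th order moments" means the raw moments of order 1 through 12. It also assumes a fixed-seed `std::mt19937` as the generator, which produces values in [0, 1) rather than the closed range.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <random>

// Hypothetical single-pass accumulator for raw moments of order 1..N.
// Illustrative only: this is neither the vfuncs nor the Boost implementation.
template <std::size_t N>
struct MomentAccumulator {
    std::array<double, N> power_sums{};  // power_sums[k] holds the sum of x^(k+1)
    std::size_t count = 0;

    // Fold one sample into every running power sum.
    void add(double x) {
        double p = 1.0;
        for (std::size_t k = 0; k < N; ++k) {
            p *= x;               // p is now x^(k+1)
            power_sums[k] += p;
        }
        ++count;
    }

    // Raw moment of the given order (1-based), i.e. the mean of x^order.
    double moment(std::size_t order) const {
        assert(order >= 1 && order <= N && count > 0);
        return power_sums[order - 1] / static_cast<double>(count);
    }
};

// Generate n uniform samples in [0, 1) from a fixed seed (an assumption,
// chosen for repeatability) and accumulate moments up to order 12.
MomentAccumulator<12> accumulate_uniform(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    MomentAccumulator<12> acc;
    for (std::size_t i = 0; i < n; ++i) {
        acc.add(dist(gen));
    }
    return acc;
}
```

Calling `accumulate_uniform(10000000, 42)` would correspond to the 10^7 sample run described above. For uniform input the k-th raw moment should come out close to 1/(k+1), which gives a quick sanity check on any of the implementations.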
We run on Linux, using the time command to get rough timings. The raw results are:
- Baseline - 1.16s
- Boost Acc - 23.16s
- Vfunc Acc - 2.44s
If we subtract the baseline, we get a rough comparison of Boost ~22s and vfuncs cpp ~1.3s once the cost of generating the input is factored out.
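The subtraction above can be written out as a couple of small helpers. This is only a sketch of the arithmetic; `net_seconds` and `speedup` are hypothetical names introduced here, not part of vfuncs or Boost.

```cpp
// Time attributable to accumulation alone, once input generation is removed.
double net_seconds(double total_seconds, double baseline_seconds) {
    return total_seconds - baseline_seconds;
}

// How many times faster the quicker run is, comparing baseline-corrected times.
double speedup(double slow_total, double fast_total, double baseline) {
    return net_seconds(slow_total, baseline) / net_seconds(fast_total, baseline);
}
```

With the timings listed above, `net_seconds(23.16, 1.16)` is about 22.0 and `net_seconds(2.44, 1.16)` is about 1.28, so `speedup(23.16, 2.44, 1.16)` works out to roughly 17.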
So the vfuncs impl is roughly an order of magnitude faster than the current Boost v1.37.0 accumulators statistics implementation (both are single pass).
I don’t think the Boost algorithm is logically different; it’s probably more a case of template code complexity having an impact, and the 10s compile times might also indicate this. Executable size reflects this too, differing by an order of magnitude.
(Note: Using gcc 4.2.4. I haven’t tried this on other compilers or build settings; they could have a profound effect. Let me know if you see different results on e.g. Intel or VC compilers.)