I was chatting with some other quant developers the other day, as you do, and the issue came up of calculating variances. Its a pretty common operation, and the naive approach does two passes, one to calculate the mean, , and one to gather the squares of differences from that. … but someone asked if it could be done in one pass, and of course it can, quite easily.
Recall, the population variance of a sequence is defined as –
We can expand this and see how the variance after n terms differs from the variance after m=n+1 terms, vis –
and we have in terms of vis –
In other words , so theres very little we need to store to accumulate the variance as we traverse the sequence.
Expressing this in C++ code we’d have a functor that maintains state and gets handed each element of the sequence, in simplest form –
and call it like this –
We could make the above accumulator class larger, and calculate all sorts of other stats whenever we do any pass – eg. mean, variance, nth order moments, skew, kurtosis, spectrum buckets, min, max etc.
A nicer solution is to use boost/accumulators/stats. Basically you decide which accumulators you want, and add them to an accumulators set. Each accumulator in the set is run as the single pass proceeds, even keeping track of dependencies between each other so they are run in the right order.