You are currently browsing the monthly archive for September 2008.
Just a note that I’ve uploaded the initial version of vfuncs to Google Code. I’ve released it under a BSD license so you can use it easily in your commercial and noncommercial code.
Download from here [I’ll import to SVN sometime soon]. See my previous post for a description of vfuncs.
This version contains an example of a digital filter. This can be used to smooth the series data, or apply other signal processing operations. If you’re familiar with applying a blur filter in Photoshop or GIMP using a Gaussian filter kernel, this is exactly the same idea (except in one dimension). A Gaussian filter is basically just a weighted moving average of the data.
Think of the algorithm as applying a sliding window across the data – the sliding window contains the filter weights, and at each position you apply the weighted average [dot product] of the filter weights against each data point in the window.
If the filter contains a single element of weight 1.0, then the result is just the input (in that case the filter is the discrete counterpart of the Dirac delta function, a unit impulse). If the filter contains [0.25 0.50 0.25], it’s going to mix each element with its neighbours and take a weighted average, thus smoothing the data.
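To make the sliding window concrete, here is a minimal sketch in C. It is not the vfuncs code: the name filter1d, the odd-length kernel requirement and the edge handling (samples past either end reuse the nearest edge value) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of vfuncs: slide an odd-length kernel
 * across in[0..n-1] and write the weighted sum at each position to out.
 * Edge handling is an assumption: indexes that fall outside the data
 * are clamped to the nearest valid sample. */
void filter1d(const double *in, double *out, size_t n,
              const double *kernel, size_t klen)
{
    assert(klen % 2 == 1);            /* need a centre tap */
    if (n == 0)
        return;

    long half = (long)(klen / 2);
    for (size_t i = 0; i < n; i++) {
        double acc = 0.0;
        for (size_t k = 0; k < klen; k++) {
            /* index of the input sample sitting under kernel tap k */
            long j = (long)i + (long)k - half;
            if (j < 0)
                j = 0;
            if (j >= (long)n)
                j = (long)n - 1;
            acc += kernel[k] * in[j]; /* dot product of window and weights */
        }
        out[i] = acc;
    }
}
```

With the kernel [0.25 0.50 0.25], a lone spike of 4 surrounded by zeros is spread out to 1, 2, 1, and with the single-element kernel [1.0] the output equals the input. The sketch does not flip the kernel, which makes no difference for symmetric kernels like these.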
I want to describe a simple experiment I’ve just done, a direct way to write code with medium level verbs in a semi-functional style in pure C.
All of this can be done in C++ and there’s certainly more syntactic sugar there, but I wanted to explore the idea in C… C is close to the metal [but not too close, like assembler], compilers generate fairly good machine code, and the language supports a minimalist way to define functions [without lambdas, but we can use function pointers and context pointers to get that, if not in a type safe way].
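As a rough illustration of the function pointer plus context pointer idea, here is a small sketch. The names unary_fn, vec_map and scale_by are hypothetical and are not taken from vfuncs; the point is only that a void pointer can carry the state a lambda would otherwise capture, at the cost of type safety.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names, not the vfuncs API. A "verb" is a plain function
 * pointer; the void *ctx argument carries whatever state a closure
 * would have captured. Nothing checks that ctx has the expected type. */
typedef double (*unary_fn)(double x, void *ctx);

/* Apply f to every element of in, writing the results to out. */
void vec_map(const double *in, double *out, size_t n,
             unary_fn f, void *ctx)
{
    assert(f != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = f(in[i], ctx);
}

/* Example verb: multiply by a factor stored in the context. */
double scale_by(double x, void *ctx)
{
    const double *factor = ctx;   /* caller must pass a double here */
    return x * *factor;
}
```

Calling vec_map(in, out, n, scale_by, &factor) with factor set to 2.0 doubles every element, which is roughly what a captured variable in a lambda would give you.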
Another approach would be to do it in C++ with operators and templates, where much is reusable from the STL and Boost… yet another way would be to do it in ANSI C and use macros heavily… but my experiment is to make simple, readable C code that’s fairly quick.
In the K (terse) and Q (less terse) languages of KDB+, one can express something like this –
drawfrom:{[spec; vals; n]
mon: 0, sums nmlz spec; idx: mon bin n?1.0; vals[idx] }
Basically this reads –
function drawfrom(spec, vals, n)
mon = partial sums of spec (the cdf after normalizing to 1.0)
generate n random numbers uniformly in [0,1]
idx = array of indexes of each random sample into mon
return the values indexed by idx
So basically, this semi-functional zen koan simply generates n random samples from the spectrum supplied. Think of spec as the weights that determine how often each of vals appears – spec is a histogram or discrete pdf. Actually this is the readable version, closer to Q than K, as I’ve defined nmlz and used the verbose style – in K it can be much more ‘terse’ [proponents would say succinct].
At first this style of programming is annoying if you’re from a C++ background, but once you get used to it, you begin to think directly in terms of verbs working on vectors, in the same way that std::vector lets you think at a higher level and avoid many for() loops by using std::accumulate, or that other languages offer a foreach construct…
So how does this look in C? Try this –
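The sketch below is one possible C rendering of the steps listed above; it is a stand-in for illustration and not the vfuncs listing. The names cdf_from_spec, bin_index and draw_from are hypothetical, rand() is used only for simplicity, and bin_index is meant to behave roughly like Q’s bin (the last index whose value is less than or equal to the sample).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helpers, not the vfuncs API.
 * Build mon[0..n]: mon[0] = 0 and mon[i+1] = sum(spec[0..i]) / total,
 * i.e. the cdf of spec after normalizing to 1.0. */
void cdf_from_spec(const double *spec, size_t n, double *mon)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += spec[i];
    assert(total > 0.0);

    double run = 0.0;
    mon[0] = 0.0;
    for (size_t i = 0; i < n; i++) {
        run += spec[i];
        mon[i + 1] = run / total;
    }
}

/* Binary search: largest i in [0, len) with mon[i] <= x.
 * Assumes mon is non-decreasing and mon[0] <= x. */
size_t bin_index(const double *mon, size_t len, double x)
{
    size_t lo = 0, hi = len;
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (mon[mid] <= x)
            lo = mid;
        else
            hi = mid;
    }
    return lo;
}

/* Draw ndraws samples from vals, weighted by spec, into out. */
void draw_from(const double *spec, const double *vals, size_t nvals,
               size_t ndraws, double *out)
{
    assert(nvals > 0);
    double *mon = malloc((nvals + 1) * sizeof *mon);
    assert(mon != NULL);
    cdf_from_spec(spec, nvals, mon);

    for (size_t i = 0; i < ndraws; i++) {
        /* uniform sample in [0, 1) */
        double u = rand() / ((double)RAND_MAX + 1.0);
        /* search only mon[0..nvals-1] so the index stays inside vals */
        out[i] = vals[bin_index(mon, nvals, u)];
    }
    free(mon);
}
```

The three functions line up with sums/nmlz, bin and the final indexing in the Q version. An entry with zero weight in spec is never drawn, because its slot in the cdf has zero width.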
I’m normally a kdb+ kind of guy when it comes to managing huge amounts of streaming and historical tick data: the performance is great, the app is small and clean, and the language Q is terse with just enough to get the job done. On the downside Q is a bit cryptic, and the documentation is brutally terse, though readable.
I decided to do a bit of googling to see whether another product was out there that might be useful, and came across StreamBase, which is the commercial outgrowth of some research projects at MIT – Aurora, Borealis and Medusa. These projects were led by Michael Stonebraker, who invented (or discovered?) the Ingres and PostgreSQL databases in their original form. His short blurb on stream processing – Data Torrents and Rivers – is a worthwhile introduction.
The StreamSQL language spec seems to be independent of StreamBase, as it has its own site which describes the language – StreamSQL.org.
StreamSQL does seem to fall short of being a fully independent spec, and I wanted to make some comment on this… because the world really does need an accessible stream processing language that acts in the same way as SQL – I love Q but I just don’t see your average quant developer having time to grok it when they already have to learn C++/Perl/Python/Matlab/R and I guess soon Ruby [until Lisp becomes the 100-year language].
Here’s my Open Letter to the StreamSQL people –