I’m normally a kdb+ kind of guy when it comes to managing huge amounts of streaming and historical tick data: the performance is great, the app is small and clean, and the language, Q, is terse with just enough to get the job done. On the downside Q is a bit cryptic, and the documentation is brutally terse, though readable.
I decided to do a bit of googling to see whether another product was out there that might be useful, and came across StreamBase, which is the commercial outgrowth of some research projects at MIT – Aurora, Borealis and Medusa. These projects were led by Michael Stonebraker, who created the Ingres and Postgres (later PostgreSQL) databases in their original form. His short blurb on stream processing – Data Torrents and Rivers – is a worthwhile introduction.
The StreamSQL language spec seems to be independent of StreamBase, as it has its own site which describes the language – StreamSQL.org.
StreamSQL does seem to fall short of being a fully independent spec, and I wanted to make some comment on this… because the world really does need an accessible stream processing language that acts in the same way as SQL. I love Q, but I just don’t see your average quant developer having time to grok it when they already have to learn C++/Perl/Python/Matlab/R and, I guess, soon Ruby [until Lisp becomes the 100-year language].
Here’s my Open Letter to the StreamSQL people:
Ok, StreamBase looks like a good product, StreamSQL looks like a good language…
The value of StreamSQL is that it can be vendor neutral and have several competing implementations – exactly analogous to SQL, right?
Your StreamSQL site is nice… and very annoying :]
1) Please put a formal spec of the complete language as a single text file on your site, and give it a version number. It’s nice to have the pages describe the formal spec, but it’s also annoying for people who have read a _lot_ of BNF / language specs – I want to see the complete language on one page, and know that that is exactly the syntax of version 0.96.20080321 of the spec!
0) This needs to be done as a consortium – currently StreamSQL is ‘owned’ by StreamBase, and this site just makes that obvious by trying not to make it obvious. Until you get past this, no-one will touch StreamSQL with a bargepole, and we will have N competing standards that vary by unimportant details… no matter what happens they will, over time, converge to a default that everyone will then use. Now, would you rather that default be Q? Making StreamSQL the de facto standard is in your interests… you will sell more StreamBase if you fully open the spec. It should be a consortium, have a voting body, and have at least 2 vendors who support it – or a vendor and perhaps an open source product.
A side issue: tone down the marketing… yes, you’re already doing that, but not enough to make this work. Consider having people involved who are in academia or open source but who are not being paid by StreamBase. [am I wrong in assuming your bloggers are employees?]
I think part of maturing as a standard is that you start to see more examples publicly available on this site. I know it is the Microsoft way to make people register to see your samples, but in the year 2008, with all the victories of the open source movement… well, it is evil to do that the old way :]
ps. Congratulations on a nice spec and a great product – things will advance for all if the spec grows away from its parent, and gains real independence!
Maybe a ‘Stream Query Language’ is important enough that it should just be added to ANSI SQL? The reason I think this won’t happen is that it’s just too expensive for vendors to say they are SQL compliant.
I guess in time we’ll see an open source stream database which is language compatible with either StreamBase / StreamSQL or kdb+ / q. But it’s the same old war of Betamax vs VHS or Blu-ray vs HD DVD: only one can win.

5 comments
September 2, 2008 at 4:08 am
brainyoga
um, IMHO kdb+ has already won. Hundreds of ‘average’ quants already program in Q every day. I would not agree that Q is cryptic. Terse, yes. Documentation has been improving over the years, and commercial training courses have been delivered to countless users. The job market for kdb+ developers has also been taking off, with demand now so strong that a headhunting company specialises in placing good developers worldwide – and they have more demand than supply. If StreamSQL becomes commercially interesting and offers advantages over Q, Kx could easily write another language layer to compile to their byte-code (kdb+ is actually a virtual machine that executes the k and q languages).
September 2, 2008 at 5:07 am
quantblog
yeah, I do agree – kdb+ is very nice, absolutely gets the job done and q works well.
Compared to some languages I’d almost concede Q is terse rather than cryptic [ given the average quant dev is a bit more language-experienced than the median SQL biz apps guy ]
It would be nice to have a standard – stream data is going to grow exponentially, I’d say, given the expansion of trade data and physical sensor data… I can see biz devs needing it soon enough.
hmm… I tried to make a Google Trends graph comparing kdb+ and other stream databases to see if it shows the growth you and I both see:
http://www.google.com/trends?q=kdb%2C+streambase%2C+xenomorph%2C+hdf5
but that’s not very helpful, possibly as kdb is a very overloaded term.
A Wilmott post also mentions people using HDF5, and another effort called ROOT, but I guess that’s low volume…
I’ve seen more notes where people use memcached in front of a db, but that’s kind of reinventing the wheel, as you’ll need to manage the cache via your code.
Who’s the headhunter you mention? :]
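To make the memcache point above concrete, here is a minimal sketch, in Python, of what "managing the cache via your code" tends to look like (usually called the cache-aside pattern). Everything in it is hypothetical: a plain dict stands in for memcached, the `load` callback stands in for the real database query, and `CacheAside` is just an illustrative name, not part of any library.

```python
import time
from typing import Callable, Dict, Tuple


class CacheAside:
    """Cache-aside: the application code, not the database, owns the cache."""

    def __init__(
        self,
        load: Callable[[str], float],          # hypothetical stand-in for a db query
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (expiry time, cached value); a dict stands in for memcached
        self._cache: Dict[str, Tuple[float, float]] = {}

    def get(self, key: str) -> float:
        now = self._clock()
        entry = self._cache.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                    # fresh hit: no db round trip
        value = self._load(key)                # miss or stale: go to the db
        self._cache[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Must be called by hand whenever the underlying row changes."""
        self._cache.pop(key, None)


if __name__ == "__main__":
    prices = {"VOD.L": 142.5}                  # made-up data for illustration
    cache = CacheAside(load=lambda sym: prices[sym], ttl_seconds=5.0)
    print(cache.get("VOD.L"))                  # first call reads the "db"
    prices["VOD.L"] = 143.0
    print(cache.get("VOD.L"))                  # still the cached value
    cache.invalidate("VOD.L")
    print(cache.get("VOD.L"))                  # reloaded after invalidation
```

The expiry and invalidation bookkeeping is exactly the part you end up writing and debugging yourself, which is why it feels like reinventing the wheel next to a database that keeps hot data in memory for you.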
September 2, 2008 at 6:53 am
brainyoga
kdb came out in ’98, kdb+ in 2003 – both process real-time data streams from market data feeds, optionally using custom rule sets for deriving complex events and reacting accordingly (e.g. buy/sell actions). The market for this is so niche that it is hard to imagine how a standard, which would likely be generic in some ways to allow targeting of other industries, would remain efficient enough to displace Q from the podium – milliseconds count. Also bear in mind the momentum behind Q (so many banks use it, it is shorter to say who does not use it, and that list gets shorter every day) – who would bother to replace that technology with something that likely performs the job no better? Where’s the added value?
Over the years there have been many products that slip into the market and then fall away. Most are funded by ill-advised vulture capitalists, or have an academic sponsorship, but Kx has grown organically, sustained by real sales to real clients who use it, like yourself, for real business. What other product can handle streaming data, real-time queries and huge historical data in the same platform using the same language? I can’t think of any, and as you are now doing, I too did my research.
My advice would be to continue perfecting your kdb+ skills, and give Kx feedback on where their product falls short of your expectations. They listen very carefully to their customers.
Headhunter – ikas international
e.g.
http://jobs.trovit.co.uk/index.php/cod.search_jobs/what_d.kdb+/
September 2, 2008 at 7:41 am
quantblog
Brainyoga, Kx should hire you as technical evangelist! My stance really is pro-KDB and pro-StreamBase.
I do agree that a language that works well beats a spec that is a compromise. A standard emerging from programmers doing real work is far better than a group of management geeks voting for compromises in endless rounds of meetings… It would be just fine with me if kdb+ won and became the de facto standard.
I don’t think stream data processing will _remain_ a niche for much longer. I think it will grow rapidly to become far more mainstream, partly because a lot of data will come from sensor networks, partly because machines will create much more data, partly because storage space grows so rapidly that all that sensor data can be kept.
Regarding the kdb+ / q language, I would like to see Kx Systems be even more open about the docs – i.e. don’t make someone register to get on your developer program, just show the docs directly on your website. This is exactly the advice I have for StreamBase. Put all your docs online, it only encourages sales.
I really like Lisp, which probably puts me in a minority, so Q is accessible for me, but I guess StreamSQL is a symptom that some people desire an SQL-like language.
SQL is a bit ugly, it reminds me of COBOL, whereas q reminds me a little of Lisp – if stream data became so commonplace that it drove more developers to kdb+/q … and to the concomitant change of mindset … that’d be a good thing.
btw, thanks for the link to ikas agency. Several mentions of kdb on http://www.nuclearphynance.com also.
September 2, 2008 at 1:52 pm
brainyoga
I think one has to consider how much of the perceived growth of stream processing is due to ~100 million USD in VC being blown on marketing by newcomers to this scene. I agree that the market is growing, but am not convinced it really is growing fast enough to generate an open standard, yet. Niches do not support competition well, at least commercially.
Kx opened their dev program earlier this year. There is a non-commercial version of kdb+ that is available for download from their website. It has some restrictions, like timeouts after 2 hours and a few other things, I think. You don’t have to register afaik, but you do have to accept their license.
Kx also opened up http://code.kx.com which allows anonymous users to read the contents – tutorials, code tips, contributions etc. And they have a Google Group for non-commercial developers.
I suspect that having something SQL-based gives a false sense of security. There is no royal road to solving problems in this domain, and much like any programming domain, suitable languages will grow in usage through demand.
I would agree that programming in Q requires a programmer brought up in the world of C/Java to take a step back and open up their mind to a new paradigm, but that is possibly no different to MATLAB or other array-based languages.