
Surprisingly for a lisp programmer, I’m enjoying developing iPhone apps.  My client’s Melbourne GigGuide app is doing well, and I’m much happier after moving from XML to JSON format for my data feeds.

Verbose

Objective-C is simpler than C++ and closer to a scripting language in some ways.  The UIKit API itself is well designed, despite having verboseMethodNames.  This isn’t so much an RSI problem – the Xcode editor does a nice job of guessing and auto-complete, while being non-intrusive – but it’s a lot of text to read, which tends to obscure the idea behind the code.

Deeply nested JSON trees

Today I found myself chaining down a few levels of nested JSON objects, and decided I needed a much shorter syntax.

Many scripting languages have an associative array lookup syntax like someMap["key"], and these can be chained.  In Obj-C, to access an NSDictionary you use objectForKey:@"key".

at() Hack

I wanted something shorter so wrote this small hack  -

// Returns the value for key k when ob is an NSDictionary, otherwise nil.
id at(id ob, NSString *k)
{
    if (ob && [ob isKindOfClass:[NSDictionary class]])
        return [ob objectForKey:k];
    return nil;
}

Now I can index down several levels into a json tree easily, like this :

id obShops = at(at(at(json, @"Boutiques"), @"London"), @"EastEnd");

Not as succinct as square brackets, but saves a lot of typing… enjoy

I wanted to vocalize some thoughts about secure mashups, after watching Douglas Crockford’s 2007 Google talk on the subject – Gears and the Mashup Problem.

Doug wrote the book ‘JavaScript: The Good Parts’, which has really pushed everyone forward to understand the powerful features of the language, such as dynamic typing, lexically scoped closures and functions as first class objects.  JavaScript at its roots is more like lisp than Java, and that’s a good thing, although there are some bad parts to avoid.  He also discovered JSON, which has become the default data format for the web, saving us daily from the syntactic hell that is XML.  He has certainly earned the nickname of ‘Yoda’ in the JavaScript community.

JSON

I say ‘Yoda discovered JSON’ because I recall implementing a similar format, a decade or so ago, to solve a problem for a telecoms company I was consulting for.  In this case we wanted to send config settings between the fat GUI client and the back-end server system.  The config was naturally tree structured, the kind of thing people at the time put in Windows INI files [an ugly data format if ever there was one].  Back then I was unaware of functional programming languages such as lisp, and hence knew nothing of S-expressions [sexp on wheels!].  I was also wondering if the new XML would become a passing fad, and was deeply suspicious of Microsoft’s proprietary, non-standard looking, tightly COM-bound, vendor locked-in XML API.  Anyway, I forged ahead and wrote a nice little grammar, coded up a parser using lex and yacc, and never looked back.

Like every experienced programmer, I’ve used SAX2 and DOM parsers for XML in several languages.  But it always grated on my sensibilities: the simplicity of the thing was hidden beneath ham-fisted verboseness.  This feeling was largely subconscious until the time I wrote more than a couple of pages of XSLT, and the ugliness of the approach slapped me in the face – not only was I hiding my data in an arcane dogma of syntax, but I was trying to write _programs_ to transform them in this nonsense!  It was the height of bad taste, so when I happened upon JSON it just felt like the right way to phrase tree data as text.

JSON is lightweight and terse – trivial in the best sense.  It’s no surprise that it’s had such incredible uptake among web programmers over the last couple of years, as it’s all the good parts of XML with none of the pain.

Mashups are Insecure

Doug talks about the main problem we have with web development today – security: we have none!

The default web development platform of JavaScript plus JSON plus AJAX has been phenomenally successful, to the point where we have interactive web page widgets we can drag and drop to compose our own user-customized pages in real-time: true web ‘mashups’.  Just take a look at the modularity of Facebook as an example, or the ubiquity of Google map widgets on individual web sites.

The problem is that any JavaScript widget can read and write any other widget on the page – a real issue if you have your stocks or bank widget embedded in the same page as a faulty or malicious world clock applet.

In his talk Doug suggests we need to isolate widgets into ‘vats’ containing JavaScript and DOM, and allow only JSON messages to flow between them, via a page-broker.  Each widget would be able to write to its own local DOM, to achieve its GUI and interactive effects, but not be able to read or write contents within another vat.

As well as being a natural security boundary, vats could also be a great way to scale performance – each vat could run in a separate thread using a CPU core, while keeping the current JavaScript programming model of a single execution thread.  Doug’s talk was really a call-to-arms to get Yahoo, Google, Facebook and browser organisations to work together and make this happen.

Widget API

In other posts Doug has recommended some of the mechanics to achieve secure mashups via vats – JSONRequest(), the <module> tag, and a send() / receive() API for JSON messaging.  In the vat model the browsers would have to implement the page-broker and the module tag.  That requires agreeing on a simple and implementable API that can be rolled out and adopted gradually.

A new API also means defining a formal standard, and this just never seems to work unless there is already a de facto standard implementation that has some market adoption.  History tells us that you just can’t get an API right without implementing a workable system first, and having people use it.  History also tells us that standards bodies never get anything done, and that you can’t get 5 people to agree on an API, let alone 25.  The way forward is to implement something imperfect, then standardise what works.

I googled for ‘secure web mashup’ to see if there had been any progress since Doug’s 2007 talk.  The answer seems to be ‘not so much’.  I did find an API for secure mashups by OpenAjax.org and some dojoX work; there might be others out there I simply don’t know of [send me an email if you know of any].

OpenAjax.org are tackling secure mashups with their OpenAjax Hub 2.0 spec.  This was just released, so I wonder if anyone is using it yet.  Their Managed Hub API looks to me to be too complex – I just can’t see it being adopted by busy programmers.  We need the absolute minimum that will enable messaging and allow for a good improvement in security.

We need a solution.  We need it yesterday.

Page Broker as a service

Once we have a working API that emerges as the de facto standard, there will still be a long rollout time.  People slowly turn over the version of their browser, often when they upgrade their OS, roughly every 2 to 3 years I guess.  For people with old browsers, it would be ideal if there was some partial migration option.

One idea I had is that the page-broker could be run outside the page as a separate process.  That might not enforce security as strictly as a broker built into the browser, but it would still improve on what we have now and allow widgets using the new API to work in old browsers.

The widget module API would be something like module.send(json) and json=module.receive(), where all inter-widget and widget-to-page communication is via JSON text messages.  We could then stay API compatible and have widgets on one page talk to widgets on another page on another computer (widgets talking peer-to-peer).  We could have a trusted page-broker run on a server, or even locally for faster performance.

Having the page-broker serve up some data publishing what widget services are available, and mediating messages between widgets, seems like a model in the spirit of the internet.  It’s implementable, nicely isolated, scalable and you get offline mode easily by running a local web server.  You can also publish non-GUI capabilities, such as data storage [CouchDB would play well with this setup].

I realise an out-of-process page-broker might not be the final form – we likely have to change JavaScript and the DOM to enforce security.  But it does seem a really good interim setup for experimenting with, while the new browser features are being rolled out over time.

Hmm…

xilla tags generates HTML tags from PHP.

I found my code tends to be more readable using this approach. It has a mildly lisp flavor, and results in nested PHP code that looks something like this -

table("",
     tr("", array(
          td(array("class"=>"field"), "Name"),
          td("", "Jim Blinn"))));

As you can see it uses tag functions to build up a tree of PHP native arrays, which is then rendered using render_tags().

See the previous post for an overview and comparison with another tags library, HAML. The main difference is that HAML is a parsed syntax, whereas xilla tags is native PHP.

BSD licence, on google code here –  http://code.google.com/p/xillatags/downloads/list

A while back I did some work in scheme lisp and was much impressed by the readability of the code which generates the HTML – basically a nice nested tree of self-closing tags, where you can build up levels of abstraction while still being close to the exact HTML you’re rendering and having control over it.

Then I had to go back to PHP for pragmatic reasons, and I just didn’t gel with any of the frameworks or tag libraries around.  I experimented with various template engines, but it seemed annoying to break syntax and use yet another language.  I did find an implementation of lisp in PHP, which I thought was nifty, but the people who hired me as a consultant were eagerly awaiting their code bundle in PHP.

PHP just won’t die.  It may not be an elegant language, but it’s the workhorse of the internet.

I had to find a way to close tags automatically and keep everything readable.  There’s a lot of bad PHP code around, where the script basically progresses from the top of the web page downwards, mixing PHP code snippets with HTML, JavaScript and SQL statements – not a recipe for clarity.

The nested xilla_tag functions build up a tree of PHP arrays.  This tree is then walked and the HTML tags are generated when you call render_tags().  As the tree is written out, the end tags are added automatically and the resulting output is indented.

Example Comparison

To make a valid comparison, I use the same HTML sample given in another tag library, namely HAML – see phphaml.sourceforge.net

Compare ‘old school’ PHP program code -

[image: non-lispy PHP code]

with lispy PHP code using xilla tags.  Read upwards from sample_page() at the bottom -

[images: lispy PHP code, lispy code body]

Download

I’ve uploaded xilla_tags to Google Code.  You can use it freely under a permissive BSD licence – see the project page or download here.

I hope you find it a more readable way of generating your web page programs in PHP,

enjoy,

gord

Via HN I found an excellent developer manager blog by Rands here – www.RandsInRepose.com

Rands worked on the Paradox team at Borland, and recounts his take on the Database vs. Spreadsheet space in his article - Keeping Track of Everything

Great blog, I’ll have to get the book “Managing Humans”…

Hacking Xilla

I’ve been hacking on a proof-of-concept for Xilla, really to try out some of these ideas and see where they go.  At some stage I’ll upload that to the project’s home -  WebXilla.com

spreadsheet limitations?

In the meantime, given my comments about how people use spreadsheets instead of databases and why, I want to keep a link to this Hacker News discussion – “What Can’t you do in Excel?”

The discussion is fascinating; the guts of it boil down to -

  • it’s got to be on the web, in the browser [shareable, publishable, multiuser]
  • spreadsheets should be more like databases [data types, data relations, named attributes]
  • we want to use spreadsheets as a live interface GUI for databases, sometimes
  • need a better language for programming [eg. python, ruby, lisp - anything but VB!]
  • need a better language for querying [ SQL or an alternative]
  • need better naming of data ranges [names not C1:C37]
  • better import/export/conversion/integration

To this I add my own wishlist -

  • browse data links [relations] as web hyper-links
  • runtime editable schema
  • data-app prototyping environment, RAD tool
  • complete versioning, check-pointing and backup
  • fine grained user access controls & permissions

A spreadsheet that has all of these is no longer a spreadsheet, or a database, but a new new thing – so let’s stop asking how to improve spreadsheets and ask how to build this data-web-’Xilla’.  I’m calling this new thing Xilla, because it needs a name.

It’s tempting to deny or address the opinions above by saying ‘if you need better data management, grow up and use a real database’, but the real point is that spreadsheets solve part of a problem, databases solve part of an overlapping problem – and where we are now is that neither solves the problem really well, or at least well enough.

The reason people stick with spreadsheets is that they partly work, and people don’t want to move to a database and lose what flexibility they have – living with the limitations is better than fixing a schema in stone and writing a web app that is tied completely to that schema [and losing all the nice computational tools and UI they currently have in the process].  In an organizational context, they’ll have to give up their data to the experts, and once they do that they’ll never see it again, or at least have no rights to change and redefine the structure or adapt it to the problem they have as it evolves.

I think quite a few startups are basically porting the spreadsheet or traditional database onto the web, but that’s not really what is needed, or it’s only 10 to 15 percent of what’s needed…  This approach rules out the new kind of app that might come into existence if we really revisited the assumptions.

Independently of users’ data needs, there’s a lot of movement towards a pure attribute based or key-hash style of database.  David Heinemeier Hansson mentions that Rails uses the database as a kind of big hash table – data is given a key and that’s used to get the data back, so most of the other DB features aren’t really used.  Amazon’s Cloud DB service seems to do a similar thing, and many web apps are getting huge scalability and performance gains by going for the memcached API with a similar style.

Combine that with the ubiquity of XML, and the fact that treelike data doesn’t fit easily within a table database structure, and it seems the whole “data on web” space is ready for an overhaul, and there are sneak peeks of how it might be done the right way.

There is a better way to manage data – it’s there in front of us, perfectly formed inside the block of marble, just waiting to be uncovered.

WebXilla – the web 2.0 mind-map for your data

Problem – You can’t change your database to fit your business and you can’t do it incrementally.

Spreadsheets only handle the very early phase of data growth.  Custom database web systems are risky and expensive to develop.  Off the shelf web solutions can’t be customized nor easily integrated together.  Wikis and web content management systems don’t respect data schemas.

Solution – Xilla dramatically lowers the cost of entry because you can start with where your data is now, and you don’t have to develop software.  Changes to the schema are immediate, codeless, free and reversible.  Running and maintenance costs are cut to a bare minimum.

Xilla allows you to experiment safely and make incremental changes to your data design without writing SQL or hiring DBAs and programmers.  Xilla reduces missed opportunity costs, because the schema stays in step with your business, and you can prototype and experiment cheaply.

Xilla is -

  • a place to share important data privately or publicly
  • a RAD prototyping tool & web development environment
  • a customizable database with search engine
  • a new tool for browsing cascades of data links
  • a community of users, developers and data architects
  • a schema repository of reusable solutions
  • a securely hosted, high performance web service with automated backup
  • a concrete semantic web, a mind map for your data

The Xilla user interface is completely web based and the data management, backup and hosting is covered by the monthly subscription fee.  There is no app to roll out nor database to take down and restart when you make a schema change, nor does that require writing scripts or code.

You can reuse existing schema solutions from a library of Xilla domain models to get started, then customize to suit your needs on an ongoing basis.  There is no fixed delivery date when the much-awaited, over-hyped web app arrives to fit an outdated snapshot of your business – both your business and your data projects grow in sync with Xilla.

In most databases you have to know what field or column to search for.  Xilla search is like web search – by default it searches everything and shows you where the search terms occur.

Enter a keyword to find a starting point, then browse through related items and follow the data relationship chain, from item to item to item.  While doing this you’ve constructed a custom page view with all the relevant information at hand – a useful summary for a sales call or purchase decision, bookmarked for later use.

Xilla makes relationships in the data stand out clearly because you can see them directly – you click to drill down through the cascade and are reminded about the structure of the data as you use it.

This powerful new browsing feature, combined with the benefits of incremental improvement, makes Xilla a revolutionary new tool for managing and evolving your business data.

My previous blog articles describe approaches to calculating kth moments in a single pass – see compute kth central moments in one pass and variance of a sequence in one pass.

I decided to make a rough performance comparison between my approach and the Boost Accumulators API.  A reasonable test is to procedurally generate 10^7 random numbers in the range [0.0..1.0] as the input data.  We compare 3 algorithms -

  • No moments accumulated – baseline, just generates the input
  • Accumulate all moments up to 12th order using Boost Accumulators
  • Accumulate all moments up to 12th order using my vfuncs cpp code

We run on Linux using the time command to get rough timings.  The raw results are -

  • Baseline  –  1.16s
  • Boost Acc – 23.16s
  • Vfunc Acc –  2.44s

If we subtract the baseline we get a rough comparison of Boost ~ 22.0s (23.16 - 1.16) and vfuncs cpp ~ 1.3s (2.44 - 1.16) once the cost of generating the input is factored out.

So the vfuncs impl is roughly an order of magnitude faster than the current v1.37.0 Boost Accumulators statistics implementation (about 22s against 1.3s, a factor of roughly 17; both are single pass).

I don’t think the Boost algorithm is logically different; it’s probably more a case of template code complexity having an impact – the 10s compile times might indicate this also.  Executable size also reflects this, differing by an order of magnitude.

(Note: using gcc 4.2.4.  I haven’t tried this on other compilers or build settings – they could have a profound effect.  Let me know if you see different results on e.g. Intel or VC compilers.)
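For anyone wanting to repeat the measurement with another compiler, a minimal timing harness might look like the sketch below.  It is not the original test program: the run() function, the RunResult struct, the mt19937 generator and the seed are all assumptions made for illustration, and it times the loop with std::chrono rather than the time command.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical harness, not the original benchmark code.
struct RunResult {
    std::vector<double> sums;  // sums[j] = sum of x^j; empty for the baseline
    double checksum;           // sum of inputs, so the loop is not optimised away
    double seconds;            // wall-clock time spent in the loop
};

// order == 0 runs the baseline (generate the input only);
// order > 0 also accumulates the power sums for j = 0..order.
RunResult run(std::size_t count, std::size_t order, unsigned seed = 42) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> dist(0.0, 1.0);  // values in [0, 1)
    RunResult r{std::vector<double>(order ? order + 1 : 0, 0.0), 0.0, 0.0};

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < count; ++i) {
        const double x = dist(gen);
        r.checksum += x;
        double p = 1.0;  // running power x^j
        for (double& s : r.sums) {
            s += p;
            p *= x;
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    r.seconds = std::chrono::duration<double>(stop - start).count();

    // Sanity check: the zeroth power sum is just the element count.
    assert(r.sums.empty() || r.sums[0] == static_cast<double>(count));
    return r;
}
```

Calling run(10000000, 0) gives the baseline and run(10000000, 12) the accumulation case; subtracting the two seconds values factors out the cost of generating the input, in the same way as the comparison above.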

Download

Download code and project – kth_moments.tgz – from the vfuncs Google Code downloads page.  [BSD license, feel free to use]

Some links while I have them at hand…

Compressed sensing basically gives a better practical way of recovering a signal from its samples than the Shannon Sampling theorem suggests – given some structure on the data.

Recent methods using the L1 norm, as a compromise between L2 and L0, make this much more computationally feasible.  New work by Tao, Candès and others gives some proofs of why this sparse sampling works so well.

Useful compressed sensing links -

Continuing on the same topic as my previous post, it’s nice to be able to gather all the kth order moments in a single pass.

Last time I mentioned the boost/accumulators example, but you will have noticed two issues if you use that.  Firstly, the moment<k> tag will give you the kth simple moment relative to zero, whereas we often want the kth central moment of a sequence relative to the mean.  Secondly, although Boost’s accumulator is well written, it does seem to take a while to compile [~ 12 seconds for code using moment<12>].

After some playing around I’ve got a faster, simpler approach, where the inner loop accumulates kth powers of the element.  After you’ve run the sequence through, you can then easily extract the variance and all the kth central moments.  So in adding the more general case of kth moments, I’ve made the particular variance case simpler.  That often seems to happen in programming and math!

algebra

First a bit of math and then the code.  We want to express the kth central moment in terms of the basic moments up to order k.

First, let’s define the basic moment as -

 \displaystyle M_{n}^{j}= \sum_{i=1}^n {x}_i^{j}

We rearrange the binomial expansion -

\displaystyle nv_{n}^{k}= \sum_{i=1}^n({x}_{i}-\mu_{n})^k

\displaystyle = \sum_{i=1}^n \sum_{j=0}^k \binom{k}{j} {x}_{i}^j(-\mu_{n})^{k-j}

\displaystyle = \sum_{j=0}^k \binom{k}{j} (-\mu_{n})^{k-j} \sum_{i=1}^n {x}_{i}^j

So we have the kth central moment given as a weighted sum of the basic moments of order 0 to k -

\displaystyle v_{n}^{k} = \frac{1}{n} \sum_{j=0}^k \binom{k}{j} (-\mu_{n})^{k-j} M_{n}^{j}

which shows that all we need to accumulate as we walk across the sequence are the simple powers {x}_{i}^{j} for j = 0..k (the mean itself is \mu_{n} = M_{n}^{1} / n).

Notice the variance is now handled as the special case k=2.  Likewise the basic moments with j=0 and j=1 are n, the element count, and the sum of elements.
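As a rough sketch of how the expression above might be coded (this is an illustration, not the vfuncs code; the PowerSums struct and its member names are made up for the example), one pass accumulates the basic moments and the central moments are derived afterwards:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not the vfuncs API: keeps the power sums
// M^j = sum_i x_i^j for j = 0..K, updated one element at a time.
struct PowerSums {
    std::vector<double> m;  // m[j] holds M^j

    // K is the highest order needed; use K >= 1 so the mean is available.
    explicit PowerSums(std::size_t K) : m(K + 1, 0.0) {}

    void add(double x) {
        double p = 1.0;  // running power x^j, starting at x^0
        for (std::size_t j = 0; j < m.size(); ++j) {
            m[j] += p;   // m[0] counts elements, m[1] sums them
            p *= x;
        }
    }

    double mean() const { return m[1] / m[0]; }

    // kth central moment: (1/n) * sum_j C(k,j) * (-mu)^(k-j) * M^j
    double central_moment(std::size_t k) const {
        assert(k < m.size());
        const double n = m[0];
        const double mu = mean();
        double binom = 1.0;  // C(k,0)
        double total = 0.0;
        for (std::size_t j = 0; j <= k; ++j) {
            total += binom * std::pow(-mu, static_cast<double>(k - j)) * m[j];
            // Step C(k,j) to C(k,j+1).
            binom = binom * static_cast<double>(k - j) / static_cast<double>(j + 1);
        }
        return total / n;
    }
};
```

Feeding in 1, 2, 3, 4 and asking for central_moment(2) gives the population variance 1.25.  A known caveat of raw power sums is that they can lose precision through cancellation when the mean is large compared with the spread of the data, so treat this as a sketch rather than a robust implementation.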

c++ impl

Here’s a basic impl of the above expression -

Read the rest of this entry »