Setting the scene

  • the database is 30 Million rows and performance is getting slower
  • users want to search for stuff.. but don’t know what they want to search for
  • the web experience needs to be fast enough to feel “interactive”
  • it needs to have an api which the mobile app developers can use.. so json

Questions / Observations

  • Do we even have ‘Big Data’ ?
    • 9GB data as CSV
    • 2.5GB when zipped
  • We could actually fit the data all into 16GB RAM ..
    • why doesn’t the database do that ?
  • What if we fully “invert” the data, so tag searches are fast :
    • data  : id -> tags   [ in named fields ]
    • index : tag -> ids
    • “inverted” when all tags are indexed
    • so, given a tag we can quickly get all the record ids it appears in
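
A minimal sketch of the inversion described above, in Python. The sample rows and the function name `invert` are made up for illustration, and the real data lived in named fields rather than a plain list of tags :

```python
from collections import defaultdict

def invert(records):
    """Build tag -> sorted list of record ids from id -> tags."""
    index = defaultdict(list)
    for rec_id in sorted(records):
        for tag in set(records[rec_id]):   # de-duplicate tags within a record
            index[tag].append(rec_id)
    return dict(index)

# data : id -> tags  (hypothetical sample rows)
records = {
    1: ["melbourne", "graffiti"],
    2: ["sydney", "graffiti"],
    3: ["melbourne", "cafe"],
}

# index : tag -> ids
index = invert(records)
print(index["graffiti"])   # ids of every record tagged 'graffiti'
```

Once every tag is indexed this way, a tag search is a single dictionary lookup rather than a scan over all rows.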

Strategy / Plan

  • RAM is fast, use plenty : enough for :
    • cached rows and
    • the full index
  • keep the index fully in RAM : all the time
  • vertical scaling might work

The pain, oh the pain

  • cassandra OK
    • but we don’t have a cluster.. + it’s a 150MB install of java bloat/dependencies
  • mysql innodb OK ..
    • but weak sql support + a large csv import died
  • redis OK .. but
    • hit 25GB RAM before dying
    • too much space to store long sets of integers
  • new respect for postgres :
    • great sql support
    • csv import “copy” : nice, simple, fast, reliable
    • up the buffers, and warm the cache with some queries
      • this pulls the data into the linux page cache, which keeps the most recently used pages in RAM
      • directive to hold index or table in RAM would be nice
  • linux utils, ftw :
    • head tail sed split sort uniq wc
    • csvkit
    • perl -pe 's///g' [ and regexes generally in vim, node.js, perl ]
    • bash shell

The epiphany

  • keep the fully inverted index in RAM, all the time
  • save space with a binary format
  • develop custom “index server”
    • custom binary file format
    • node.js is nice to work with : buffers, streams
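
To make the binary format idea concrete, here is a rough Python sketch (the real index server was written in node.js, and its actual file layout is not described here). It assumes each id fits in an unsigned 32-bit integer, and that a small tag table of (offset, count) pairs is held in RAM while the id blocks sit in the file. The names `write_index` and `first_ids` are hypothetical :

```python
import io
import struct

def write_index(index, out):
    """Write id blocks to `out`; return an in-RAM tag table of (offset, count)."""
    table = {}
    for tag in sorted(index):
        ids = index[tag]
        table[tag] = (out.tell(), len(ids))
        out.write(struct.pack(f"<{len(ids)}I", *ids))   # 4 bytes per id
    return table

def first_ids(table, f, tag, limit=1000):
    """Seek to the tag's block and read at most `limit` ids."""
    offset, count = table.get(tag, (0, 0))
    n = min(count, limit)
    f.seek(offset)
    return list(struct.unpack(f"<{n}I", f.read(4 * n)))

index = {"graffiti": [1, 2, 7], "melbourne": [1, 3]}
blob = io.BytesIO()            # stands in for the binary index file
table = write_index(index, blob)
print(first_ids(table, blob, "graffiti", limit=2))
```

At 4 bytes per id, a tag that appears in a million records costs about 4MB, which is a lot tighter than storing the same ids as text or as a general purpose set.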

Partial Enlightenment

  • SSDs are incredible
    • seek SSD ~0.1ms vs  HDD ~10ms  [ 100x ]
    • read SSD ~500MB/s vs HDD ~150MB/s [ ~3x ]
    • readable comparison here
  • a middle way :
    • custom index held in RAM
    • data in a sturdy RDB [ postgres ]
    • trust SSD performance
  • json result set compresses well
    • 256 rows ~ 170k json ~ 25k gzipped [ 7x ! ]
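
A quick way to check how well a json result set compresses, sketched in Python. The rows below are synthetic and far more repetitive than real records, so the ratio it prints will not match the 7x measured above :

```python
import gzip
import json

# 256 made-up rows with repeated field names, as json result sets tend to have
rows = [
    {"id": i, "title": f"record {i}", "tags": ["melbourne", "graffiti", "street-art"]}
    for i in range(256)
]

raw = json.dumps(rows).encode("utf-8")
packed = gzip.compress(raw)

# raw size, gzipped size, compression ratio
print(len(raw), len(packed), round(len(raw) / len(packed), 1))
```

The repeated field names in every row are what make json such a good candidate for gzip.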

Good Performance Numbers

  • running on a linode host with 2GB RAM
    • 1GB for postgres cache
    • 1GB for index tag table :
      • first 700 MB of the 2.3 GB binary index file
      • rest is blocks of ids [ per each tag ]
  • tag search on 30 Million records :
    • 25ms to get the first 1000 ids from the index table and index file
    • 150ms to fetch the records with those ids !
  • web front end :
    • search is fast enough to be “interactive”
    • user sees 450ms roundtrip [ including 200ms ping ]
    • gzipping the json brings a 2.3s fetch down to 0.45s
    • feels nearly instantaneous

Moral of the story

  • SSDs are FAST.. but they are NEW and DB software will need time to catch up
  • NoSQL vs SQL just means more options, and more hybrid approaches
  • You can still scale vertically
    • if your data is big [100M rows | 50GB ]
    • but not BIG big [ 1Bn rows | 1TB ]
  • RAM is expensive on the server
    • because hosting companies can share it over many users
    • 64GB linode = $640/mo vs 2GB linode = $20/mo
    • potentially save about $600 per month
  • we will see more soft realtime web sites .. web with data will feel more interactive


Clicking the Melbourne Graffiti item above drills down to this page :


Handy for nested Product Catalog style apps … or a scrapbook

To build this app in CollabAPI I just defined a Creative item as :

  • blurb text
  • photo
  • more [ a list of Creative items ]

so it’s like a set of folders … recursion : “it’s turtles all the way down”
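
The same recursive shape can be sketched in plain Python. This is not how CollabAPI stores items, just an illustration of an item type that contains a list of its own kind; the sample items and the `depth` helper are made up :

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Creative:
    blurb: str
    photo: str = ""                      # e.g. a file name or URL
    more: List["Creative"] = field(default_factory=list)

def depth(item):
    """Number of levels in the tree rooted at `item`."""
    return 1 + max((depth(child) for child in item.more), default=0)

# hypothetical scrapbook : each item can hold more items
root = Creative("Street Art", more=[
    Creative("Melbourne Graffiti", more=[Creative("Hosier Lane")]),
    Creative("Sydney Murals"),
])
print(depth(root))
```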



Builders Invoice Mobile App

Let’s say we want to create an app for a home Builder / Carpenter – they need to quickly make up an invoice on their mobile tablet, while onsite at a job.

The different things to keep track of are :

  • Customer
  • Job / Invoice
  • Invoice Line Item
  • Product


Step 1 : Define the ‘DataModel’

We define all the properties for each of these :

  • Customer : Name, Address, Phone
  • JobInvoice : Date, Description, link to a Customer, list of InvoiceLines
  • InvoiceLine : a Product, an Amount, a Cost Total
  • Product : Name, Description, CostPerUnit, Units, Photo

The relations are where one item refers to another item : a JobInvoice links to a Customer and holds a list of InvoiceLines, and each InvoiceLine refers to a Product.
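
For readers who think in code, the same datamodel could be written down roughly like this in Python. In CollabAPI you click through a designer rather than write classes, so this is only an illustration. Computing the Cost Total as amount times CostPerUnit is my assumption, and the sample customer and product are made up :

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Customer:
    name: str
    address: str
    phone: str

@dataclass
class Product:
    name: str
    description: str
    cost_per_unit: float
    units: str
    photo: str = ""

@dataclass
class InvoiceLine:
    product: Product          # relation : refers to a Product
    amount: float

    @property
    def cost_total(self):
        # assumed rule : amount x cost per unit
        return self.amount * self.product.cost_per_unit

@dataclass
class JobInvoice:
    date: str
    description: str
    customer: Customer                                      # relation
    lines: List[InvoiceLine] = field(default_factory=list)  # relation

    def total(self):
        return sum(line.cost_total for line in self.lines)

timber = Product("Pine 90x45", "framing timber", 4.0, "metre")
job = JobInvoice("2014-09-02", "Deck repair",
                 Customer("Pat", "1 Example St", "555-0100"),
                 [InvoiceLine(timber, 12), InvoiceLine(timber, 3)])
print(job.total())
```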


Step 2 : Run the App

So you just click thru and define the ‘DataModel’ above, and then you can run the app and try it out immediately.

For this Builders Invoice datamodel, the app looks like this :


[ three screenshots of the Builders Invoice app, 2014-09-02 ]


Quick and Iterative

The immediate feedback is really useful.  You can show a user, get them to try it and give feedback.  You can try out different models and see which fits their business better.

This iterative approach really helps to reduce risk and cost in developing an app.


Any Datamodel

We could have picked any domain and modeled it in the same way with CollabAPI.  It’s not about Invoices .. it’s about making an app to fit your particular business : quickly and iteratively.

If your business finds it helpful to have a photo in each InvoiceLine, to give the customer more insight into the Job .. so be it.

When your business changes, you can change the datamodel to fit the new circumstance.

I hope you’ll agree, CollabAPI takes a pretty radical, and yet very practical approach.

[…] is great : you can create form apps for your business without programming.

Replacing paper with mobile touchscreen and having your data in the cloud is such a big win… BUT there are a couple of crucial missing pieces that are vital to business  :

  • more than one kind of thing !
  • relationships between things


The Problem

In real life and in business, there are a few different kinds of things, such as :

  • Customer
  • Address
  • Vehicle
  • Branch
  • Product
  • Invoice
  • Event
  • Group
  • Project
  • Property
  • Contract
  • Supplier
  • Shipment
  • Task

And these things have important relationships between them : A Customer may have several site Addresses.  A Job may have several different kinds of Checklist attached.  An Invoice will have several Products for each Job for a Customer .. and so it goes.



A good app [ mobile or otherwise ] will have a data-model that reflects the REALITY of your business, and it will be flexible enough to handle the relationships between the data items.


This is the old-school database stuff they figured out in 1970, which has been the staple of business IT for the past 40 years – they call them ‘Relational’ databases for a reason.

These days we want more flexible data : there is the whole NoSQL movement, key-value stores, XML and JSON.  To model reality and business we need natural tree and graph structures, not everything is a table.  Just don’t throw away the baby with the bathwater : we need relations.

If you don’t have Relations you can’t do the fun stuff :

  • one to many [ Comment Wall can have many Posts ]
  • trees and hierarchies [ Deep Product Catalog, Organization chart ]
  • graphs and social networks [ facebook, linkedin, crm, sales ]
  • people interacting [ messaging ]

This lack of relationships is killer for IT departments – they can’t run a retail business or a supply chain or a sales organisation or a manufacturing plant without this flexibility.  It’s also killer for startups exploring new business models who need an app to manage their operations or customer needs accurately and in a flexible way.


The Solution : Relations

The solution is to have a tool that is great for making forms… but is also great for handling relations between different kinds of things.

We need some deeper engineering that allows you to link things together,  so you can build these kinds of real apps for business :

  • Product Catalog
  • Topic Wall with Comments
  • Invoices where you can select existing products, or add a new one
  • Jobs that can be assigned to different stakeholders
  • Flexible forms with multiple sub-clauses [ Property report with any number of rooms ]
  • Social Network features …


It’s early days, but this part we got right from the start – CollabAPI does support relations.

This means we can create apps for real business scenarios :

  • a custom CRM
  • sales business social network
  • a manufacturing process workflow
  • a real-estate property management system

 CollabAPI Beta

If you’d like to be on the beta list to try the CollabAPI App Builder tool for your business mobile app… get in touch.

The more Things and Relations your app has, the better !


justgord at gmail dotcom

Haven’t posted for a good while.. been incredibly busy working on a side project which seems to have a life of its own.  It is not quite ready for release, but I’ve been using it to get real work done for a while.  It’s basically a much quicker way to develop data centric apps.  This kind of app is useful for tracking all the information you need to run a business well – the stuff that goes on paper / word / excel forms.  It would be great if there was a quick way to make it mobile / web and have the data in electronic format from the get-go, so you could search it more easily and so on.

Example App :  Gym Membership Sales Pipeline

We can start by making a form each for our Salespeople, and our potential Customers.  I configure up the data model in the App Designer for these, save those changes… then the app comes up on web and tablet straight away :



I did this by configuring the fields, checkbox items etc, and the app reflects this ‘datamodel’  : 


Apparently there’s a sales-cycle thing we want to keep track of –

  • initial contact
  • see if the customer’s interested
  • what are their goals
  • book a meeting
  • get them to sign up
  • success! happiness! cash!

So here’s a go at modeling up a sales call…


… hmm, promising and a good place to start, I can actually put that in front of the Gym Owner and her Chief Sales Maven, and they can try it out on the web or on their tablet device.


The point of this article is not to say ‘you can do apps real quick in CollabAPI’ … which is kinda true.

The coolness is really just the way we can iterate and experiment : get something out there quickly and get the users to make suggestions.  

It’s more of a GSD or MVP / lean-startup approach to making apps.  It’s not just for business apps, but I think it will be really handy for business … putting the love back into the ‘small-data’ that’s important to most people.

I’ll talk a bit about the technology under the hood at some stage – and there is a lot of tech under the hood – but for now, that’s a good place to pause and have a cup of tea :]

cheers,  gord.

justgord at gmail dotcom

Just watched a really interesting documentary on the Flash Crash of 2010 :

Money & Speed: Inside the Black Box

Some stand-out points :
> CFTC / SEC attribute the root cause as Waddell & Reed ‘dumping’ $4.1bn shares
> Eric Hunsader @ Nanex looks at the W&R trades [ see video at 34m18s ]
> W&R trades don’t look like dumping, they maximize sell price during local up runs
> there are other trades that do look like aggressive dumping, ie. rapid sequential bursts down

The Nanex explanation of the Flash Crash : FlashCrashAnalysis

Price manipulation ?

This raises an interesting Question :

Is it possible for a black box algorithm to use a rational probabilistic strategy to drive down stock price in bursts like this, in order to later buy the stock at a very low, artificially deflated price ?  You’d need a lot of stock to do this : is there a threshold of stock volume, say 5% of all stock, under which it’s impossible to create this effect ?

Price Delay Arbitrage ?

Another aspect of this is the possibility of ‘Diffusion Arbitrage’, for want of a better name :

If you can drive the market so quickly that derived instruments take seconds to reprice ( due to the storm of new data ), then you have that window to trade ahead of the market in options or indexes based on the underlying you have manipulated.

In this case the delay was a whopping 5 to 35 seconds : see for example Nanex’s FlashCrashSummary , showing the delayed drop and recovery of the Dow.


TagNinju is an ergonomic tag-centric Time and Cost tracking tool.

It’s very simple to use and extremely flexible :

  • use keyword tags to organise and group entries
  • rather than type a long description, keep it terse, with tags
  • tags suggest : topic, person, subject, code, urgency, project, specifics
  • inbuilt auto clock timer with pause and continue
  • set todos into the future quickly
  • provides rapid insight into costs and time


The TagNinju app is handy for :

  • gaining real insight into your time usage [ procrastination or productivity ]
  • keeping yourself motivated with objective progress notes
  • tracking progress towards a goal like weight loss, fitness, savings
  • budgeting : shopping lists and tracking expenses quickly
  • tracking hours worked on jobs for freelancers and contractors
  • HTML5 local storage means it works on mobile, tablet or web


I noticed a lot of time disappeared on unimportant things, and my productivity and motivation were inconsistent. Some days I would feel motivated and get a lot done, other days I’d languish and could not really recall what I spent time on or what goals had been achieved.

I looked for a good app, but the note takers were too general, a spreadsheet was not easy to use on mobile, and many time tracker systems were slow or overkill. Eventually I started working on a simple web app to do this. I added an automated timer clock, and this at least allowed me to note down tasks. I started off writing readable sentence notes, but migrated to a terse form with just keywords. I added a simple filter search and found that worked well : I could zoom into areas of interest by keyword, such as shopping or weight or fee or run or billable.

I gradually noticed a useful psychological side-effect of tracking my time : I wanted to get stuff done just so that I could note it down! I’d catch myself thinking.. Hmm, I haven’t put in any entries for exercise the last 2 days, I better do some situps or a run so I can put that in.
So actually noting things down had the effect of pushing me to get more good things done.
There’s a kind of objectivity, and also the effect of your motivated self observing your unmotivated self and kind of comparing the two.

I recently heard a talk by memory expert Josh Foer, about autopilot subconscious plateaus. This is where your activity matches your expectation set point at the OK level, and so your conscious mind doesn’t need to be engaged. We do this every day for routine tasks : we perform them unconsciously, or rather subconsciously, on auto-pilot.

Autopilot is great for things we don’t want to improve : washing dishes, shaving, driving to work.
For things we do want to improve, we need some way of making the activity stand out to our conscious minds so we can evaluate performance and goals and try a different strategy. Time tracking seems to help with this : it makes you review what you’ve just been doing.

I’m not recommending tracking everything, in fact I think it’s good to decide to not track some areas, maybe the weekends or reading time in the evening. But it can give real insight and motivation for those areas we want to improve : lose weight, gain fitness, reduce cigarettes, study more effectively etc.


You might find these tips handy :

  • tap timer to pause the clock and restart it
  • tap and drag out from timer box to select a time period
  • use meaningful tags, use multiple tags
  • use quick filter search to zoom into that topic or area
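
The tag filter idea is simple enough to sketch in a few lines of Python. This is not TagNinju’s actual code : the entries are made up, and matching on whole tags ( rather than on partial words ) is an assumption :

```python
from collections import defaultdict

# Hypothetical entries : (terse tag string, minutes spent)
entries = [
    ("run fitness", 30),
    ("shopping groceries", 20),
    ("clientA billable code", 90),
    ("situps fitness", 10),
]

def filter_entries(entries, query):
    """Keep entries that carry every tag in the query."""
    wanted = set(query.split())
    return [e for e in entries if wanted <= set(e[0].split())]

def minutes_by_tag(entries):
    """Total minutes per tag; an entry counts towards each of its tags."""
    totals = defaultdict(int)
    for tags, minutes in entries:
        for tag in tags.split():
            totals[tag] += minutes
    return dict(totals)

print(filter_entries(entries, "fitness"))
print(minutes_by_tag(entries)["fitness"])
```

Using several tags per entry is what makes the filter useful : the same entry shows up whether you zoom in by project, by person or by activity.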

Coming soon

These are some future features which might make TagNinju more useful :

  • sync between versions [ entries shared across devices : mobile, web and tablet ]
  • dropbox integration
  • csv import / export [ to and from spreadsheet ]
  • bar charts to show time and cost spent by tag
  • shared groups

Some of these features will be more useful to professionals, small business founders and freelancers and will be released as part of the upcoming paid version : TagNinju Pro.


I’m interested to see how people use TagNinju for tracking time and cost, for achieving goals and personal growth.

Please do email me with your suggestions and feedback at

Guest post by my son, after a wee bit of prompting from dad to make a diagram and work it out :



