Setting the scene

  • the database is 30 Million rows and performance is getting slower
  • users want to search for stuff.. but don’t know what they want to search for
  • the web experience needs to be fast enough to feel “interactive”
  • it needs to have an api which the mobile app developers can use.. so json

Questions / Observations

  • Do we even have ‘Big Data’ ?
    • 9GB data as CSV
    • 2.5GB when zipped
  • We could actually fit the data all into 16GB RAM ..
    • why doesn’t the database do that ?
  • What if we fully “invert” the data, so tag searches are fast :
    • data  : id -> tags   [ in named fields ]
    • index : tag -> ids
    • “inverted” when all tags are indexed
    • so, given a tag we can quickly get all the record ids it appears in
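
The inversion idea can be sketched in a few lines of Python. This is only an illustration of the data -> index step : the sample records and the helper name `build_inverted_index` are made up for the example, and a real 30 Million row build would stream from the database rather than hold a dict like this.

```python
from collections import defaultdict

def build_inverted_index(records):
    """Map each tag to the sorted list of record ids that carry it."""
    index = defaultdict(list)
    for record_id, tags in records.items():
        for tag in set(tags):          # ignore duplicate tags within one record
            index[tag].append(record_id)
    return {tag: sorted(ids) for tag, ids in index.items()}

# hypothetical sample data : id -> tags
records = {
    1: ["red", "car"],
    2: ["blue", "car"],
    3: ["red", "bike"],
}

index = build_inverted_index(records)
print(index["red"])   # ids of every record tagged "red"
print(index["car"])
```

Once the index exists, a tag search is a single dictionary lookup instead of a scan over every row.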

Strategy / Plan

  • RAM is fast, use plenty : enough for :
    • cached rows and
    • the full index
  • keep the index fully in RAM : all the time
  • vertical scaling might work

The pain, oh the pain

  • cassandra OK
    • but we don’t have a cluster.. + it’s a 150MB java bloat/dependencies install
  • mysql innodb OK ..
    • but weak sql support + a large csv import died
  • redis OK .. but
    • hit 25GB RAM before dying
    • too much space to store long sets of integers
  • new respect for postgres :
    • great sql support
    • csv import “copy” : nice, simple, fast, reliable
    • up the buffers, and warm the cache with some queries
      • relies on the linux page cache, which keeps the most recently used pages in RAM
      • directive to hold index or table in RAM would be nice
  • linux utils, ftw :
    • head tail sed split sort uniq wc
    • csvkit
    • perl -pe ‘s///g’ [ and regexes generally in vim, node.js, perl ]
    • bash shell

The epiphany

  • keep the fully inverted index in RAM, all the time
  • save space with a binary format
  • develop custom “index server”
    • custom binary file format
    • node.js is nice to work with : buffers, streams
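
To give a feel for the space saving, here is a rough sketch comparing one tag's id list stored as text against the same ids packed as fixed-width integers. It is written in Python rather than node.js, and it assumes 4-byte little-endian unsigned ids and a made-up id list : the actual file format of the index server is not described here.

```python
import json
import struct

# hypothetical id list for one tag, drawn from a 30 million row id space
ids = list(range(7, 30_000_000, 300))

as_text = json.dumps(ids).encode("utf-8")              # ids as decimal text
as_binary = struct.pack("<%dI" % len(ids), *ids)       # assumption : 4 bytes per id

print(len(ids), "ids")
print(len(as_text), "bytes as json text")
print(len(as_binary), "bytes as packed binary")

# round trip : the binary form decodes back to the same ids
decoded = list(struct.unpack("<%dI" % (len(as_binary) // 4), as_binary))
assert decoded == ids
```

The packed form is always exactly 4 bytes per id, which also makes it easy to seek straight to the n-th id in a block.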

Partial Enlightenment

  • SSDs are incredible
    • seek SSD ~0.1ms vs  HDD ~10ms  [ 100x ]
    • read SSD ~500MB/s vs HDD ~150MB/s [ ~3x ]
    • readable comparison here
  • a middle way :
    • custom index held in RAM
    • data in a sturdy RDB [ postgres ]
    • trust SSD performance
  • json result set compresses well
    • 256 rows ~ 170k json ~ 25k gzipped [ 7x ! ]
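
A quick way to check how well a json result set compresses is to gzip it and compare sizes. The sketch below uses invented rows, so the ratio it prints will not match the 7x figure above : real ratios depend on how repetitive the field names and values are.

```python
import gzip
import json

# hypothetical result set : 256 rows with the same field names repeated in every row
rows = [
    {"id": i, "title": "record %d" % i, "tags": ["red", "car", "sale"], "price": 100 + i}
    for i in range(256)
]

raw = json.dumps(rows).encode("utf-8")
packed = gzip.compress(raw)

print(len(raw), "bytes raw")
print(len(packed), "bytes gzipped")
print("ratio %.1fx" % (len(raw) / len(packed)))
```

The repeated field names in every row are a large part of why json shrinks so much under gzip.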

Good Performance Numbers

  • running on a linode host with 2GB RAM
    • 1GB for postgres cache
    • 1GB for index tag table :
      • first 700 MB of the 2.3 GB binary index file
      • rest is blocks of ids [ per each tag ]
  • tag search on 30 Million records :
    • 25ms to get the first 1000 ids from the index table and index file
    • 150ms to fetch the records with those ids !
  • web front end :
    • search is fast enough to be “interactive”
    • user sees 450ms roundtrip [ including 200ms ping ]
    • gzipping the json brings the 2.3s fetch down to 0.45s
    • feels nearly instantaneous
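
The two-part index layout (a tag table held in RAM, plus blocks of ids per tag) could look roughly like the Python sketch below. Everything specific in it is an assumption made for illustration : the 4-byte ids, the `(offset, count)` table entries, the function names, and keeping the tag table as a separate dict rather than as the first part of the same file.

```python
import io
import struct

ID_SIZE = 4  # assumption : each id is a 4-byte little-endian unsigned int

def write_index(index, out):
    """Write id blocks back to back and return the tag table {tag: (offset, count)}."""
    tag_table = {}
    for tag, ids in index.items():
        tag_table[tag] = (out.tell(), len(ids))
        out.write(struct.pack("<%dI" % len(ids), *ids))
    return tag_table

def first_ids(tag_table, index_file, tag, limit=1000):
    """Look the tag up in RAM, then do one seek and one read for its first ids."""
    if tag not in tag_table:
        return []
    offset, count = tag_table[tag]
    n = min(count, limit)
    index_file.seek(offset)
    data = index_file.read(n * ID_SIZE)
    return list(struct.unpack("<%dI" % n, data))

# hypothetical tiny index ; BytesIO stands in for the index file on disk
index = {"red": [1, 3, 8, 9], "car": [1, 2]}
index_file = io.BytesIO()
tag_table = write_index(index, index_file)

print(first_ids(tag_table, index_file, "red", limit=3))
```

The point of the layout is that a tag search costs one in-memory lookup plus one contiguous read, and the ids that come back are then used to fetch the full records from postgres.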

Moral of the story

  • SSDs are FAST.. but they are NEW and DB software will need time to catch up
  • NoSQL vs SQL just means more options, and more hybrid approaches
  • You can still scale vertically
    • if your data is big [100M rows | 50GB ]
    • but not BIG big [ 1Bn rows | 1TB ]
  • RAM is expensive on the server
    • because hosting companies can share it over many users
    • 64GB linode = $640/mo vs 2GB linode = $20/mo
    • potentially save $600 per month
  • we will see more soft realtime web sites .. web with data will feel more interactive

Introduction

TagNinju is an ergonomic, tag-centric time and cost tracking tool.

It’s very simple to use and extremely flexible :

  • use keyword tags to organise and group entries
  • rather than type a long description, keep it terse, with tags
  • tags suggest : topic, person, subject, code, urgency, project, specifics
  • inbuilt auto clock timer with pause and continue
  • set todos into the future quickly
  • provides rapid insight into costs and time

Uses

The TagNinju app is handy for :

  • gaining real insight into your time usage [ procrastination or productivity ]
  • keeping yourself motivated with objective progress notes
  • tracking progress towards a goal like weight loss, fitness, savings
  • budgeting : keeping a shopping list and tracking expenses quickly
  • tracking hours worked on jobs, for freelancers and contractors
  • HTML5 local storage means it works on mobile, tablet or web

Story

I noticed a lot of time disappeared on unimportant things, and my productivity and motivation were inconsistent. Some days I would feel motivated and get a lot done, other days I’d languish and could not really recall what I spent time on or what goals had been achieved.

I looked for a good app, but the note takers were too general, a spreadsheet was not easy to use on mobile, and many time tracker systems were slow or overkill. Eventually I started working on a simple web app to do this. I added an automated timer clock, and this at least allowed me to note down tasks. I started off writing readable sentence notes, but migrated to a terse form with just keywords. I added a simple filter search and found that worked well : I could zoom into areas of interest by keyword, such as shopping or weight or fee or run or billable.

I gradually noticed a useful psychological side-effect of tracking my time : I wanted to get stuff done just so that I could note it down! I’d catch myself thinking.. Hmm, I haven’t put in any entries for exercise the last 2 days, I’d better do some situps or a run so I can put that in.
So actually noting things down had the effect of pushing me to get more good things done.
There’s a kind of objectivity, and also the effect of your motivated self observing your unmotivated self and kind of comparing the two.

I recently heard a talk by memory expert Josh Foer, about autopilot subconscious plateaus. This is where your activity matches your expectation set point at the OK level and so your conscious mind doesn’t need to be engaged. We do this every day for routine tasks : we perform them unconsciously, or rather subconsciously, on auto-pilot.

Autopilot is great for things we don’t want to improve : washing dishes, shaving, driving to work.
For things we do want to improve, we need some way of making the activity stand out to our conscious minds so we can evaluate performance and goals and try a different strategy. Time tracking seems to help with this : it makes you review what you’ve just been doing.

I’m not recommending tracking everything, in fact I think it’s good to decide not to track some areas, maybe the weekends or reading time in the evening. But it can give real insight and motivation for those areas we want to improve : lose weight, gain fitness, reduce cigarettes, study more effectively etc.

Tips

You might find these tips handy :

  • tap timer to pause the clock and restart it
  • tap and drag out from timer box to select a time period
  • use meaningful tags, use multiple tags
  • use quick filter search to zoom into that topic or area

Coming soon

These are some future features which might make TagNinju more useful :

  • sync between versions [ entries shared across devices : mobile, web and tablet ]
  • dropbox integration
  • csv import / export [ to and from spreadsheet ]
  • bar charts to show time and cost spent by tag
  • shared groups

Some of these features will be more useful to professionals, small business founders and freelancers and will be released as part of the upcoming paid version : TagNinju Pro.

Suggestions

I’m interested to see how people use TagNinju for tracking time and cost, for achieving goals and personal growth.

Please do email me with your suggestions and feedback at justgord@gmail.com

Lately I seem to be busy mainly with consulting to Internet and Mobile startups, and one of the things I get asked a lot is which technologies I recommend for a given project.

It seems to work best when I give two approaches to serving data : one based on newer tools such as Node.js and CouchDB, along with a more conservative alternative using Ruby on Rails or a traditional LAMP stack as the underlying technology.

As a prelude to the detailed project plan and data design, I usually give an overview of current trends.. I get asked that a lot, so I thought I’d put my observations online here as a few bullet points :

Technology Trends :

  • Web and Mobile projects are converging [one usually implies the other]
  • Most data is social data [for the user and the people they interact with]
  • Most data naturally fits a tree or graph structure [tabular, not so much]
  • JSON has replaced XML [hand-in-glove with web, iPhone, Android ]
  • Scalability is a feature [can be deferred, but no easy migration path]
  • The days of generating HTML from PHP are gone [use JSON feed + jQuery]
  • People still use PHP and MySQL [it works, you can find developers]
  • Code using MVC frameworks tends to be overly verbose [Zend, Rails, Django]
  • Node.js and CouchDB are cool [but the newness of NoSQL is risky]

These are universally accepted as good things :

  • iPhone vs Android : both are good, competition is great!
  • JSON
  • jQuery
  • Architecture :  [ server data store ] <–> [ JSON ] <–> [ Web / Mobile client ]
  • AppStore revenue model

These are technologies to watch for :

  • Node.js [ultra high performance app servers can be written in Javascript ]
  • CouchDB [scalable NoSQL with sane REST api, map/reduce in Javascript ]
  • SVG [allows new kinds of User Interface on web & mobile ]

The above observations and predictions raise some interesting questions.. but I’ll save that as juice for another post.