For experimenting with some ideas on improving blockchain scaling :

https://github.com/justgord/blksimjs

Currently simulates the growth of a crypto-currency blockchain ( similar to Bitcoin ), loops thru a cycle of :

  • spend – pick addresses at random and spend to another address
  • mine – put waiting transactions into the block, mine the cryptographic hash, add to chain

Features / Limitations :

  • uses a single SHA-256 hash [ 256 bits ] for ids
  • uses an easy Proof-of-work [ 256 tries on average to get a block hash with a leading ’00’ byte ]
  • uses node.js byte buffers for transactions and blocks
  • runs around 7000 tps on an i5 laptop
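As a rough sketch of the mine step, the proof-of-work above can be written with node.js buffers and the built-in crypto module. The block layout here [ 32-byte previous hash, 4-byte nonce, payload ] and the names sha256 and mineBlock are assumptions for illustration, not the layout blksimjs actually uses.

```javascript
const crypto = require('crypto');

// Single SHA-256 over a byte buffer, used as the id / block hash.
function sha256(buf) {
  return crypto.createHash('sha256').update(buf).digest();
}

// Hypothetical block layout: [ 32-byte prev hash | 4-byte nonce | payload ].
function mineBlock(prevHash, payload) {
  const block = Buffer.concat([prevHash, Buffer.alloc(4), payload]);
  for (let nonce = 0; nonce < 0xffffffff; nonce++) {
    block.writeUInt32LE(nonce, 32);          // nonce sits right after prev hash
    const hash = sha256(block);
    if (hash[0] === 0) return { block, hash, nonce }; // one leading zero byte
  }
  throw new Error('no nonce found');
}

const genesis = Buffer.alloc(32);            // all-zero stand-in for "no parent"
const mined = mineBlock(genesis, Buffer.from('alice->bob:5'));
console.log(mined.nonce, mined.hash.toString('hex'));
```

One leading zero byte is matched by 1 in 256 hashes, which is where the figure of 256 tries on average comes from.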

Motivation

I wanted to simulate the growth of a blockchain with unspent transactions spread somewhat sparsely at the early older parts of the blockchain, and more dense at the top of the blockchain [ as more recent transactions haven’t had time to be spent yet ].

The reason for this is to test the feasibility of reducing the size of the data needed to bootstrap a new node. eg. in Bitcoin the whole dataset is :

  • around 150GB of transactions [ 250Mn txns ]
  • utxo of around 2GB [ ~50Mn txns ]
  • so unspent ‘utxo’ set is around 20% of transactions

Bring UTXO set forward

We can use much less data [ 5x smaller ] when spinning up a new node, by bringing the utxo set forward to nearer the front of the chain. The sim gathers old utxos and injects them into the blockchain in batches of ids, so they are stamped into the block at block creation.

These ‘utxo catchup sections’ are read when starting a new processing node – ie. it only needs a provable list of utxos, not the complete history of all spent transactions.
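A minimal sketch of the gather-and-batch step, assuming each transaction is a plain record with made-up fields id, height and spent. The helper names gatherOldUtxos and toBatches are hypothetical, and the real sim works on byte buffers rather than objects.

```javascript
// Hypothetical transaction records: { id, height, spent }.
// Collect ids of unspent transactions at or below a cutoff height.
function gatherOldUtxos(txns, maxHeight) {
  return txns.filter(t => !t.spent && t.height <= maxHeight).map(t => t.id);
}

// Split ids into fixed-size batches, one batch per catch-up section.
function toBatches(ids, batchSize) {
  const batches = [];
  for (let i = 0; i < ids.length; i += batchSize) {
    batches.push(ids.slice(i, i + batchSize));
  }
  return batches;
}

const txns = [
  { id: 'a1', height: 1, spent: true },
  { id: 'b2', height: 1, spent: false },
  { id: 'c3', height: 2, spent: false },
  { id: 'd4', height: 3, spent: true },
  { id: 'e5', height: 4, spent: false },
  { id: 'f6', height: 9, spent: false }, // too recent to bring forward
];
const catchup = toBatches(gatherOldUtxos(txns, 5), 2);
console.log(catchup); // [ [ 'b2', 'c3' ], [ 'e5' ] ]
```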

skip links [ todo ]

Using block extension areas, we can also include skip links to blocks much earlier in the chain – the process of walking back thru the links of transaction inputs and outputs all the way back to the initial ‘genesis’ block is like walking a DAG, as un-needed areas of the blockchain are skipped over.  These skip links are validated at block creation time by other nodes.

This is useful for clients which want to traverse the chain to make a better proof of validity than SPV, and for nodes that use the utxo bring forward above, so they can trace PoW to an arbitrary level back to the genesis block.
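Since skip links are still a todo, the following is only a toy illustration of the walk. It assumes blocks are objects that point to earlier blocks by height [ a real chain would link by hash and validate each link ]; buildChain, walkBack and the every-10-blocks spacing are made up for the example.

```javascript
// Hypothetical block shape: { height, prev, skip } where prev and skip are
// heights of earlier blocks (skip is optional and jumps further back).
function buildChain(length, skipEvery, skipDistance) {
  const chain = [];
  for (let h = 0; h < length; h++) {
    const block = { height: h, prev: h > 0 ? h - 1 : null, skip: null };
    if (h >= skipDistance && h % skipEvery === 0) block.skip = h - skipDistance;
    chain.push(block);
  }
  return chain;
}

// Walk from the tip back to genesis, taking a skip link whenever one exists.
function walkBack(chain, tipHeight) {
  const visited = [];
  let h = tipHeight;
  while (h !== null) {
    visited.push(h);
    const block = chain[h];
    h = block.skip !== null ? block.skip : block.prev;
  }
  return visited;
}

const chain = buildChain(101, 10, 10);
const path = walkBack(chain, 100);
console.log(path.length, 'blocks visited instead of', chain.length);
```

With these toy parameters the walk from block 100 touches 11 blocks rather than all 101.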

 

 

Bitcoin is a work of genius .. it is a world-changing technology revolution. But being the first incarnation of a new technology, there are some things that can be improved upon :

  • time to next transaction
  • time for transactions to clear [ unpredictable processing times ]
  • transaction fees
  • energy efficiency

For now, let’s just look at one issue – currently in Bitcoin, transactions are throttled by money supply/proof-of-work, and this is a bottleneck on transaction throughput. The fact is they don’t need to be, they can be decoupled.

Some eye opening stats :

[ last 3 images from woobull.com ]

But.. we can actually have our cake and eat it, when it comes to block size. It’s not that hard a technical problem to solve :

Dynamic block size [ decoupled block size from money supply ]

The mining reward should be given purely for PoW / solving the next block hash / making the next “time-stamp” – it should not be a throttle on transaction volume. It is a throttle on transaction volume currently, because the block size is fixed per solve.

Currently we have blocks that are 15% full, and blocks that are 99% full – the block size is not the issue, the issue is the block size is fixed.


I just read some good insights on the Aussie startup scene downunder, from Airtree VC partner John Henderson  :

http://www.businessinsider.com.au/a-top-executive-explains-whats-missing-from-australias-tech-startup-scene-2017-2

My own take on this is  –  RECYCLE talent, more.

We somehow need to reach a kind of critical mass of local success, so we have a sustainable “pyramid” of biz / tech / investment talent.

In Silicon Valley and even Berlin, as soon as startup X tanks [or exits], the people who worked there – devs, sysops, seo marketers, finance, bizdev, CXOs, managers, investors, scientists – move on to other startups and so are “recycled” back into the pyramid of special startup knowledge, experience and talent.

In startups, this is a vastly more important process than in normal business, because the tech and biz approach of early high-growth startups is fairly unique, and because it’s well known that all the economic benefit comes from the few successes – so there will be a lot of failures, and it would be really costly to waste all that experience / investment in time.   If people with startup experience move on to ‘normal’ business environments, instead of being able to move into another startup, it’s a massive loss.

One side effect of not having this self-replenishing Pyramid is a shortage of people who know both sides – techs who know biz, and entrepreneurs that know some code, founders who can wear both hats.   Aussie founders often have a view of tech as purely a cost center –  a pain point to outsource, a necessary evil – when they should see it as core business, an area to innovate, to generate business ideas, a channel to the customer, a means to delight users, and a way to project power at scale.  I really think this is holding us back.

To get to this Pyramid, one thing we have to do is find more efficient ways to recycle talent – just get people to flow into the next thing, instead of a deep dive of depression and navel gazing about how we did the wrong thing and that’s why it tanked.  Do the painful postmortem blog post, learn and move on.. heal while you’re working on the next thing.  Recycle what you can – code as open source, your team into an acqui-hire or other startups via intros to people you know, even competitors.  Slava Akhmechet, founder of RethinkDB,  was totally classy in the way he did this.

There’s a scene in one of those bad Vin Diesel movies, where Vin’s getting stared down by some bigger punk and he says “500” … pause … punk asks “500 whadd ?” .. Diesel retorts “500 fights.. it takes 500 street fights to learn the craft”  .. or something to that effect.

Maybe we need to wear our failed startups as a sign of pride, a tattoo to show off .. because a failed startup is going to teach a developer or entrepreneur an incredible amount in a short space of time – it’s the perfect learning environment, where you are engaged, get to use cool stuff, change hats, have mutable roles, and are challenged beyond your comfort zone on a daily basis.

We could be getting close in Melbourne and Sydney to that magic number, be it 500 or otherwise, where we have enough of a pyramid of talent to fuel a viable startup ecosystem.  Recycle, dammit.

Education in Victoria is succeeding in some areas, but failing in many others – we have new school buildings, but they are overflowing and the school rolls climbing so quickly that teachers and principals have no bandwidth left for improving educational outcomes.  Schools are adapting to technology, but failing to handle the wide range of ability and rates of learning our kids have. The system is not flexible enough to handle the needs of low achievers and high achievers in specific areas.

Every couple of months there is a new study showing how badly Australia is doing compared to other countries in areas such as Math.  We know there are approaches that have worked elsewhere but we seem unwilling or unable to change and adopt them.

Homeschooling is an important right for Victorians – in many cases it is the only way to solve problems with bullying, with low achieving students and with high achieving students.  Homeschooling is a rising demographic which serves as an important barometer of how well our schools are serving students and parents.

If the government understands this, then it will understand the value in Homeschooling, and will preserve that right as a legal option, and keep the current registration regulations intact.   Homeschooling also serves the Dept of Education – it relieves pressure from a strained system, and gives a flexible way of educating students who are not well served by schools.  It is part of the solution, not part of the problem.

It is so important that the Department of Education _listen_ to Homeschoolers, not try to tell them how to educate, or punish them – rather use it as important feedback.  I was surprised to find many ex-teachers among Homeschool parents, and other parents had studied education theories in some depth.  Homeschoolers as a rule are those who value education highly – they are pro-education, not anti-education.

I can say personally, it is heart rending to make the decision to take your child from school, you would only do it if there was a real problem to solve, or a clear benefit in doing so.

My son has just turned 13yo, he is by some accounts gifted – but the reality is simply that he was read to a lot from an early age, and had some opportunities for books and learning and discussion, and excelled because of normal healthy genes and a supportive environment.   He has attended public schools in inner Melbourne for 3 years, the other years being Homeschooled – so I have some basis on which to compare the good and bad of each approach.

Early this year he was accepted into the SEAL program at a very good new school in inner Melbourne.  The SEAL program is great for many kids because they immediately skip a year and jump ahead closer to their current level.  I’m in favor of the SEAL program, it’s a good thing – but it’s not the complete answer.  In my son’s case he was repeating material he had done a couple of years earlier in Math, so the homework was ‘busy work’.

I tried him out on year 10 questions and he worked through them well, so it seemed he was at that level.  I asked the teacher if he could work ahead in Math, then asked the year coordinator and finally the deputy principal – and was surprised that this request was politely ignored in every case.  At first I was angry, but then I realized that they probably just saw my request as “more work” for them, and they are already straining to keep up with massive expansion in student numbers.   The roll is growing at a massive rate, and I think this is why they just don’t have bandwidth to gather a real focus on learning outcomes, let alone catering flexibly to students who fall outside the norm.

As an aside, there are ways to teach and learn math that are vastly better for all students than the approach we have in most Australian schools now.  You don’t have to invent new methods, they are tried and work well overseas – you can read about Jo Boaler, ProofSchool, MathCircles, KhanAcademy, AoPS.com, Australian Math Competition etc. You can read any review of our current math texts by university mathematicians, or look at any comparison study with other countries to know we are doing it badly.   The system needs to be flexible enough to accommodate and experiment with these new methods.  It’s not the curricula per se, it is the way it’s communicated – it is not visual enough, it is too topic-centric and should be more problem-centric, it is not interactively explored.

I’d like to see schools adopt these approaches – but right now they are too busy handling roll growth alone, and moving from paper books to ipads.

This means the only solution, for now, is to Homeschool your child if they excel in Math – school is a hostile environment towards learning math deeply.

We need to change the way we think about Homeschooling – it is valuable for mainstream education in Australia, it is a place to see how new methods work and take the needed risks in new approaches to learning.  It is a pressure valve for a school system experiencing the stress of rapid growth, and it is the only way to accommodate that small minority of students who will not excel at schools, no matter how good those schools become in future.

To this end I propose that the Victorian Government / Department of Education Victoria consider supporting Homeschooling in the following practical ways :

  • Preserve the current lite-touch Homeschool registration regulations in Victoria [ realise that making regulations tighter will likely result in mass non-registration ]
  • Fund a fulltime Homeschool liaison specialist educator in DETV  [ to support homeschoolers, not police them ! ]
  • Establish an open registry of public school events that Homeschoolers can join in with
  • Fund several masters/phd opt-in studies on Homeschool education approaches and attainments
  • Fund Math and Science specific programs for both schools and homeschoolers, eg: [ alternative curriculum materials, such as AoPS.com books,  Math Circles and Robot workshops ]
  • Establish a yearly tax deduction for extra costs associated with homeschooling your child [ taken from the money that homeschooling saves the government on schooling ]

Just made a couple of videos explaining multiplication in a visual way, using the ‘Box Method’

Playlist on youtube, here

[ image : gridmaths_mult_36x27_calcs ]
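For readers who prefer code to video, here is a small sketch of the Box Method for 36 x 27 [ the sum named in the image above ]. The function names placeValueParts and boxMultiply are made up for this example: each number is split into place-value parts, every pair of parts fills one box, and the boxes are added up.

```javascript
// Split a non-negative integer into place-value parts, e.g. 36 -> [30, 6].
function placeValueParts(n) {
  const parts = [];
  let place = 1;
  while (n > 0) {
    const digit = n % 10;
    if (digit !== 0) parts.unshift(digit * place);
    n = Math.floor(n / 10);
    place *= 10;
  }
  return parts;
}

// Box method: one box per pair of parts, then add up all the boxes.
function boxMultiply(a, b) {
  const rows = placeValueParts(a);
  const cols = placeValueParts(b);
  const boxes = rows.map(r => cols.map(c => r * c));
  const total = boxes.flat().reduce((sum, x) => sum + x, 0);
  return { rows, cols, boxes, total };
}

const result = boxMultiply(36, 27);
console.log(result.boxes); // [ [ 600, 210 ], [ 120, 42 ] ]
console.log(result.total); // 972
```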

A MathCircle is where a group of young people get together and work on some interesting Math-related topics introduced by a mentor.

The topics are sometimes problems like you find on ArtOfProblemSolving.com or Math Competitions [ viz. AMC Sample Questions ].  They can cover quirky topics that aren’t normally seen in the school curriculum, such as Catalan Numbers.

The idea sounds geeky, and is un-apologetically so, yet has the potential to engage students who might be bored with the traditional material.   It preps interested students for careers in science / finance / medicine / engineering and helps the school do well in math and science competitions.  Math Circle meets can also be a lot of fun.

Why

Question : “Why do we need more math.. we already do that in school, right?”

Answer : “yeah..but you don’t learn enough Tennis in PE class to become a club player, and you don’t get enough Instrument practice in Music class to get into music school .. you really need to train, to play in an ensemble and have dedicated practice for that.”

MathCircle takes the same approach of intensive practice which you find in Basketball practice, Dance club, Swim meet or musical instrument group … except focused on math-related skills.

Practice with Duration + Intensity

One important aspect that both Math problem solving and programming share is the sustained concentration on one task – in my opinion we have gone too far in the direction of bite-size chunks of learning.  Sometimes you need to chew on things for a while.

We can look at how Basketball and Music are learnt, and apply what works to math.   Training sessions often last for 2 hours, and this is accepted as a social norm – it is well understood it takes time to get into the zone.. repeat the basics, introduce a new skill, practice it, and integrate and perfect it over time thru a range of scenarios.

It’s also not about the one or two elite players – the whole team improves, transferring skills around and offering peer support thru the shared activity.  It can help develop individual qualities that are useful for success in other areas of life, namely ‘character’.

With Code

MathCircle is something most parents haven’t heard of .. but teaching young people how to program is an easy sell – it has had a lot of positive marketing over the last year or so, and it’s a clear pathway to a good salary.

I also think that writing small programs is a great way to introduce and discover math ideas .. it’s tactile, interactive, hands on, iterative, experimental.  You’re also working with the real concepts – a lot of educational apps and games seem to win in terms of engaging and entertaining, but lose in terms of conveying deeper ideas.  When you’re making a program you are really tinkering under the hood with the engine, not just zooming around the racetrack.  So a MathCircle with an emphasis on making your own programs to investigate math topics, and using tools like GeoGebra, might work really well.

Example Topics

A MathCircle which has a code-things-up emphasis, we could call a MathCodeCircle.

Here are some topics that might be covered in such a MathCodeCircle :

  • find prime number factors, use for solving lcm/gcf problems
  • pong game variant – balls bounce around and collide
  • adding waves together – beats, square waves
  • simulate jumpy stock prices, compare with compound interest
  • planets orbiting / solar system simulator
  • circle inversion using GeoGebra

Most of these would be developed in javascript, and run in the browser – using the canvas api to render 2D graphics – it’s real programming.
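To give a flavour of the first topic, here is one way a session might code up prime factors and use them for gcf / lcm. It is a sketch, not a set curriculum: the function names are just suggestions, and trial division is chosen because it is easy to follow, not because it is fast.

```javascript
// Trial division: returns a Map of prime -> exponent, e.g. 12 -> {2: 2, 3: 1}.
function primeFactors(n) {
  const factors = new Map();
  for (let p = 2; p * p <= n; p++) {
    while (n % p === 0) {
      factors.set(p, (factors.get(p) || 0) + 1);
      n /= p;
    }
  }
  if (n > 1) factors.set(n, (factors.get(n) || 0) + 1); // leftover prime
  return factors;
}

// gcf takes the smaller exponent of each shared prime.
function gcf(a, b) {
  const fa = primeFactors(a), fb = primeFactors(b);
  let result = 1;
  for (const [p, e] of fa) {
    if (fb.has(p)) result *= p ** Math.min(e, fb.get(p));
  }
  return result;
}

// lcm follows from a * b = gcf(a, b) * lcm(a, b).
function lcm(a, b) {
  return (a * b) / gcf(a, b);
}

console.log([...primeFactors(360)]); // [ [ 2, 3 ], [ 3, 2 ], [ 5, 1 ] ]
console.log(gcf(36, 27), lcm(36, 27)); // 9 108
```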

More

Some links if you’re interested :

 

 

 

Setting the scene

  • the database is 30 Million rows and performance is getting slower
  • users want to search for stuff.. but don’t know what they want to search for
  • the web experience needs to be fast enough to feel “interactive”
  • it needs to have an api which the mobile app developers can use.. so json

Questions / Observations

  • Do we even have ‘Big Data’ ?
    • 9GB data as CSV
    • 2.5GB when zipped
  • We could actually fit the data all into 16GB RAM ..
    • why doesn’t the database do that ?
  • What if we fully “invert” the data, so tag searches are fast :
    • data  : id -> tags   [ in named fields ]
    • index : tag -> ids
    • “inverted” when all tags are indexed
    • so, given a tag we can quickly get all the record ids it appears in
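The inversion described above can be sketched in a few lines of javascript. The records and field names [ city, kind ] are invented for illustration, and this in-memory Map version only shows the idea, not the binary index described later.

```javascript
// data : id -> tags, in named fields (field names here are made up).
const records = new Map([
  [1, { city: 'melbourne', kind: 'cafe' }],
  [2, { city: 'sydney', kind: 'cafe' }],
  [3, { city: 'melbourne', kind: 'gallery' }],
]);

// index : tag -> sorted list of ids; every field value is indexed.
function invert(records) {
  const index = new Map();
  for (const [id, fields] of records) {
    for (const tag of Object.values(fields)) {
      if (!index.has(tag)) index.set(tag, []);
      index.get(tag).push(id);
    }
  }
  for (const ids of index.values()) ids.sort((a, b) => a - b);
  return index;
}

const index = invert(records);
console.log(index.get('melbourne')); // [ 1, 3 ]
console.log(index.get('cafe'));      // [ 1, 2 ]
```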

Strategy / Plan

  • RAM is fast, use plenty : enough for :
    • cached rows and
    • the full index
  • keep the index fully in RAM : all the time
  • vertical scaling might work

The pain, oh the pain

  • cassandra OK
    • but we don’t have a cluster.. + it’s 150MB java bloat/dependencies install
  • mysql innodb OK ..
    • but weak sql support + a large csv import died
  • redis OK .. but
    • hit 25GB RAM before dying
    • too much space to store long sets of integers
  • new respect for postgres :
    • great sql support
    • csv import “copy” : nice, simple, fast, reliable
    • up the buffers, and warm the cache with some queries
      • will bring in linux virtual ram, based on most recently used
      • directive to hold index or table in RAM would be nice
  • linux utils, ftw :
    • head tail sed split sort uniq wc
    • csvkit
    • perl -pe ‘s///g’ [ and regexes generally in vim, node.js, perl ]
    • bash shell

The epiphany

  • keep the fully inverted index in RAM, all the time
  • save space with a binary format
  • develop custom “index server”
    • custom binary file format
    • node.js is nice to work with : buffers, streams
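To show why a binary format saves space, here is a sketch that packs a block of ids into a node.js Buffer. The 4-byte little-endian layout and the names packIds / unpackIds are assumptions for the example, not the actual file format of the index server.

```javascript
// Pack a list of record ids as 4-byte little-endian unsigned integers.
// (4 bytes is an assumption; it is enough for ids below ~4.29 billion.)
function packIds(ids) {
  const buf = Buffer.alloc(ids.length * 4);
  ids.forEach((id, i) => buf.writeUInt32LE(id, i * 4));
  return buf;
}

// Read back `count` ids starting at the `start`-th id, without parsing the rest.
function unpackIds(buf, start, count) {
  const ids = [];
  const end = Math.min(start + count, buf.length / 4);
  for (let i = start; i < end; i++) ids.push(buf.readUInt32LE(i * 4));
  return ids;
}

const ids = [12000001, 12000005, 29999999];
const packed = packIds(ids);
console.log(packed.length, 'bytes packed vs', JSON.stringify(ids).length, 'as text');
console.log(unpackIds(packed, 1, 2)); // [ 12000005, 29999999 ]
```

Fixed-width ids also mean the n-th id sits at a known byte offset, so a page of results can be read without scanning the whole block.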

Partial Enlightenment

  • SSDs are incredible
    • seek SSD ~0.1ms vs  HDD ~10ms  [ 100x ]
    • read SSD ~500MB/s vs HDD ~150MB/s [ ~3x ]
    • readable comparison here
  • a middle way :
    • custom index held in RAM
    • data in a sturdy RDB [ postgres ]
    • trust SSD performance
  • json result set compresses well
    • 256 rows ~ 170k json ~ 25k gzipped [ 7x ! ]
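The compression win is easy to check with node’s built-in zlib module. The rows below are made up, so the ratio printed will differ from the 7x measured on the real result set; the point is only that repeated field names and tags compress well.

```javascript
const zlib = require('zlib');

// Made-up rows with repetitive field names, similar in spirit to a result set.
const rows = [];
for (let i = 0; i < 256; i++) {
  rows.push({ id: i, title: 'record ' + i, tags: ['melbourne', 'cafe'] });
}

const json = Buffer.from(JSON.stringify(rows));
const gzipped = zlib.gzipSync(json);
const ratio = json.length / gzipped.length;
console.log(json.length, 'bytes json ->', gzipped.length, 'bytes gzipped');

// Round trip, as a client would after receiving the response.
const restored = JSON.parse(zlib.gunzipSync(gzipped).toString());
```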

Good Performance Numbers

  • running on a linode host with 2GB RAM
    • 1GB for postgres cache
    • 1GB for index tag table :
      • first 700 MB of the 2.3 GB binary index file
      • rest is blocks of ids [ per each tag ]
  • tag search on 30 Million records :
    • 25ms to get the first 1000 ids from the index table and index file
    • 150ms to fetch the records with those ids !
  • web front end :
    • search is fast enough to be “interactive”
    • user sees 450ms roundtrip [ including 200ms ping ]
    • gzipping the json brings a 2.3s fetch down to 0.45s
    • feels nearly instantaneous

Moral of the story

  • SSDs are FAST.. but they are NEW and DB software will need time to catch up
  • NoSQL vs SQL just means more options, and more hybrid approaches
  • You can still scale vertically
    • if your data is big [100M rows | 50GB ]
    • but not BIG big [ 1Bn rows | 1TB ]
  • RAM is expensive on the server
    • because hosting companies can share it over many users
    • 64GB linode = $640/mo vs 2GB linode = $20/mo
    • potentially save ~$600 per month
  • we will see more soft realtime web sites .. web with data will feel more interactive

[ image : ca_catalog_001 ]

Clicking the Melbourne Graffiti item above drills down to this page :

[ image : ca_catalog_002 ]

Handy for nested Product Catalog style apps … or a scrapbook

To build this app in CollabAPI I just defined a Creative item as :

  • blurb text
  • photo
  • more [ a list of Creative items ]

so it’s like a set of folders … recursion : “it’s turtles all the way down”
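The recursive shape of a Creative item can be sketched with plain javascript objects. This does not use CollabAPI’s own definition syntax; the creative and outline helpers, the photo file names and the ‘Hosier Lane’ entry are invented for the example.

```javascript
// Plain-object stand-in for a Creative item: blurb, photo, more.
function creative(blurb, photo, more = []) {
  return { blurb, photo, more };
}

// Depth-first walk; each nested item is just another Creative item.
function outline(item, depth = 0, lines = []) {
  lines.push('  '.repeat(depth) + item.blurb);
  for (const child of item.more) outline(child, depth + 1, lines);
  return lines;
}

const catalog = creative('Catalog', 'catalog.jpg', [
  creative('Melbourne Graffiti', 'graffiti.jpg', [
    creative('Hosier Lane', 'hosier.jpg'), // made-up nested item
  ]),
  creative('Scrapbook', 'scrapbook.jpg'),
]);
console.log(outline(catalog).join('\n'));
```

Because ‘more’ holds items of the same type, one definition covers any depth of nesting.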

[ image : collabapi_infographic_20141108.v3 ]

[ image : collabapi_infographic_20141105.v2 ]