You are currently browsing the tag archive for the ‘performance’ tag.

For experimenting with some ideas on improving blockchain scaling :

Currently simulates the growth of a crypto-currency blockchain ( similar to Bitcoin ), loops thru a cycle of :

  • spend – pick addresses at random and spend to another address
  • mine – put waiting transactions into the block, mine the cryptographic hash, add to chain

Features / Limitations :

  • uses single sha256 bit hash for ids
  • uses an easy Proof-of-work [ 256 tries on average to get a block hashs with ’00’ leading bytes ]
  • uses node.js byte buffers for transactions and blocks
  • runs around 7000 tps on i5 laptop


I wanted to simulate the growth of a blockchain with unspent transactions spread somewhat sparsely at the early older parts of the blockchain, and more dense at the top of the blockchain [as more recent transactions havent had time to be spent yet ].

The reason for this is to test the feasibility of reducing the size of the data needed to bootstrap a new node. eg. in Bitcoin the whole dataset is :

  • around 150GB of transactions [ 250Mn txns ]
  • utxo of around 2GB [ ~50Mn txns ]
  • so unspent ‘utxo’ set is around 20% of transactions

Bring UTXO set forward

We can use much less data [5x smaller ] when spinning up a new node, by bringing utxo set forward to nearer the front of the chain. The sim gathers old utxos and injects them into the blockchain in baches of ids, so they are stamped into the block at block creation.

These ‘utxo catchup sections’ are read when starting a new processing node – ie. it only needs a provable list of utxos, not the complete history of all spent transactions.

skip links [ todo ]

Using block extension areas, we can also include skip links to blocks much earlier in the chain – the process of walking back thru the links of transactions inputs and outputs all the way back to the initial ‘genesis’ block is like walking a DAG tree, as un-needed areas of the blockchain are skipped over.  These skip links are validated at block creation time by other nodes.

This is useful for clients which want to traverse the chain to make better proof of validity that SPV, and for nodes that use utxo bring forward above, so they can trace PoW to an arbitrary level back to the genesis block.

Been ultra busy lately on two distinct web projects, both of which really need a good architecture so they can scale easily.

One nice way to do this is with services which message each other – if you have a fast, persistent reliable message queue you can have many processes grab jobs off the queue and thus scale out over many cores.  I feel this is a natural way to scale out node.js apps.

I immediately discarded SQS [ Amazon’s scalable queue service ] as it basically polls a web url to check for messages, so it really is not a message queue at all.

Another nice option is Mongo tailable cursors, which is the message queue approach that mongo uses internally to replicate between mongo instances.   This worked sort of ok, but didnt strike me as ultra high performance.  see mongoMQ npm module for a simple api wrapper.

I was very impressed with Redis Publish/Subscribe semantics… which are very fast, but dont have persistence.

In the end I decided to use a combination of Redis pub/sub for notifications, and Redis List operations rpush/lpop to store/retrieve the actual message data.  I wrapped this in a simple node.js api, which Im calling redpill.

I have an initial implementation of a typical web app which has users and info items.  This performs rather well, with web request response times under 50ms whle 1000 items/sec are being inserted as a background task.

The nice thing is that using message queues allows you to scale up by running any number of servers.

See code + comments here :

redpill : persistent messaging using Redis primitives

gorgon : demo web server architecture with messaging between server components

Lots of things to optimise, but basically a good proof of concept and initial working demo code for this approach.

My previous blog articles describe approaches to calculating kth moments in a single pass – see compute kth central moments in one pass and variance of a sequence in one pass.

I decided to make a rough performance comparison between my approach and boost accumulators api.  A reasonable test is to procedurally generate 10^7 random numbers in range [0.0..1.0] as the input data.  We compare 3 algorithms –

  • No moments accumulated – baseline, just generates the input
  • Accumulate all 12th order moments using Boost Accumulators
  • Accumulate all 12th order moments using my vfuncs cpp code

We run on Linux using time command to get rough timings, raw results are –

  • Baseline   –   1.16s
  • Boost Acc – 23.16s
  • Vfunc Acc –  2.44s

If we subtract the baseline we get a rough comparison of Boost ~ 22s and Vfuncs cpp ~1.3s when the cost of generating the input is factored out.

So the vfuncs impl is roughly an order of magnitude faster then the current v1.37.0 boost accumulate stats implementation (both are single pass).

I don’t think the boost algorithm is logically different, its probably more a case of template code complexity having an impact – the 10s compile times might indicate this also.  Executable size also reflects this, differing by an order of magnitude.

(Note : Using gcc 4.2.4.  I haven’t tried this on other compilers, build settings etc – they could have a profound effect.  Let me know if you see different results on eg. Intel or VC compilers)


Download code and project – kth_moments.tgz – from vfuncs google code downloads page.  [BSD license feel free to use]

Continuing on the same topic as my previous post, its nice to be able to gather all the kth order moments in a single pass.

Last time I mentioned the boost/accumulators example, but you will have noticed two issues if you use that.  Firstly, moment<k> tag will give you the kth simple moment relative to zero, whereas we often want the kth central moment of a sequence relative to the mean.  Secondly, although boosts accumulator is well written it does seem to take a while to compile [~ 12 seconds for code using moment<12>].

After some playing around Ive got a faster simpler approach, where the inner loop accumulates kth powers of the element.  After you’ve run the sequence through, you can then easily extract variance, and all the kth central moments.  So in adding the more general case of kth moments, Ive made the particular variance case simpler.  That often seems to happen in programming and math!


First a bit of math and then the code.  We want to express the kth central moment in terms of the k basic moments.

First, lets define the basic moment as –

\displaystyle M_{n}^{j}= \sum_{i=1}^n {x}_i^{j}

We rearrange the binomial expansion –

\displaystyle nv_{n}^{k}= \sum_{i=1}^n({x}_{i}-\mu_{n})^k

\displaystyle = \sum_{i=1}^n \sum_{j=0}^k \binom{k}{j} {x}_{i}^j(-\mu_{n})^{k-j}

\displaystyle = \sum_{j=0}^k \binom{k}{j} (-\mu_{n})^{k-j} \sum_{i=1}^n {x}_{i}^j

So we have the kth central moment given as a weighted sum of the kth simple moments –

\displaystyle v_{n}^{k} = 1/n(\sum_{j=0}^k \binom{k}{j} (-\mu_{n})^{k-j} M_{n}^{j})

which shows that all we need to accumulate as we walk across the sequence is the kth simple powers ({x}_{i})^k .

Notice the variance is now handled as a special case where k=2.  Likewise k=0 corresponds to n, the element count and k=1 is the sum of elements.

c++ impl

Heres a basic impl of the above expression –

Read the rest of this entry »


I was chatting with some other quant developers the other day, as you do, and the issue came up of calculating variances.  Its a pretty common operation, and the naive approach does two passes, one to calculate the mean, \mu_{n} , and one to gather the squares of differences from that. … but someone asked if it could be done in one pass, and of course it can, quite easily.

Recall, the population variance of a sequence \left \{x_{i} \right \} is defined as –
v_{n} = (\sum_{i=1}^{n} (x_{i}-\mu_{n} )^2) / n

We can expand this and see how the variance after n terms differs from the variance after m=n+1 terms, vis –

nv_{n}= \sum_{i=1}^{n} (x_{i}-\mu_{m} + \mu_{m} - \mu_{n})^2
\displaystyle = \sum_{i=1}^{n} (x_{i}-\mu_{m})^2 + \sum_{i=1}^{n}(\mu_{m} - \mu_{n})^2 + 2 \sum_{i=1}^{n}(x_{i}-\mu_{m}) (\mu_{m} - \mu_{n})
\displaystyle = mv_{m}- (x_{m}-\mu_{m})^2 + n(\mu_{m} - \mu_{n})^2 + 2 (\mu_{m} - \mu_{n}) \sum_{i=1}^{n}(x_{i}-\mu_{m})
\displaystyle = mv_{m}- (x_{m}-\mu_{m})^2 + n(\mu_{m} - \mu_{n})^2 - 2n(\mu_{m} - \mu_{n})^2
\displaystyle = mv_{m}- (x_{m}-\mu_{m})^2 - n(\mu_{m} - \mu_{n})^2

and we have v_{m} in terms of v_{n} vis –

v_{m} = ( nv_{n} + (x_{m}-\mu_{m})^2 + n(\mu_{m} - \mu_{n})^2 ) / m

In other words v_{m}= f(v_{n}, sum_{n}, n, x_{m}) , so theres very little we need to store to accumulate the variance as we traverse the sequence.

C++ Implementation

Expressing this in C++ code we’d have a functor that maintains state and gets handed each element of the sequence, in simplest form –

Read the rest of this entry »