ByteEasy: 2011

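I'm using PiCloud's Pi example. The performance gains are great once the job is large enough (in the table below the cloud run beats the local one from 10⁷ tests upward), and when you see the code below you will realize that PiCloud is really easy and intuitive to use. I'll be moving some of my Python jobs to them.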
Unrounded local wall-clock times (sec) for 10³ to 10⁸ tests: 0.0 0.0160000324249 0.047000169754 0.483999967575 4.64100003242 46.5150001049

Process	Location	Number in Parallel	Number of Tests	Wall Clock Time (sec)	Pi
calcPiLocal	local	1	10²	0.00	3.16000000
calcPiCloud	cloud	8	10²	30.37	3.04000000
calcPiLocal	local	1	10³	0.00	3.13200000
calcPiCloud	cloud	8	10³	5.31	3.08000000
calcPiLocal	local	1	10⁴	0.02	3.13640000
calcPiCloud	cloud	8	10⁴	4.31	3.13200000
calcPiLocal	local	1	10⁵	0.05	3.13664000
calcPiCloud	cloud	8	10⁵	1.22	3.13840000
calcPiLocal	local	1	10⁶	0.48	3.14185200
calcPiCloud	cloud	8	10⁶	2.30	3.14116000
calcPiLocal	local	1	10⁷	4.64	3.14092240
calcPiCloud	cloud	8	10⁷	2.31	3.14099920
calcPiLocal	local	1	10⁸	46.52	3.14138168
calcPiCloud	cloud	8	10⁸	8.50	3.14139276
calcPiLocal	local	1	10⁹	468.45	3.14171378
calcPiCloud	cloud	8	10⁹	121.48	3.14159943
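
To make the gains in the table concrete, here is a small sketch (written for Python 3) that turns the wall-clock columns into a speedup factor. The timings are copied from the rows above; the `timings` dict and the `speedup` helper are names made up for this illustration.

```python
# Wall-clock times (seconds) copied from the table: exponent -> (local, cloud)
timings = {
    6: (0.48, 2.30),
    7: (4.64, 2.31),
    8: (46.52, 8.50),
    9: (468.45, 121.48),
}

def speedup(local_sec, cloud_sec):
    """Return how many times faster the cloud run was than the local run."""
    return local_sec / cloud_sec

for n in sorted(timings):
    local_sec, cloud_sec = timings[n]
    print("10^%d tests: speedup %.2fx" % (n, speedup(local_sec, cloud_sec)))
```

With these numbers the cloud run is slower at 10⁶ (about 0.2x), roughly twice as fast at 10⁷, about 5.5x at 10⁸ and about 3.9x at 10⁹. With 8 jobs in parallel the ideal would be 8x, so the fixed cost of submitting jobs is clearly visible on the smaller runs.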

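import cloud, random, time

# Placeholders: substitute your own PiCloud API key and its secret key.
API_KEY = 'your API key goes here'
API_SECRET_KEY = 'your private key goes here'
cloud.setkey(API_KEY, API_SECRET_KEY)
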
def monteCarlo(num_test):
    """
    Throw num_test darts at a square
    Return how many appear within the quarter circle
    """
    numInCircle = 0
    for _ in xrange(num_test):
        x = random.random()
        y = random.random()
        if x*x + y*y < 1.0:  #within the quarter circle
            numInCircle += 1
    return numInCircle

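def calcPiLocal(n):
    """Estimate pi locally with 10**n darts in a single process."""
    numTests = 10**n
    tick = time.time()
    numInCircle = monteCarlo(numTests)
    # Quarter-circle area / square area = pi/4, so pi ~ 4 * hits / darts
    pi = (4 * numInCircle) / float(numTests)
    return 'calcPiLocal', 'local', 1, n, time.time() - tick, pi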
  
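def calcPiCloud(n, num_parallel=8):
    """Estimate pi with 10**n darts split across num_parallel PiCloud jobs."""
    numTests = 10**n
    tick = time.time()
    testsPerCall = numTests // num_parallel  # integer division; any remainder is dropped
    # One job per chunk, run on PiCloud's 'c2' core type
    jids = cloud.map(monteCarlo, [testsPerCall]*num_parallel, _type='c2')
    numInCircleList = cloud.result(jids)
    numInCircle = sum(numInCircleList)
    # Divide by the darts actually thrown (fewer than numTests if a remainder was dropped)
    pi = (4 * numInCircle) / float(testsPerCall * num_parallel)
    return 'calcPiCloud', 'cloud', num_parallel, n, time.time() - tick, pi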
  
  
if __name__ == '__main__':
    for n in range(2,9):
        for f in calcPiLocal, calcPiCloud:
            print f(n)
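
One detail worth a note: calcPiCloud gives every job numTests/num_parallel darts, and when that division leaves a remainder (10² darts over 8 jobs, for example) the jobs together throw fewer than numTests darts. A possible way around it, sketched here for Python 3, is to spread the remainder over the first few jobs. `split_tests` is a hypothetical helper written for this post, not part of the PiCloud library.

```python
def split_tests(num_tests, num_parallel):
    """Split num_tests into num_parallel chunk sizes that sum to num_tests."""
    base, extra = divmod(num_tests, num_parallel)
    # The first `extra` chunks take one additional test each.
    return [base + 1] * extra + [base] * (num_parallel - extra)

chunks = split_tests(10**2, 8)
print(chunks)       # [13, 13, 13, 13, 12, 12, 12, 12]
print(sum(chunks))  # 100
```

The resulting list could be passed to cloud.map in place of [testsPerCall]*num_parallel, so that the total number of darts is exactly numTests.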

Last week I stuck my neck out in a meeting and declared that XML is verbose and slow to parse, and that we should move to something like Google's Protocol Buffers, or something readable such as JSON or YAML, which are easier to parse, and so on. Well, is this really true? The statement seems logical considering how verbose XML can be. Still, after the meeting, some questions stayed in my mind, so I thought I would do some tests. I used a FIX Globex (CME) swap trade confirmation message to test my theory.

Format	Library	Size	From Python	To Python
json	cjson	2332	0.222238063812	0.0943419933319
pickle	cPickle	1778	0.233518123627	0.128826141357
XML	cElementTree	2083	0.407706975937	2.77832698822
json	simplejson	2332	3.37723612785	5.11316084862

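So this simple test shows that XML with the cElementTree parser is not that slow: generating it takes less than twice as long as cjson, although parsing it back is far slower (2.78 against 0.09). cjson wins on speed in both directions, while simplejson, working with exactly the same JSON, is the slowest on both. The conclusion must be: your performance will ultimately depend on your data and the quality of the libraries you have available.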
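
For anyone who wants to repeat this kind of comparison, here is a rough sketch of a timing harness in Python 3. It is not the script behind the table above: it uses the standard library's json, pickle and xml.etree.ElementTree instead of cjson, cPickle, cElementTree and simplejson, the `trade` record is a made-up stand-in for the FIX message, and the repeat count is an arbitrary assumption.

```python
import json
import pickle
import timeit
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a trade confirmation; not the FIX message from the post.
trade = {"TradeID": "T-0001", "Side": "Buy", "Notional": "1000000", "Currency": "USD"}

def to_xml(record):
    """Serialize a flat dict of strings to an XML byte string."""
    root = ET.Element("Trade")
    for key, value in record.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root)

def from_xml(payload):
    """Parse the XML produced by to_xml back into a dict."""
    return {child.tag: child.text for child in ET.fromstring(payload)}

# name -> (serialize, parse)
codecs = {
    "json": (json.dumps, json.loads),
    "pickle": (pickle.dumps, pickle.loads),
    "xml": (to_xml, from_xml),
}

def benchmark(record, repeats=1000):
    """Return {name: (size, seconds_to_serialize, seconds_to_parse)}."""
    results = {}
    for name, (dump, load) in codecs.items():
        payload = dump(record)
        results[name] = (
            len(payload),
            timeit.timeit(lambda: dump(record), number=repeats),
            timeit.timeit(lambda: load(payload), number=repeats),
        )
    return results

if __name__ == "__main__":
    for name, (size, out_sec, in_sec) in sorted(benchmark(trade).items()):
        print("%-8s %6d %.6f %.6f" % (name, size, out_sec, in_sec))
```

The three printed columns line up with Size, From Python and To Python in the table. Absolute numbers will differ by machine, library version and message, which is rather the point of the conclusion above.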

I'll try to continue with these tests and maybe find a better YAML library.

Thursday, September 22, 2011

Some PiCloud tests...

Monday, February 7, 2011

Serialization Performance