Thursday, September 22, 2011

Some PiCloud tests...

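I'm using their Pi example. The performance gains on large jobs are great: at 10^9 tests the cloud run took 121 seconds against 468 seconds locally, although below about 10^7 tests the cloud overhead outweighs the parallelism. When you see the code below you will realize that PiCloud is really easy and intuitive to use. I'll be moving some of my Python jobs to them.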
Process      Location  Number of Tests  Number in Parallel  Wall Clock Time (sec)  Pi
calcPiLocal  local     10^2             1                     0.00                 3.16000000
calcPiCloud  cloud     10^2             8                    30.37                 3.04000000
calcPiLocal  local     10^3             1                     0.00                 3.13200000
calcPiCloud  cloud     10^3             8                     5.31                 3.08000000
calcPiLocal  local     10^4             1                     0.02                 3.13640000
calcPiCloud  cloud     10^4             8                     4.31                 3.13200000
calcPiLocal  local     10^5             1                     0.05                 3.13664000
calcPiCloud  cloud     10^5             8                     1.22                 3.13840000
calcPiLocal  local     10^6             1                     0.48                 3.14185200
calcPiCloud  cloud     10^6             8                     2.30                 3.14116000
calcPiLocal  local     10^7             1                     4.64                 3.14092240
calcPiCloud  cloud     10^7             8                     2.31                 3.14099920
calcPiLocal  local     10^8             1                    46.52                 3.14138168
calcPiCloud  cloud     10^8             8                     8.50                 3.14139276
calcPiLocal  local     10^9             1                   468.45                 3.14171378
calcPiCloud  cloud     10^9             8                   121.48                 3.14159943
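
To make the comparison easier to read, here is a small sketch that turns the wall-clock times in the table above into speedup factors and finds the problem size at which the cloud run first beats the local one. The dictionaries are copied by hand from the table; the names `local_sec`, `cloud_sec`, `speedup` and `break_even` are mine, not part of PiCloud.

```python
# Wall-clock times (seconds) copied from the table above, keyed by n in 10**n.
local_sec = {2: 0.00, 3: 0.00, 4: 0.02, 5: 0.05, 6: 0.48, 7: 4.64, 8: 46.52, 9: 468.45}
cloud_sec = {2: 30.37, 3: 5.31, 4: 4.31, 5: 1.22, 6: 2.30, 7: 2.31, 8: 8.50, 9: 121.48}

def speedup(n):
    """Local time divided by cloud time; above 1.0 means the cloud run was faster."""
    return local_sec[n] / cloud_sec[n]

# Smallest problem size at which the cloud run beat the local run.
break_even = min(n for n in local_sec if speedup(n) > 1.0)

for n in sorted(local_sec):
    print("10**%d: speedup %.2fx" % (n, speedup(n)))
print("cloud first wins at 10**%d" % break_even)
```

With these numbers the cloud only pulls ahead from 10^7 tests onward; below that, the fixed cost of shipping jobs to the cloud dominates.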




import cloud, random, time
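# placeholders: replace with your own PiCloud API key (an integer) and secret key
API_KEY = 0
API_SECRET_KEY = 'your secret key goes here'
cloud.setkey(API_KEY, API_SECRET_KEY)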




def monteCarlo(num_test):
    """
    Throw num_test darts at a square
    Return how many appear within the quarter circle
    """
    numInCircle = 0
    for _ in xrange(num_test):
        x = random.random()
        y = random.random()
        if x*x + y*y < 1.0:  #within the quarter circle
            numInCircle += 1
    return numInCircle

def calcPiLocal(n):
    numTests = 10**n
    tick = time.time()
    numInCircle = monteCarlo(numTests)
    pi = (4 * numInCircle) / float(numTests)
    return 'calcPiLocal','local', 1, n, time.time() - tick, pi  # 1 = no parallelism
  
def calcPiCloud(n, num_parallel = 8):
    numTests = 10**n
    tick = time.time()
    # integer division: each cloud job throws the same whole number of darts
    testsPerCall = numTests // num_parallel
    jids = cloud.map(monteCarlo, [testsPerCall]*num_parallel, _type='c2')
    numInCircleList = cloud.result(jids)
    numInCircle = sum(numInCircleList)
    # divide by the darts actually thrown, which can be fewer than numTests
    pi = (4 * numInCircle) / float(testsPerCall * num_parallel)
    return 'calcPiCloud','cloud', num_parallel, n, time.time() - tick, pi
  
  
if __name__ == '__main__':
    for n in range(2,10):  # n = 2..9, the sizes shown in the results table
        for f in calcPiLocal, calcPiCloud:
            print f(n)
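
One detail worth spelling out: calcPiCloud computes the per-call count with integer division, so when 10**n is not a multiple of num_parallel a few darts are never thrown (8 calls of 12 is 96, not 100, for n = 2). A minimal sketch of an even split follows; `split_tests` is a hypothetical helper of mine, not something PiCloud provides.

```python
def split_tests(num_tests, num_parallel):
    """Split num_tests into num_parallel chunk sizes that sum to num_tests.

    Hypothetical helper for illustration; not part of the PiCloud library.
    """
    base, extra = divmod(num_tests, num_parallel)
    # The first `extra` chunks get one additional test each.
    return [base + 1] * extra + [base] * (num_parallel - extra)

chunks = split_tests(10**2, 8)
print(chunks)       # [13, 13, 13, 13, 12, 12, 12, 12]
print(sum(chunks))  # 100
```

The resulting list could be handed to cloud.map in place of [testsPerCall]*num_parallel, so that the total number of darts matches numTests exactly.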

Monday, February 7, 2011

Serialization Performance

Last week I stuck my neck out in a meeting and declared that XML is verbose and slow to parse, and that we should move to something like Google's Protocol Buffers, or something readable such as JSON or YAML, which are easier to parse. But is this really true? The statement seems logical considering how verbose XML can be. Still, after the meeting, some questions stayed in my mind, so I thought I would run some tests. I used a FIX Globex (CME) swap trade confirmation message to test my theory.


Format  Library       Size  from Python     to Python
json    cjson         2332  0.222238063812  0.0943419933319
pickle  cPickle       1778  0.233518123627  0.128826141357
XML     cElementTree  2083  0.407706975937  2.77832698822
json    simplejson    2332  3.37723612785   5.11316084862






So this simple test shows that XML with the cElementTree parser is not so slow: it beats simplejson in both directions, although cjson is the fastest overall and parses far quicker than cElementTree. The conclusion must be: your performance will ultimately depend on your data and the quality of the libraries you have available.
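
For anyone who wants to repeat this kind of comparison, here is a rough sketch of how such a test could be set up. It is not the code behind the numbers above: it uses only the standard library (json, pickle and xml.etree.ElementTree as stand-ins for cjson, cPickle and cElementTree), and the `record` dictionary is a made-up stand-in for the FIX message, so the timings will not match the table.

```python
import json
import pickle
import timeit
import xml.etree.ElementTree as ET

# Made-up stand-in record; the original test used a FIX swap trade confirmation.
record = {"TradeID": "12345", "Side": "BUY", "Quantity": "1000000", "Currency": "USD"}

def to_xml(d):
    """Serialize a flat dict of strings as <msg><key>value</key>...</msg>."""
    root = ET.Element("msg")
    for key, value in d.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root)

def from_xml(data):
    """Parse the output of to_xml back into a flat dict of strings."""
    return {child.tag: child.text for child in ET.fromstring(data)}

codecs = {
    "json": (json.dumps, json.loads),
    "pickle": (pickle.dumps, pickle.loads),
    "xml": (to_xml, from_xml),
}

results = {}
for name, (dump, load) in codecs.items():
    encoded = dump(record)
    assert load(encoded) == record  # round trip must preserve the data
    results[name] = (
        len(encoded),                                       # serialized size
        timeit.timeit(lambda: dump(record), number=1000),   # from Python
        timeit.timeit(lambda: load(encoded), number=1000),  # to Python
    )

for name, (size, t_dump, t_load) in sorted(results.items()):
    print("%-7s size=%5d  from Python=%.4fs  to Python=%.4fs" % (name, size, t_dump, t_load))
```

Checking the round trip before timing guards against benchmarking a codec that silently loses data.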

I'll try to continue with these tests and maybe find a better YAML library.