Sunday, July 10, 2016

Setting up Python



Install pyenv

1. Download and run the installer.

$ curl -L https://raw.github.com/yyuu/pyenv-installer/master/bin/pyenv-installer | bash

2. Set up the environment variables and initialize pyenv in your profile.

$ echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bash_profile
$ echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bash_profile
$ echo 'eval "$(pyenv init -)"' >> ~/.bash_profile


3. Restart your shell.

$ exec $SHELL

4. Install your choice of Python.

$ pyenv install 3.5.2

5. Set it as the current (global) version of Python.

$ pyenv global 3.5.2

6. Install the pyenv-virtualenv plugin. The installer script above usually installs it already; if not, clone it into pyenv's plugins directory:

$ git clone https://github.com/yyuu/pyenv-virtualenv.git ~/.pyenv/plugins/pyenv-virtualenv


7. Create a virtual environment (it will be based on the current global version).

$ pyenv virtualenv protovima


8. Activate it.

$ pyenv activate protovima

NOTE: Before building Python 3.5 on Debian/Ubuntu you should install the build dependencies:

$ sudo apt-get install curl git-core gcc make zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev libssl-dev
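
As a quick sanity check (assuming the steps above succeeded), the new environment should activate and report the expected interpreter:

$ pyenv activate protovima
(protovima) $ python --version
Python 3.5.2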





Thursday, December 26, 2013

JSON Performance

I have some code that processes tweets, about 5 million a day, in real time. They are currently stored in MongoDB and also posted on various Celery/RabbitMQ work queues. The average message is about 5,524 bytes, so the cost of encoding and decoding these messages matters.

The table below summarizes the results; the test script is linked at the end of this post.




Standard tweet message encode/decode with the cjson, bson and ujson packages. Each timing cell lists cjson, bson, ujson.

| Test | Msg size | De-serialize (cjson, bson, ujson) | Obj size | Serialize (cjson, bson, ujson) | Storage | Storage cost |
|---|---|---|---|---|---|---|
| Empty object {} | 41 | 14.790, 37.565, 0.970 | 54 | 6.341, 41.856, 1.249 | 2050 MB | 0.21 |
| Empty list [] | 41 | 15.069, 38.021, 1.005 | 54 | 6.675, 41.475, 1.400 | 2050 MB | 0.21 |
| Object of objects | 843 | 107.750, 145.440, 25.525 | 3226 | 63.555, 828.235, 28.051 | 42150 MB | 4.21 |
| List of lists | 563 | 58.805, 81.950, 16.960 | 1044 | 3.426, 815.965, 18.311 | 28150 MB | 2.81 |
| Object with only tweet id | 93 | 25.030, 53.360, 2.280 | 422 | 23.570, 83.445, 3.295 | 4650 MB | 0.47 |
| Full tweet message | 4386 | 697.221, 867.780, 188.560 | 12606 | 360.290, 5847.335, 201.610 | 219300 MB | 21.93 |
| Message with string payload | 4899 | 446.661, 489.455, 82.361 | 9170 | 104.254, 327.455, 92.460 | 244950 MB | 24.50 |
| Object where the field names are codes | 4396 | 422.220, 456.184, 75.035 | 7880 | 73.345, 231.071, 81.570 | 219800 MB | 21.98 |
| Only fields of interest | 1911 | 285.840, 375.440, 74.100 | 7290 | 147.895, 2252.359, 78.495 | 95550 MB | 9.55 |
| Only fields of interest, keys encoded | 1660 | 288.585, 378.330, 67.400 | 7290 | 143.045, 2246.330, 70.940 | 83000 MB | 8.30 |
| Denormalized tweet | 4707 | 603.030, 716.414, 186.300 | 9246 | 350.465, 5802.315, 198.150 | 235350 MB | 23.54 |
| Denormalized tweet with only needed fields | 2158 | 255.200, 318.004, 77.040 | 6378 | 142.004, 2165.425, 71.850 | 107900 MB | 10.79 |
| Denormalized tweet with only needed fields, keys encoded | 1734 | 242.375, 305.970, 65.689 | 6378 | 133.420, 2165.465, 68.730 | 86700 MB | 8.67 |
| Possible candidate | 1911 | 291.691, 380.090, 74.741 | 7290 | 156.535, 2258.445, 77.696 | 95550 MB | 9.55 |

The full tweet message and the fields-of-interest formats are documented in https://gist.github.com/thanos/adf7e20b5f00551a38a8. Storage appears to be the message size times 50 million messages, i.e. about ten days of traffic, with cost presumably in dollars.






Source code of test script

The source code of the test script and the sample messages for each test case (empty object {}, empty list [], object of objects, list of lists, object with only tweet id, full tweet message, message with string payload, object where the field names are codes, only fields of interest, and the denormalized tweet variants) is in the gist: https://gist.github.com/thanos/adf7e20b5f00551a38a8
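
For flavor, here is a minimal sketch of the kind of timing loop involved. This is my reconstruction, not the original script: the original measured cjson, bson and ujson, while this sketch uses the stock json module plus ujson if available, and SAMPLE is a stand-in message rather than a real tweet.

import json
import timeit

try:
    import ujson          # optional; one of the libraries measured above
except ImportError:
    ujson = None

# Stand-in message; the real tests used full tweet payloads (see gist).
SAMPLE = {"id": 123456789, "text": "hello world", "user": {"screen_name": "thanos"}}
N = 10000  # iterations per measurement

def bench(name, dumps, loads, obj=SAMPLE, n=N):
    encoded = dumps(obj)
    ser = timeit.timeit(lambda: dumps(obj), number=n)
    de = timeit.timeit(lambda: loads(encoded), number=n)
    print("%-6s msg size=%5d serialize=%8.3f ms de-serialize=%8.3f ms"
          % (name, len(encoded), ser * 1000, de * 1000))

bench("json", json.dumps, json.loads)
if ujson is not None:
    bench("ujson", ujson.dumps, ujson.loads)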


Sunday, December 15, 2013

Replacing ARRI lights

My aim is to replace my very hot ARRI lights. I've bought on eBay a Bi-color 1500W LED Fresnel Video Spotlight Light from Xiamen Came Photographic Equipment Co., Ltd. (http://stores.ebay.com/PhotoLight).

It came with two problems:

1) It didn't have a Fresnel lens as advertised. Okay, I could live with this if it weren't for problem number two:

2) The focus adjustment button didn't work. It feels as if it's popped out of its thread.

Now I'm talking to Came and hopefully they will resolve these problems. If so, it will be a great light!



Thursday, September 22, 2011

Some PiCloud tests...

I'm using their pi example. The performance gains are great, and when you see the code below you'll realize that PiCloud is really easy and intuitive to use. I'll be moving some of my Python jobs to them.
| Process | Location | Number in parallel | Number of tests | Wall clock time (sec) | Pi |
|---|---|---|---|---|---|
| calcPiLocal | local | 1 | 10^2 | 0.00 | 3.16000000 |
| calcPiCloud | cloud | 8 | 10^2 | 30.37 | 3.04000000 |
| calcPiLocal | local | 1 | 10^3 | 0.00 | 3.13200000 |
| calcPiCloud | cloud | 8 | 10^3 | 5.31 | 3.08000000 |
| calcPiLocal | local | 1 | 10^4 | 0.02 | 3.13640000 |
| calcPiCloud | cloud | 8 | 10^4 | 4.31 | 3.13200000 |
| calcPiLocal | local | 1 | 10^5 | 0.05 | 3.13664000 |
| calcPiCloud | cloud | 8 | 10^5 | 1.22 | 3.13840000 |
| calcPiLocal | local | 1 | 10^6 | 0.48 | 3.14185200 |
| calcPiCloud | cloud | 8 | 10^6 | 2.30 | 3.14116000 |
| calcPiLocal | local | 1 | 10^7 | 4.64 | 3.14092240 |
| calcPiCloud | cloud | 8 | 10^7 | 2.31 | 3.14099920 |
| calcPiLocal | local | 1 | 10^8 | 46.52 | 3.14138168 |
| calcPiCloud | cloud | 8 | 10^8 | 8.50 | 3.14139276 |
| calcPiLocal | local | 1 | 10^9 | 468.45 | 3.14171378 |
| calcPiCloud | cloud | 8 | 10^9 | 121.48 | 3.14159943 |




import cloud, random, time

# Placeholders: substitute your own PiCloud API key id and secret key.
cloud.setkey(YOUR_API_KEY_ID, 'your secret key goes here')

def monteCarlo(num_test):
    """
    Throw num_test darts at a square
    Return how many appear within the quarter circle
    """
    numInCircle = 0
    for _ in xrange(num_test):
        x = random.random()
        y = random.random()
        if x*x + y*y < 1.0:  #within the quarter circle
            numInCircle += 1
    return numInCircle

def calcPiLocal(n):
    numTests = 10**n
    tick = time.time()
    numInCircle = monteCarlo(numTests)
    pi = (4 * numInCircle) / float(numTests)
    # One local process, i.e. 1 in parallel.
    return 'calcPiLocal', 'local', 1, n, time.time() - tick, pi
  
def calcPiCloud(n, num_parallel=8):
    numTests = 10**n
    tick = time.time()
    testsPerCall = numTests / num_parallel
    # Fan the work out to num_parallel PiCloud jobs on c2 cores.
    jids = cloud.map(monteCarlo, [testsPerCall] * num_parallel, _type='c2')
    # Block until all jobs finish and collect the per-job counts.
    numInCircleList = cloud.result(jids)
    numInCircle = sum(numInCircleList)
    pi = (4 * numInCircle) / float(numTests)
    return 'calcPiCloud', 'cloud', num_parallel, n, time.time() - tick, pi
  
  
if __name__ == '__main__':
    for n in range(2,9):
        for f in calcPiLocal, calcPiCloud:
            print f(n)

Monday, February 7, 2011

Serialization Performance

Last week I stuck my head out in a meeting and declared that XML is verbose and slow to parse, and that we should move to something like Google's Protocol Buffers, or to something readable such as JSON or YAML, which are easier to parse. But is this really true? The claim seems logical, considering how verbose XML can be. Still, after the meeting, some questions stayed in my mind, so I thought I would run some tests. I used a FIX Globex (CME) swap trade confirmation message to test my theory.


| Format | Library | Size (bytes) | From Python (serialize) | To Python (parse) |
|---|---|---|---|---|
| json | cjson | 2332 | 0.222238063812 | 0.0943419933319 |
| pickle | cPickle | 1778 | 0.233518123627 | 0.128826141357 |
| XML | cElementTree | 2083 | 0.407706975937 | 2.77832698822 |
| json | simplejson | 2332 | 3.37723612785 | 5.11316084862 |






So this simple test shows that parsing XML with cElementTree is not so slow, while cjson wins on speed. The conclusion must be: your performance will ultimately depend on your data and on the quality of the libraries you have available.

I'll try to continue with these tests and maybe find a better YAML library.
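
For reference, here is a minimal sketch of this kind of comparison. It is my reconstruction, not the original script: trade is a stand-in dict, whereas the real test used the FIX swap confirmation, and the stock json/pickle/ElementTree modules stand in for the cjson/cPickle/cElementTree of the time.

import json
import pickle
import timeit
import xml.etree.ElementTree as ET

# Stand-in payload; the real test used a FIX Globex swap trade confirmation.
trade = {"symbol": "GE", "qty": 100, "price": 99.5, "side": "BUY"}
N = 1000

def to_xml(d):
    # Naive dict-to-XML encoder, just to have something comparable to time.
    root = ET.Element("trade")
    for k, v in d.items():
        ET.SubElement(root, k).text = str(v)
    return ET.tostring(root)

for name, dumps, loads, blob in [
        ("json", json.dumps, json.loads, json.dumps(trade)),
        ("pickle", pickle.dumps, pickle.loads, pickle.dumps(trade)),
        ("xml", to_xml, ET.fromstring, to_xml(trade))]:
    print(name, len(blob),
          timeit.timeit(lambda: dumps(trade), number=N),   # from Python
          timeit.timeit(lambda: loads(blob), number=N))    # to Python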


Wednesday, December 1, 2010

My Second Super Computer

Cluster GPU Quadruple Extra Large instance:
memory: 22 GB
EC2 Compute Units: 33.5
GPUs: 2 x NVIDIA Tesla “Fermi” M2050 (448 cores each)
storage: 1690 GB of local instance storage
platform: 64-bit, 10 Gigabit Ethernet
os: CentOS 64-bit

Monte Carlo on One Tesla Device

Options: 256

| Simulation paths | CPU time (ms) | CPU options/sec | GPU time (ms) | GPU options/sec |
|---|---|---|---|---|
| 262144 | 6000 | 42 | 3.586 | 71388 |

Monte Carlo on Two Tesla Devices

Options: 256 split across two Tesla boards

| Simulation paths | CPU time (ms) | CPU options/sec | GPU time (ms) | GPU options/sec |
|---|---|---|---|---|
| 262144 | 6000 | 42 | 3.405 | 151999 |

TOTAL cost: $0.04, including building the environment and sample code from scratch.

CUDA Device Query (Runtime API) version (CUDART static linking)

There are 2 devices supporting CUDA

Device 0: "Tesla M2050"
  CUDA Driver Version:                           3.20
  CUDA Runtime Version:                          3.10
  CUDA Capability Major/Minor version number:    2.0
  Total amount of global memory:                 2817982464 bytes
  Multiprocessors x Cores/MP = Cores:            14 (MP) x 32 (Cores/MP) = 448 (Cores)
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per block:           1024
  Maximum sizes of each dimension of a block:    1024 x 1024 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Clock rate:                                    1.15 GHz
  Concurrent copy and execution:                 Yes
  Run time limit on kernels:                     No
  Integrated:                                    No
  Support host page-locked memory mapping:       Yes
  Compute mode:                                  Default (multiple host threads can use this device simultaneously)
  Concurrent kernel execution:                   Yes
  Device has ECC support enabled:                Yes

Device 1: "Tesla M2050"
  CUDA Driver Version:                           3.20
  CUDA Runtime Version:                          3.10
  CUDA Capability Major/Minor version number:    2.0
  Total amount of global memory:                 2817982464 bytes
  Multiprocessors x Cores/MP = Cores:            14 (MP) x 32 (Cores/MP) = 448 (Cores)
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per block:           1024
  Maximum sizes of each dimension of a block:    1024 x 1024 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Clock rate:                                    1.15 GHz
  Concurrent copy and execution:                 Yes
  Run time limit on kernels:                     No
  Integrated:                                    No
  Support host page-locked memory mapping:       Yes
  Compute mode:                                  Default (multiple host threads can use this device simultaneously)
  Concurrent kernel execution:                   Yes
  Device has ECC support enabled:                Yes

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 3.20, CUDA Runtime Version = 3.10, NumDevs = 2, Device = Tesla M2050, Device = Tesla M2050

Monday, October 11, 2010

zeromq tests

MacPro:~ lydia$ python client.py --messages=1000 --message-size=1000000
Connecting to server...
1000 1000000 297.66666546 MB/s 297.66666546 messages/s

MacPro:~ lydia$ python client.py --messages=10000 --message-size=100000
Connecting to hello world server...
10000 100000 200.893595587 MB/s 2008.93595587 messages/s

MacPro:~ lydia$ python client.py --messages=100000 --message-size=10000
Connecting to hello world server...
100000 10000 54.5556325827 MB/s 5455.56325827 messages/s

MacPro:~ lydia$ python client.py --messages=100000 --message-size=1000
Connecting to hello world server...
100000 1000 8.36256888389 MB/s 8362.56888389 messages/s

MacPro:~ lydia$ python client.py --messages=100000 --message-size=100
Connecting to hello world server...
100000 100 0.866791151229 MB/s 8667.91151229 messages/s

MacPro:~ lydia$ python client.py --messages=100000 --message-size=10
Connecting to hello world server...
100000 10 0.0903534916772 MB/s 9035.34916772 messages/s

MacPro:~ lydia$ python client.py --messages=100000 --message-size=1
Connecting to hello world server...
100000 1 0.00909379595588 MB/s 9093.79595588 messages/s
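
Note how messages/s plateaus around 9,000 as the payload shrinks: below roughly 1 KB, per-message overhead dominates and bandwidth becomes irrelevant. For context, here is a minimal sketch of what client.py might look like. This is a hypothetical reconstruction using pyzmq's REQ/REP pattern; the original script isn't shown, and the server address and port are assumptions.

import argparse
import time
import zmq

parser = argparse.ArgumentParser()
parser.add_argument('--messages', type=int, default=1000)
parser.add_argument('--message-size', type=int, default=1000)
args = parser.parse_args()

print('Connecting to hello world server...')
context = zmq.Context()
socket = context.socket(zmq.REQ)
socket.connect('tcp://localhost:5555')   # assumed address/port

payload = b'x' * args.message_size
tick = time.time()
for _ in range(args.messages):
    socket.send(payload)   # request
    socket.recv()          # wait for the server's reply
elapsed = time.time() - tick

mb_per_sec = args.messages * args.message_size / elapsed / 1e6
print(args.messages, args.message_size,
      mb_per_sec, 'MB/s', args.messages / elapsed, 'messages/s')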