Sunday, July 10, 2016

Setting up Python



Install pyenv

1. Download and run the installer.

$ curl -L https://raw.github.com/yyuu/pyenv-installer/master/bin/pyenv-installer | bash

2. Set up the environment variables and initialize pyenv in your profile.

$ echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bash_profile
$ echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bash_profile
$ echo 'eval "$(pyenv init -)"' >> ~/.bash_profile


3. Restart your shell.

$ exec $SHELL

4. Install your choice of Python.

$ pyenv install 3.5.2

5. Set it as the current (global) version of Python.

$ pyenv global 3.5.2

6. Install the pyenv-virtualenv plugin. The installer script above usually installs it already; if not, clone it into pyenv's plugins directory:

$ git clone https://github.com/yyuu/pyenv-virtualenv.git ~/.pyenv/plugins/pyenv-virtualenv


7. Create a virtual environment (it will be based on the current global version).

$ pyenv virtualenv protovima


8. Activate it.

$ pyenv activate protovima

NOTE: Before building Python 3.5 on Debian/Ubuntu you should install the build dependencies:

$ sudo apt-get install curl git-core gcc make zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev libssl-dev
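
As a quick sanity check (assuming the steps above succeeded), the new environment should activate and report the expected interpreter:

$ pyenv activate protovima
(protovima) $ python --version
Python 3.5.2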





Thursday, December 26, 2013

JSON Performance

I have some code that processes tweets, about 5 million a day, in real time. They are currently stored in MongoDB and also posted on various Celery/RabbitMQ work queues. The average message is about 5,524 bytes, so the cost of encoding and decoding these messages matters.

The table below summarizes the results; the test script is linked at the end of this post.




Standard tweet message encode/decode with the cjson, bson and ujson packages. Each timing cell lists cjson, bson, ujson.

| Test | Msg size | De-serialize (cjson, bson, ujson) | Obj size | Serialize (cjson, bson, ujson) | Storage | Storage cost |
|---|---|---|---|---|---|---|
| Empty object {} | 41 | 14.790, 37.565, 0.970 | 54 | 6.341, 41.856, 1.249 | 2050 MB | 0.21 |
| Empty list [] | 41 | 15.069, 38.021, 1.005 | 54 | 6.675, 41.475, 1.400 | 2050 MB | 0.21 |
| Object of objects | 843 | 107.750, 145.440, 25.525 | 3226 | 63.555, 828.235, 28.051 | 42150 MB | 4.21 |
| List of lists | 563 | 58.805, 81.950, 16.960 | 1044 | 3.426, 815.965, 18.311 | 28150 MB | 2.81 |
| Object with only tweet id | 93 | 25.030, 53.360, 2.280 | 422 | 23.570, 83.445, 3.295 | 4650 MB | 0.47 |
| Full tweet message | 4386 | 697.221, 867.780, 188.560 | 12606 | 360.290, 5847.335, 201.610 | 219300 MB | 21.93 |
| Message with string payload | 4899 | 446.661, 489.455, 82.361 | 9170 | 104.254, 327.455, 92.460 | 244950 MB | 24.50 |
| Object where the field names are codes | 4396 | 422.220, 456.184, 75.035 | 7880 | 73.345, 231.071, 81.570 | 219800 MB | 21.98 |
| Only fields of interest | 1911 | 285.840, 375.440, 74.100 | 7290 | 147.895, 2252.359, 78.495 | 95550 MB | 9.55 |
| Only fields of interest, keys encoded | 1660 | 288.585, 378.330, 67.400 | 7290 | 143.045, 2246.330, 70.940 | 83000 MB | 8.30 |
| Denormalized tweet | 4707 | 603.030, 716.414, 186.300 | 9246 | 350.465, 5802.315, 198.150 | 235350 MB | 23.54 |
| Denormalized tweet with only needed fields | 2158 | 255.200, 318.004, 77.040 | 6378 | 142.004, 2165.425, 71.850 | 107900 MB | 10.79 |
| Denormalized tweet with only needed fields, keys encoded | 1734 | 242.375, 305.970, 65.689 | 6378 | 133.420, 2165.465, 68.730 | 86700 MB | 8.67 |
| Possible candidate | 1911 | 291.691, 380.090, 74.741 | 7290 | 156.535, 2258.445, 77.696 | 95550 MB | 9.55 |

The full tweet message and the fields-of-interest formats are documented in https://gist.github.com/thanos/adf7e20b5f00551a38a8. Storage appears to be the message size times 50 million messages, i.e. about ten days of traffic, with cost presumably in dollars.






Source code of test script

The source code of the test script and the sample messages for each test case (empty object {}, empty list [], object of objects, list of lists, object with only tweet id, full tweet message, message with string payload, object where the field names are codes, only fields of interest, and the denormalized tweet variants) is in the gist: https://gist.github.com/thanos/adf7e20b5f00551a38a8
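
For flavor, here is a minimal sketch of the kind of timing loop involved. This is my reconstruction, not the original script: the original measured cjson, bson and ujson, while this sketch uses the stock json module plus ujson if available, and SAMPLE is a stand-in message rather than a real tweet.

import json
import timeit

try:
    import ujson          # optional; one of the libraries measured above
except ImportError:
    ujson = None

# Stand-in message; the real tests used full tweet payloads (see gist).
SAMPLE = {"id": 123456789, "text": "hello world", "user": {"screen_name": "thanos"}}
N = 10000  # iterations per measurement

def bench(name, dumps, loads, obj=SAMPLE, n=N):
    encoded = dumps(obj)
    ser = timeit.timeit(lambda: dumps(obj), number=n)
    de = timeit.timeit(lambda: loads(encoded), number=n)
    print("%-6s msg size=%5d serialize=%8.3f ms de-serialize=%8.3f ms"
          % (name, len(encoded), ser * 1000, de * 1000))

bench("json", json.dumps, json.loads)
if ujson is not None:
    bench("ujson", ujson.dumps, ujson.loads)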


Sunday, December 15, 2013

Replacing ARRI lights

My aim is to replace my very hot ARRI lights. I've bought on eBay a Bi-color 1500W LED Fresnel Video Spotlight Light from Xiamen Came Photographic Equipment Co., Ltd. (http://stores.ebay.com/PhotoLight).

It came with two problems:

1) It didn't have a Fresnel lens as advertised. Okay, I could live with this if it weren't for problem number two:

2) The focus adjustment button didn't work. It feels as if it's popped out of its thread.

Now I'm talking to Came and hopefully they will resolve these problems. If so, it will be a great light!



Thursday, September 22, 2011

Some PiCloud tests...

I'm using their pi example. The performance gains are great, and when you see the code below you'll realize that PiCloud is really easy and intuitive to use. I'll be moving some of my Python jobs to them.
| Process | Location | Number in parallel | Number of tests | Wall clock time (sec) | Pi |
|---|---|---|---|---|---|
| calcPiLocal | local | 1 | 10^2 | 0.00 | 3.16000000 |
| calcPiCloud | cloud | 8 | 10^2 | 30.37 | 3.04000000 |
| calcPiLocal | local | 1 | 10^3 | 0.00 | 3.13200000 |
| calcPiCloud | cloud | 8 | 10^3 | 5.31 | 3.08000000 |
| calcPiLocal | local | 1 | 10^4 | 0.02 | 3.13640000 |
| calcPiCloud | cloud | 8 | 10^4 | 4.31 | 3.13200000 |
| calcPiLocal | local | 1 | 10^5 | 0.05 | 3.13664000 |
| calcPiCloud | cloud | 8 | 10^5 | 1.22 | 3.13840000 |
| calcPiLocal | local | 1 | 10^6 | 0.48 | 3.14185200 |
| calcPiCloud | cloud | 8 | 10^6 | 2.30 | 3.14116000 |
| calcPiLocal | local | 1 | 10^7 | 4.64 | 3.14092240 |
| calcPiCloud | cloud | 8 | 10^7 | 2.31 | 3.14099920 |
| calcPiLocal | local | 1 | 10^8 | 46.52 | 3.14138168 |
| calcPiCloud | cloud | 8 | 10^8 | 8.50 | 3.14139276 |
| calcPiLocal | local | 1 | 10^9 | 468.45 | 3.14171378 |
| calcPiCloud | cloud | 8 | 10^9 | 121.48 | 3.14159943 |




import cloud, random, time

# Placeholders: substitute your own PiCloud API key id and secret key.
cloud.setkey(YOUR_API_KEY_ID, 'your secret key goes here')

def monteCarlo(num_test):
    """
    Throw num_test darts at a square
    Return how many appear within the quarter circle
    """
    numInCircle = 0
    for _ in xrange(num_test):
        x = random.random()
        y = random.random()
        if x*x + y*y < 1.0:  #within the quarter circle
            numInCircle += 1
    return numInCircle

def calcPiLocal(n):
    numTests = 10**n
    tick = time.time()
    numInCircle = monteCarlo(numTests)
    pi = (4 * numInCircle) / float(numTests)
    # One local process, i.e. 1 in parallel.
    return 'calcPiLocal', 'local', 1, n, time.time() - tick, pi
  
def calcPiCloud(n, num_parallel=8):
    numTests = 10**n
    tick = time.time()
    testsPerCall = numTests / num_parallel
    # Fan the work out to num_parallel PiCloud jobs on c2 cores.
    jids = cloud.map(monteCarlo, [testsPerCall] * num_parallel, _type='c2')
    # Block until all jobs finish and collect the per-job counts.
    numInCircleList = cloud.result(jids)
    numInCircle = sum(numInCircleList)
    pi = (4 * numInCircle) / float(numTests)
    return 'calcPiCloud', 'cloud', num_parallel, n, time.time() - tick, pi
  
  
if __name__ == '__main__':
    for n in range(2,9):
        for f in calcPiLocal, calcPiCloud:
            print f(n)

Monday, February 7, 2011

Serialization Performance

Last week I stuck my head out in a meeting and declared that XML is verbose and slow to parse, and that we should move to something like Google's Protocol Buffers, or to something readable such as JSON or YAML, which are easier to parse. But is this really true? The claim seems logical, considering how verbose XML can be. Still, after the meeting, some questions stayed in my mind, so I thought I would run some tests. I used a FIX Globex (CME) swap trade confirmation message to test my theory.


| Format | Library | Size (bytes) | From Python (serialize) | To Python (parse) |
|---|---|---|---|---|
| json | cjson | 2332 | 0.222238063812 | 0.0943419933319 |
| pickle | cPickle | 1778 | 0.233518123627 | 0.128826141357 |
| XML | cElementTree | 2083 | 0.407706975937 | 2.77832698822 |
| json | simplejson | 2332 | 3.37723612785 | 5.11316084862 |






So this simple test shows that parsing XML with cElementTree is not so slow, while cjson wins on speed. The conclusion must be: your performance will ultimately depend on your data and on the quality of the libraries you have available.

I'll try to continue with these tests and maybe find a better YAML library.
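
For reference, here is a minimal sketch of this kind of comparison. It is my reconstruction, not the original script: trade is a stand-in dict, whereas the real test used the FIX swap confirmation, and the stock json/pickle/ElementTree modules stand in for the cjson/cPickle/cElementTree of the time.

import json
import pickle
import timeit
import xml.etree.ElementTree as ET

# Stand-in payload; the real test used a FIX Globex swap trade confirmation.
trade = {"symbol": "GE", "qty": 100, "price": 99.5, "side": "BUY"}
N = 1000

def to_xml(d):
    # Naive dict-to-XML encoder, just to have something comparable to time.
    root = ET.Element("trade")
    for k, v in d.items():
        ET.SubElement(root, k).text = str(v)
    return ET.tostring(root)

for name, dumps, loads, blob in [
        ("json", json.dumps, json.loads, json.dumps(trade)),
        ("pickle", pickle.dumps, pickle.loads, pickle.dumps(trade)),
        ("xml", to_xml, ET.fromstring, to_xml(trade))]:
    print(name, len(blob),
          timeit.timeit(lambda: dumps(trade), number=N),   # from Python
          timeit.timeit(lambda: loads(blob), number=N))    # to Python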


Wednesday, December 1, 2010

My Second Super Computer

Cluster GPU Quadruple Extra Large instance:
memory: 22 GB
EC2 Compute Units: 33.5
GPUs: 2 x NVIDIA Tesla “Fermi” M2050 (448 cores each)
storage: 1690 GB of local instance storage
platform: 64-bit, 10 Gigabit Ethernet
os: CentOS 64-bit

Monte Carlo on One Tesla Device

Options: 256

| Simulation paths | CPU time (ms) | CPU options/sec | GPU time (ms) | GPU options/sec |
|---|---|---|---|---|
| 262144 | 6000 | 42 | 3.586 | 71388 |

Monte Carlo on Two Tesla Devices

Options: 256 split across two Tesla boards

| Simulation paths | CPU time (ms) | CPU options/sec | GPU time (ms) | GPU options/sec |
|---|---|---|---|---|
| 262144 | 6000 | 42 | 3.405 | 151999 |

TOTAL cost: $0.04, including building the environment and sample code from scratch.

CUDA Device Query (Runtime API) version (CUDART static linking)

There are 2 devices supporting CUDA

Device 0: "Tesla M2050"
  CUDA Driver Version:                           3.20
  CUDA Runtime Version:                          3.10
  CUDA Capability Major/Minor version number:    2.0
  Total amount of global memory:                 2817982464 bytes
  Multiprocessors x Cores/MP = Cores:            14 (MP) x 32 (Cores/MP) = 448 (Cores)
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per block:           1024
  Maximum sizes of each dimension of a block:    1024 x 1024 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Clock rate:                                    1.15 GHz
  Concurrent copy and execution:                 Yes
  Run time limit on kernels:                     No
  Integrated:                                    No
  Support host page-locked memory mapping:       Yes
  Compute mode:                                  Default (multiple host threads can use this device simultaneously)
  Concurrent kernel execution:                   Yes
  Device has ECC support enabled:                Yes

Device 1: "Tesla M2050"
  CUDA Driver Version:                           3.20
  CUDA Runtime Version:                          3.10
  CUDA Capability Major/Minor version number:    2.0
  Total amount of global memory:                 2817982464 bytes
  Multiprocessors x Cores/MP = Cores:            14 (MP) x 32 (Cores/MP) = 448 (Cores)
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per block:           1024
  Maximum sizes of each dimension of a block:    1024 x 1024 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Clock rate:                                    1.15 GHz
  Concurrent copy and execution:                 Yes
  Run time limit on kernels:                     No
  Integrated:                                    No
  Support host page-locked memory mapping:       Yes
  Compute mode:                                  Default (multiple host threads can use this device simultaneously)
  Concurrent kernel execution:                   Yes
  Device has ECC support enabled:                Yes

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 3.20, CUDA Runtime Version = 3.10, NumDevs = 2, Device = Tesla M2050, Device = Tesla M2050

Monday, October 11, 2010

zeromq tests

MacPro:~ lydia$ python client.py --messages=1000 --message-size=1000000
Connecting to server...
1000 1000000 297.66666546 MB/s 297.66666546 messages/s

MacPro:~ lydia$ python client.py --messages=10000 --message-size=100000
Connecting to hello world server...
10000 100000 200.893595587 MB/s 2008.93595587 messages/s

MacPro:~ lydia$ python client.py --messages=100000 --message-size=10000
Connecting to hello world server...
100000 10000 54.5556325827 MB/s 5455.56325827 messages/s

MacPro:~ lydia$ python client.py --messages=100000 --message-size=1000
Connecting to hello world server...
100000 1000 8.36256888389 MB/s 8362.56888389 messages/s

MacPro:~ lydia$ python client.py --messages=100000 --message-size=100
Connecting to hello world server...
100000 100 0.866791151229 MB/s 8667.91151229 messages/s

MacPro:~ lydia$ python client.py --messages=100000 --message-size=10
Connecting to hello world server...
100000 10 0.0903534916772 MB/s 9035.34916772 messages/s

MacPro:~ lydia$ python client.py --messages=100000 --message-size=1
Connecting to hello world server...
100000 1 0.00909379595588 MB/s 9093.79595588 messages/s
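
Note how messages/s plateaus around 9,000 as the payload shrinks: below roughly 1 KB, per-message overhead dominates and bandwidth becomes irrelevant. For context, here is a minimal sketch of what client.py might look like. This is a hypothetical reconstruction using pyzmq's REQ/REP pattern; the original script isn't shown, and the server address and port are assumptions.

import argparse
import time
import zmq

parser = argparse.ArgumentParser()
parser.add_argument('--messages', type=int, default=1000)
parser.add_argument('--message-size', type=int, default=1000)
args = parser.parse_args()

print('Connecting to hello world server...')
context = zmq.Context()
socket = context.socket(zmq.REQ)
socket.connect('tcp://localhost:5555')   # assumed address/port

payload = b'x' * args.message_size
tick = time.time()
for _ in range(args.messages):
    socket.send(payload)   # request
    socket.recv()          # wait for the server's reply
elapsed = time.time() - tick

mb_per_sec = args.messages * args.message_size / elapsed / 1e6
print(args.messages, args.message_size,
      mb_per_sec, 'MB/s', args.messages / elapsed, 'messages/s')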