Wednesday, December 1, 2010

My Second Super Computer

Cluster GPU Quadruple Extra Large
Memory: 22 GB
EC2 Compute Units: 33.5
GPUs: 2 x NVIDIA Tesla "Fermi" M2050 (448 cores each)
Storage: 1690 GB of local instance storage
Platform: 64-bit, 10 Gigabit Ethernet
OS: CentOS 64-bit

Monte Carlo on One Tesla Device

Options: 256

Simulation paths | CPU time (ms) | CPU options/sec | GPU time (ms) | GPU options/sec
262144           | 6000          | 42              | 3.586         | 71388

Monte Carlo on Two Tesla Devices

Options: 256, split across two Tesla boards

Simulation paths | CPU time (ms) | CPU options/sec | GPU time (ms) | GPU options/sec
262144           | 6000          | 42              | 3.405         | 151999

TOTAL cost: $0.04, including building the environment and sample code from scratch.
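
The options/sec columns are just the option count divided by wall-clock time, so the figures above can be sanity-checked in a couple of lines of Python (times as reported in the tables):

# Sanity check of the options/sec columns: option count / elapsed time.
options = 256

cpu_ms = 6000.0
print(options / (cpu_ms / 1000.0))             # ~42 options/sec on the CPU

gpu_ms_one_board = 3.586                       # single-Tesla time from the table
print(options / (gpu_ms_one_board / 1000.0))   # ~71388 options/sec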

CUDA Device Query (Runtime API) version (CUDART static linking)

There are 2 devices supporting CUDA

Device 0: "Tesla M2050"
  CUDA Driver Version:                           3.20
  CUDA Runtime Version:                          3.10
  CUDA Capability Major/Minor version number:    2.0
  Total amount of global memory:                 2817982464 bytes
  Multiprocessors x Cores/MP = Cores:            14 (MP) x 32 (Cores/MP) = 448 (Cores)
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per block:           1024
  Maximum sizes of each dimension of a block:    1024 x 1024 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Clock rate:                                    1.15 GHz
  Concurrent copy and execution:                 Yes
  Run time limit on kernels:                     No
  Integrated:                                    No
  Support host page-locked memory mapping:       Yes
  Compute mode:                                  Default (multiple host threads can use this device simultaneously)
  Concurrent kernel execution:                   Yes
  Device has ECC support enabled:                Yes

Device 1: "Tesla M2050"
  CUDA Driver Version:                           3.20
  CUDA Runtime Version:                          3.10
  CUDA Capability Major/Minor version number:    2.0
  Total amount of global memory:                 2817982464 bytes
  Multiprocessors x Cores/MP = Cores:            14 (MP) x 32 (Cores/MP) = 448 (Cores)
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per block:           1024
  Maximum sizes of each dimension of a block:    1024 x 1024 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Clock rate:                                    1.15 GHz
  Concurrent copy and execution:                 Yes
  Run time limit on kernels:                     No
  Integrated:                                    No
  Support host page-locked memory mapping:       Yes
  Compute mode:                                  Default (multiple host threads can use this device simultaneously)
  Concurrent kernel execution:                   Yes
  Device has ECC support enabled:                Yes

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 3.20, CUDA Runtime Version = 3.10, NumDevs = 2, Device = Tesla M2050, Device = Tesla M2050
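
The listing above is the output of NVIDIA's deviceQuery sample. A rough equivalent can also be pulled from Python with PyCUDA; this is only a sketch and assumes PyCUDA is installed, which was not part of the original setup:

# Print deviceQuery-style fields for every CUDA device via PyCUDA.
import pycuda.driver as cuda

cuda.init()
for i in range(cuda.Device.count()):
    dev = cuda.Device(i)
    attr = cuda.device_attribute
    print('Device %d: "%s"' % (i, dev.name()))
    print("  Compute capability: %d.%d" % dev.compute_capability())
    print("  Global memory:      %d bytes" % dev.total_memory())
    print("  Multiprocessors:    %d" % dev.get_attribute(attr.MULTIPROCESSOR_COUNT))
    print("  Warp size:          %d" % dev.get_attribute(attr.WARP_SIZE))
    print("  Max threads/block:  %d" % dev.get_attribute(attr.MAX_THREADS_PER_BLOCK))
    print("  Clock rate:         %.2f GHz" % (dev.get_attribute(attr.CLOCK_RATE) / 1e6))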

Monday, October 11, 2010

ZeroMQ tests

MacPro:~ lydia$ python client.py --messages=1000 --message-size=1000000
Connecting to server...
1000 1000000 297.66666546 MB/s 297.66666546 messages/s

MacPro:~ lydia$ python client.py --messages=10000 --message-size=100000
Connecting to hello world server...
10000 100000 200.893595587 MB/s 2008.93595587 messages/s

MacPro:~ lydia$ python client.py --messages=100000 --message-size=10000
Connecting to hello world server...
100000 10000 54.5556325827 MB/s 5455.56325827 messages/s

MacPro:~ lydia$ python client.py --messages=100000 --message-size=1000
Connecting to hello world server...
100000 1000 8.36256888389 MB/s 8362.56888389 messages/s

MacPro:~ lydia$ python client.py --messages=100000 --message-size=100
Connecting to hello world server...
100000 100 0.866791151229 MB/s 8667.91151229 messages/s

MacPro:~ lydia$ python client.py --messages=100000 --message-size=10
Connecting to hello world server...
100000 10 0.0903534916772 MB/s 9035.34916772 messages/s

MacPro:~ lydia$ python client.py --messages=100000 --message-size=1
Connecting to hello world server...
100000 1 0.00909379595588 MB/s 9093.79595588 messages/s
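
The client.py itself isn't shown; it was a simple pyzmq round-trip throughput test. A minimal sketch of that kind of REQ client follows; the endpoint, port, and the echoing REP server on the other side are assumptions:

# Rough sketch of a ZeroMQ REQ client that reports MB/s and messages/s.
import time
import zmq

def run(messages, message_size, endpoint="tcp://localhost:5555"):
    context = zmq.Context()
    socket = context.socket(zmq.REQ)
    socket.connect(endpoint)
    payload = b"x" * message_size
    print("Connecting to hello world server...")
    start = time.time()
    for _ in range(messages):
        socket.send(payload)   # send the test message
        socket.recv()          # wait for the server's reply
    elapsed = time.time() - start
    mb_per_s = messages * message_size / elapsed / 1e6
    print(messages, message_size, mb_per_s, "MB/s", messages / elapsed, "messages/s")

if __name__ == "__main__":
    run(messages=1000, message_size=1000000)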

Monday, October 4, 2010

File System speeds

MacPro

2 x 3 GHz Quad-Core Intel Xeon
16 GB 667 MHz DDR2
  Drive: ST31500341AS
  Capacity: 1.5 TB (1,500,301,910,016 bytes)
  Model: ST31500341AS                            
  Revision: SD17    
  Serial Number:             9VS0A3HN
  Native Command Queuing: Yes
  Queue Depth: 32
  Removable Media: No
  Detachable Drive: No
  BSD Name: disk0
  Rotational Rate: 7200
  Medium Type: Rotational
  Bay Name: Bay 1
  Partition Map Type: GPT (GUID Partition Table)
  S.M.A.R.T. status: Verified
  Volumes:
  File System: Journaled HFS+
  BSD Name: disk0s2

WRITING 12.4665911198 1024000000 82.1395351915 MB/s

On first run:
READING 1.72446203232 1024000000 593.808376647 MB/s
READING 1.66705989838 1024000000 614.255073256 MB/s
READING 1.66696095467 1024000000 614.291532824 MB/s

Lenovo Think Center

Intel Core i5 650 @ 3.2 GHz (reported as 3.19 GHz)
2 GB RAM
Hitachi HDS721025CLA382
Windows XP (32 bit)

1st Time READING: 58.547000, 2095736020, 35.7957 MB/s
2nd Time READING: 1.516000, 2095736020, 1382.4115 MB/s 
Obviously the buffer cache kicked in.

Now the average for 10 threads: 37.6282 MB/s
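
The WRITING/READING lines look like the output of a simple sequential write-then-read script. Here is a minimal sketch that prints in the same format; the file name and 1 MB chunk size are my assumptions, and repeated reads of a file this size will largely be served from the OS page cache, which is what the second-run numbers show:

# Sequential write/read throughput test, printing lines like the ones above.
import os
import time

PATH = "speedtest.bin"          # hypothetical scratch file
SIZE = 1024000000               # 1,024,000,000 bytes, as in the MacPro runs
CHUNK = b"\0" * (1024 * 1024)   # 1 MB per write

start = time.time()
with open(PATH, "wb") as f:
    for _ in range(SIZE // len(CHUNK)):
        f.write(CHUNK)
    f.flush()
    os.fsync(f.fileno())        # push the data to disk, not just the cache
elapsed = time.time() - start
print("WRITING", elapsed, SIZE, SIZE / elapsed / 1e6, "MB/s")

start = time.time()
with open(PATH, "rb") as f:
    while f.read(len(CHUNK)):
        pass
elapsed = time.time() - start
print("READING", elapsed, SIZE, SIZE / elapsed / 1e6, "MB/s")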

Friday, October 1, 2010

My First Super Computer

Macbook Air
1.86 GHz Intel Core 2 Duo
2 GB 1067 MHz DDR3
GeForce 9400M

Total amount of global memory: 265945088 bytes
Number of multiprocessors: 2
Number of cores: 16

Monte Carlo



Options: 256

Simulation paths | CPU time (ms) | CPU options/sec | GPU time (ms) | GPU options/sec
262144           | 8000          | 32.6            | 245.8979      | 1041.08
131072           | 4000          | 64              | 127.68        | 2005
65536            | 2000          | 128             | 63.12         | 4055.57
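
These tables appear to come from the CUDA SDK's Monte Carlo option-pricing sample. For anyone wondering what a "simulation path" is, here is a minimal NumPy sketch of the same idea, pricing one European call under risk-neutral geometric Brownian motion; the spot, strike, rate, and volatility are made-up inputs:

# Minimal Monte Carlo pricer for one European call (risk-neutral GBM).
import numpy as np

def mc_european_call(S0, K, r, sigma, T, paths):
    z = np.random.standard_normal(paths)                  # one normal draw per path
    ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)
    payoff = np.maximum(ST - K, 0.0)                      # call payoff at expiry
    return np.exp(-r * T) * payoff.mean()                 # discounted average

# One option priced with 262144 simulation paths, as in the first row above.
print(mc_european_call(S0=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0, paths=262144))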

I was thinking of building a big GPU box. Does anyone have any ideas?


I'm thinking of getting:

EVGA Classified SR-2 (Super Record 2) 270-WS-W555-A1 LGA 1366 Intel 5520 SATA 6Gb/s USB 3.0 HPTX Intel motherboard

Adding 48 GB of RAM and then plugging in 4 GeForce GTX 480s?

Tuesday, September 28, 2010

CouchDB Performance on a MacPro

CouchDB 0.11.0
2 x 3 GHz Quad-Core Intel Xeon
16 GB 667 MHz DDR2
OS X 10.6.4

Inserting

NUM   | BLOCK | time (s) | bytes     | MB/s  | records/s
1     | 1     | 0.0030   | 1         | 0.000 | 330
10    | 1     | 0.0251   | 100       | 0.004 | 398
1000  | 1     | 3.2415   | 1000000   | 0.308 | 308
10000 | 1     | 34.6263  | 100000000 | 2.888 | 289
1000  | 10    | 3.4122   | 10610000  | 3.109 | 2931
100   | 100   | 1.5112   | 10601000  | 7.015 | 6617
10    | 1000  | 1.9464   | 10600100  | 5.446 | 5138
2     | 5000  | 2.3308   | 10600020  | 4.548 | 4290
1     | 10000 | 2.0068   | 10600010  | 5.282 | 4983
10    | 10000 | 15.7176  | 106000100 | 6.744 | 6362
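
The insert script isn't included; the NUM x BLOCK layout suggests NUM requests of BLOCK documents each, which maps naturally onto CouchDB's _bulk_docs endpoint. A rough sketch of such a run follows; the database name, local URL, and ~1 KB payload are assumptions, and urllib is used only to keep the example dependency-free:

# Sketch of a NUM x BLOCK bulk-insert run against CouchDB's _bulk_docs API.
import json
import time
import urllib.request

DB_URL = "http://localhost:5984/bench"   # hypothetical database

def bulk_insert(num, block):
    total_bytes = 0
    start = time.time()
    for _ in range(num):
        docs = [{"payload": "X" * 1024} for _ in range(block)]   # assumed ~1 KB docs
        body = json.dumps({"docs": docs}).encode()
        total_bytes += len(body)
        req = urllib.request.Request(DB_URL + "/_bulk_docs", data=body,
                                     headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req).read()
    elapsed = time.time() - start
    print(num, block, elapsed, total_bytes,
          total_bytes / elapsed / 1e6, "MB/s",
          num * block / elapsed, "records/s")

bulk_insert(100, 100)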

Average Top:

33423  beam.smp     48.9      05:42.89 13    0    62   151-  39M-   264K  
33421  CouchDBX     12.9      00:52.85 6/1   3    124- 322   77M-   29M   


Monday, September 13, 2010

CouchDB Performance

Using:
Intel Core 2 Duo T9400 @ 2.53 GHz, 2.99 GB RAM, HP EliteBook
Hitachi HTS723216L9A360
CouchDB 1.0, localhost
Simple ASCII 1 KB payload:
{
  "_id": "00081363",
  "_rev": "1-1c29ecbf7bc15e7f9226a45594a0605d",
  "payload": "X" * 1024
}

Inserting

# of writes | Block size | time (s) | bytes     | MB/s  | records/s
1           | 1          | 0.0050   | 1         | 0.000 | 200
10          | 1          | 0.0470   | 100       | 0.002 | 213
1000        | 1          | 4.5253   | 1000000   | 0.221 | 221
10000       | 1          | 48.3623  | 100000000 | 2.068 | 207
1000        | 10         | 4.6867   | 10610000  | 2.264 | 2134
100         | 100        | 2.1353   | 10601000  | 4.965 | 4683
10          | 1000       | 2.4527   | 10600100  | 4.322 | 4077
2           | 5000       | 2.2917   | 10600020  | 4.625 | 4364
1           | 10000      | 3.4477   | 10600010  | 3.075 | 2901


Extracting ALL with a js view

# of reads | time (s) | bytes     | MB/s  | records/s
138485     | 25.7300  | 157733320 | 6.130 | 5382

Extracting with a js view

# of reads | Block size | time (s) | bytes    | MB/s        | records/s
1          | 1          | 0.0000   | 1189     | 1189000.000 | 1000000000
10         | 1          | 0.0470   | 11890    | 0.253       | 213
100        | 1          | 0.4220   | 118990   | 0.282       | 237
1000       | 1          | 4.4990   | 1190890  | 0.265       | 222
1000       | 10         | 6.0930   | 11494000 | 1.886       | 1641
100        | 100        | 2.4210   | 11454400 | 4.731       | 4131
10         | 1000       | 1.9690   | 11450440 | 5.815       | 5079
2          | 5000       | 1.9370   | 11450088 | 5.911       | 5163
1          | 10000      | 1.9370   | 11450044 | 5.911       | 5163
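
The extraction script isn't shown either. One way the blocked reads could have been done is by paging the view with CouchDB's limit and skip query parameters; a sketch, with placeholder design-doc and view names:

# Sketch of paged reads from a CouchDB JS view using limit/skip.
import time
import urllib.request

VIEW_URL = "http://localhost:5984/bench/_design/bench/_view/all"   # placeholder names

def extract(num, block):
    total_bytes = 0
    start = time.time()
    for i in range(num):
        url = "%s?limit=%d&skip=%d" % (VIEW_URL, block, i * block)
        total_bytes += len(urllib.request.urlopen(url).read())   # one block of rows
    elapsed = time.time() - start
    print(num, block, elapsed, total_bytes,
          total_bytes / elapsed / 1e6, "MB/s",
          num * block / elapsed, "records/s")

extract(100, 100)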

Extracting ALL with a Python view

NUM    | time (s) | bytes     | MB/s  | records/s
138489 | 25.6690  | 157733320 | 6.145 | 5395

Extracting with a Python view

NUM  | BLOCK | time (s) | bytes    | MB/s        | records/s
1    | 1     | 0.0000   | 1189     | 1189000.000 | 1000000000
10   | 1     | 0.0470   | 11890    | 0.253       | 213
100  | 1     | 0.4060   | 118990   | 0.293       | 246
1000 | 1     | 4.5310   | 1190890  | 0.263       | 221
1000 | 10    | 5.6090   | 11494000 | 2.049       | 1783
100  | 100   | 2.3280   | 11454400 | 4.920       | 4296
10   | 1000  | 1.9370   | 11450440 | 5.911       | 5163
2    | 5000  | 1.8900   | 11450088 | 6.058       | 5291
1    | 10000 | 1.9220   | 11450044 | 5.957       | 5203
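
The Python views themselves aren't listed. For the couchdb-python query server (couchpy), a map function that simply emits every document would look roughly like this; the actual view behind these numbers is unknown:

# Guess at the shape of the Python view: emit each doc keyed by its _id.
# Assumes CouchDB is configured to use the couchdb-python query server, e.g.
#   [query_servers]
#   python = /usr/local/bin/couchpy
def fun(doc):
    yield doc["_id"], doc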

Extracting ALL

# of reads | time (s) | bytes    | MB/s  | records/s
138485     | 6.4990   | 12463712 | 1.918 | 21309


Extracting
# of reads | Block size | time (s) | bytes    | MB/s  | records/s
1          | 1          | 0.0160   | 1100     | 0.069 | 62
10         | 1          | 0.0470   | 11000    | 0.234 | 213
100        | 1          | 0.4370   | 110000   | 0.252 | 229
1000       | 10         | 9.6080   | 12004000 | 1.249 | 1041
100        | 100        | 6.9990   | 11964400 | 1.709 | 1429
10         | 1000       | 6.4980   | 11960440 | 1.841 | 1539
2          | 5000       | 6.5620   | 11960088 | 1.823 | 1524
1          | 10000      | 6.7330   | 11960044 | 1.776 | 1485