Thursday, December 26, 2013

JSON Performance

I have some code that processes tweets, about 5 million a day, in real time. They are currently stored in MongoDB and also posted to various Celery/RabbitMQ work queues. The average message size is 5524 bytes, so the cost of encoding and decoding these messages matters.

The measurements below were made with the test script listed further down.
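To get a feel for the per-message cost, here is a minimal sketch of a timing harness. It is not the script used for the numbers in this post: the `bench` helper, the tiny `sample` message and the iteration count are assumptions made for illustration, and `ujson` is only timed if it happens to be installed.

```python
import json
import timeit


def bench(dumps, loads, obj, number=10000):
    """Return (encode_seconds, decode_seconds) for `number` calls each."""
    msg = dumps(obj)
    enc = timeit.timeit(lambda: dumps(obj), number=number)
    dec = timeit.timeit(lambda: loads(msg), number=number)
    return enc, dec


# Codecs to compare; the built-in json module is always available.
codecs = {"json": (json.dumps, json.loads)}
try:
    import ujson  # optional third-party package, skipped if missing
    codecs["ujson"] = (ujson.dumps, ujson.loads)
except ImportError:
    pass

# Hypothetical cut-down message, far smaller than a real tweet record.
sample = {
    "id": "tag:search.twitter.com,2005:403224522679009280",
    "actor": {"followersCount": 15, "verified": False},
    "objectType": "activity",
}

results = {name: bench(d, l, sample) for name, (d, l) in codecs.items()}
for name, (enc, dec) in sorted(results.items()):
    print("%-6s encode %.4fs  decode %.4fs" % (name, enc, dec))
```

Swapping `sample` for a full tweet record gives timings closer to the production workload.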




Standard tweet message encode/decode with Python's built-in json package.

| Test | Msg Size | De-serialize (cjson, bson, ujson) | Obj Size | Serialize (cjson, bson, ujson) | Storage | Storage Cost | Example |
|---|---|---|---|---|---|---|---|
| Empty object | 41 | 14.790, 37.565, 0.970 | 54 | 6.341, 41.856, 1.249 | 2050Mb | 0.21 | {} |
| Empty list | 41 | 15.069, 38.021, 1.005 | 54 | 6.675, 41.475, 1.400 | 2050Mb | 0.21 | [] |
| Object of objects | 843 | 107.750, 145.440, 25.525 | 3226 | 63.555, 828.235, 28.051 | 42150Mb | 4.21 | dict_dict.json |
| List of lists | 563 | 58.805, 81.950, 16.960 | 104 | 43.426, 815.965, 18.311 | 28150Mb | 2.81 | list_list.json |
| Object with only tweet id | 93 | 25.030, 53.360, 2.280 | 422 | 23.570, 83.445, 3.295 | 4650Mb | 0.47 | |
| Full tweet message | 4386 | 697.221, 867.780, 188.560 | 12606 | 360.290, 5847.335, 201.610 | 219300Mb | 21.93 | https://gist.github.com/thanos/adf7e20b5f00551a38a8 |
| Message with string payload | 4899 | 446.661, 489.455, 82.361 | 9170 | 104.254, 327.455, 92.460 | 244950Mb | 24.50 | |
| Object where the field names are codes | 4396 | 422.220, 456.184, 75.035 | 7880 | 73.345, 231.071, 81.570 | 219800Mb | 21.98 | |
| Only fields of interest | 1911 | 285.840, 375.440, 74.100 | 7290 | 147.895, 2252.359, 78.495 | 95550Mb | 9.55 | https://gist.github.com/thanos/adf7e20b5f00551a38a8 |
| Only fields of interest, keys encoded | 1660 | 288.585, 378.330, 67.400 | 7290 | 143.045, 2246.330, 70.940 | 83000Mb | 8.30 | |
| Denormalized Tweet | 4707 | 603.030, 716.414, 186.300 | 9246 | 350.465, 5802.315, 198.150 | 235350Mb | 23.54 | |
| Denormalized Tweet with only needed fields | 2158 | 255.200, 318.004, 77.040 | 6378 | 142.004, 2165.425, 71.850 | 107900Mb | 10.79 | |
| Denormalized Tweet with only needed fields, keys encoded | 1734 | 242.375, 305.970, 65.689 | 6378 | 133.420, 2165.465, 68.730 | 86700Mb | 8.67 | |
| Possible Candidate | 1911 | 291.691, 380.090, 74.741 | 7290 | 156.535, 2258.445, 77.696 | 95550Mb | 9.55 | |
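As a rough sketch of how the last two columns relate to the message size: the figures in the table match the message size multiplied by 50,000,000 messages (the `storage` line in the script below uses 5,000,000), with the cost taken at the script's rate of 1e-10 per byte. The names `MESSAGES`, `COST_PER_BYTE` and `storage_columns` are made up here for illustration.

```python
MESSAGES = 50000000      # multiplier that reproduces the table's Storage column
COST_PER_BYTE = 1e-10    # same factor as storage_cost in the test script


def storage_columns(msg_size, messages=MESSAGES):
    """Return (storage in units of 10**6 bytes, storage cost) for one row."""
    total_bytes = msg_size * messages
    storage_mb = total_bytes // 10**6
    cost = total_bytes * COST_PER_BYTE
    return storage_mb, cost


for title, size in [("Full tweet message", 4386),
                    ("Only fields of interest", 1911)]:
    mb, cost = storage_columns(size)
    print("%s: %dMb, cost %.2f" % (title, mb, cost))
```

Because both columns are linear in the message size, trimming a record to the fields of interest cuts storage by the same ratio as it cuts the message.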






Source code of test script

#!/usr/bin/env python
# json_tests.py
#
# Created by thanos vassilakis on 12/26/13.
#
import json
import sys
import time
from sys import getsizeof

import requests


def total_size(o):
    # Approximate in-memory size: the container plus, recursively, its values.
    size = getsizeof(o)
    if isinstance(o, dict):
        size += sum(total_size(v) for v in o.values())
    elif isinstance(o, list):
        size += sum(total_size(v) for v in o)
    return size


def codeKeys(obj, path=''):
    # Replace every key with a hex index path ("0", "1.A", ...) to shorten field names.
    if isinstance(obj, dict):
        coded = {}
        for i, v in enumerate(obj.values()):
            key = "%s.%X" % (path, i) if path else "%X" % i
            coded[key] = codeKeys(v, key)
        return coded
    return obj


def pathKeys(obj, path=''):
    # Replace every key with its dotted path from the root ("actor.location").
    if isinstance(obj, dict):
        pathed = {}
        for k in obj:
            key = "%s.%s" % (path, k) if path else k
            pathed[key] = pathKeys(obj[k], key)
        return pathed
    return obj


def denormalize(obj, new_obj=None):
    # Flatten nested dicts into one flat dict; lists are left as they are.
    if new_obj is None:
        new_obj = {}
    for k in obj:
        if isinstance(obj[k], dict):
            denormalize(obj[k], new_obj)
        else:
            new_obj[k] = obj[k]
    return new_obj


# Test payloads. Note that empty_obj and empty_list are strings, so they are
# encoded as JSON strings rather than as an empty object and an empty list.
empty_obj = "{}"
empty_list = "[]"
dict_dict = requests.get("https://gist.github.com/thanos/8153867/raw/dict_dict.json").json()
list_list = requests.get("https://gist.github.com/thanos/8153937/raw/list_list.json").json()
tweet = requests.get("https://gist.github.com/thanos/8153701/raw/gnip_record.json").json()
tweet_id_only = dict(id=tweet['id'])
tweet_payload = dict(id=tweet['id'], payload=json.dumps(tweet))
# tweet_coded is already a JSON string here, so the test loop encodes it a second time.
tweet_coded = json.dumps(codeKeys(tweet))
tweet_with_needed_fields = requests.get("https://gist.github.com/thanos/adf7e20b5f00551a38a8/raw/gnip_record_with_needed_fields.json").json()
tweet_foi_coded = codeKeys(tweet_with_needed_fields)
denormalized_tweet = denormalize(pathKeys(tweet))
denormalized_tweet_foi = denormalize(pathKeys(tweet_with_needed_fields))
denormalized_tweet_foi_coded = denormalize(pathKeys(tweet_foi_coded))

if __name__ == '__main__':
    # Each test is [title, payload, HTML for the example column].
    tests = [
        ["Empty object", empty_obj, "{}"],
        ["Empty list", empty_list, "[]"],
        ["Object of objects", dict_dict, '<script src="https://gist.github.com/thanos/8153867.js"></script>'],
        ["List of lists", list_list, '<script src="https://gist.github.com/thanos/8153937.js"></script>'],
        ["Object with only tweet id", tweet_id_only, ""],
        ["Full tweet message", tweet, '<a href="https://gist.github.com/thanos/adf7e20b5f00551a38a8">https://gist.github.com/thanos/adf7e20b5f00551a38a8</a>'],
        ["Message with string payload", tweet_payload, ''],
        ["Object where the field names are codes", tweet_coded, ''],
        ["Only fields of interest", tweet_with_needed_fields, '<a href="https://gist.github.com/thanos/adf7e20b5f00551a38a8">https://gist.github.com/thanos/adf7e20b5f00551a38a8</a>'],
        ["Only fields of interest, keys encoded", tweet_foi_coded, ''],
        ["Denormalized Tweet", denormalized_tweet, ''],
        ["Denormalized Tweet with only needed fields", denormalized_tweet_foi, ''],
        ["Denormalized Tweet with only needed fields, keys encoded", denormalized_tweet_foi_coded, ''],
    ]
    for title, test, example in tests:
        test_msg = json.dumps(test)
        # Size of the Python string object, so it includes interpreter overhead.
        msg_size = sys.getsizeof(test_msg)

        # Time 100,000 decodes of the message.
        tick = time.time()
        for i in range(100000):
            json.loads(test_msg)
        deserialize = time.time() - tick

        test_obj = json.loads(test_msg)
        obj_size = total_size(test_obj)

        # Time 100,000 encodes of the decoded object.
        tick = time.time()
        for i in range(100000):
            json.dumps(test_obj)
        serialize = time.time() - tick

        storage = msg_size * 5000000
        storage_cost = storage * 1e-10
        # Emit one HTML table row per test.
        print('<tr bgcolor="white"><td>%s</td><td>%d</td><td>%.3f</td><td>%d</td><td>%.3f</td><td>%dMb</td><td>%.2f</td><td><tt>%s</tt></td></tr>' % (title, msg_size, deserialize, obj_size, serialize, storage/(10**6), storage_cost, example))
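Since the denormalized and key-encoded variants have no example listings further down, here is a small sketch of what those transformations produce. The functions restate the script's pathKeys, denormalize and codeKeys logic under the hypothetical names `path_keys`, `flatten` and `code_keys`, and the `sample` record is a cut-down stand-in for a real tweet. It assumes a Python version where dicts keep insertion order (3.7+), which fixes the numbering of the coded keys.

```python
def path_keys(obj, path=""):
    """Rename every key to its dotted path from the root."""
    if isinstance(obj, dict):
        out = {}
        for k, v in obj.items():
            full = "%s.%s" % (path, k) if path else k
            out[full] = path_keys(v, full)
        return out
    return obj


def flatten(obj, flat=None):
    """Collect all non-dict values of nested dicts into one flat dict."""
    if flat is None:
        flat = {}
    for k, v in obj.items():
        if isinstance(v, dict):
            flatten(v, flat)
        else:
            flat[k] = v
    return flat


def code_keys(obj, path=""):
    """Replace every key with a hex index path such as "1.0"."""
    if isinstance(obj, dict):
        out = {}
        for i, v in enumerate(obj.values()):
            code = "%s.%X" % (path, i) if path else "%X" % i
            out[code] = code_keys(v, code)
        return out
    return obj


# Hypothetical cut-down record for illustration only.
sample = {
    "id": "403224522679009280",
    "actor": {
        "id": "1855784545",
        "location": {"displayName": "College Station, TX"},
    },
}

denormalized = flatten(path_keys(sample))
coded = code_keys(sample)
print(denormalized)
print(coded)
```

Path keys keep the flattened record unambiguous (both `id` fields survive as `id` and `actor.id`), while coded keys trade readability for shorter field names.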

Empty object

{}

Empty list

[]

Object of objects

{
"id": "tag:search.twitter.com,2005:403224522679009280",
"actor": {
"preferredUsername": "LandenEhlers",
"displayName": "Landen Ehlers",
"followersCount": 15,
"twitterTimeZone": null,
"image": "https://pbs.twimg.com/profile_images/378800000646150423/83090ccb95a60def923c674e7bd002a0_normal.jpeg",
"verified": false,
"statusesCount": 24,
"summary": "Senior Construction Science student at Texas A&M University. Barefoot Waterskiing National Champion. Vice President of the Texas",
"utcOffset": null,
"link": "http://www.twitter.com/LandenEhlers",
"location": {
"displayName": "College Station, TX",
"objectType": "place"
},
"favoritesCount": 2,
"friendsCount": 65,
"listedCount": 1,
"postedTime": "2013-09-11T23:52:03.000Z",
"id": "id:twitter.com:1855784545",
"objectType": "person"
},
"objectType": "activity"
}

List of lists

[
"tag:search.twitter.com,2005:403224522679009280",
"activity",
[
"person",
"id:twitter.com:1855784545",
"http:\\/\\/www.twitter.com\\/LandenEhlers",
"Landen Ehlers",
"2013-09-11T23:52:03.000Z",
"https:\\/\\/pbs.twimg.com\\/profile_images\\/378800000646150423\\/83090ccb95a60def923c674e7bd002a0_normal.jpeg",
"Senior Construction Science student at Texas A&M University. Barefoot Waterskiing National Champion. Vice President of the Texas",
65,
15,
1,
24,
null,
false,
null,
"LandenEhlers",
[
"place",
"College Station, TX"
],
2
]
]

Object with only tweet id


Full tweet message

{
"body": "Enjoyed our half price chicken and wawfuls today! @tamusportclubs @SullysGrill @TAMUWaterski #SCPartnerday http://t.co/XRsVqYy9Zo",
"retweetCount": 0,
"generator": {
"link": "http://twitter.com/download/android",
"displayName": "Twitter for Android"
},
"twitter_filter_level": "medium",
"geo": {
"type": "Point",
"coordinates": [
30.622496,
-96.3283527
]
},
"favoritesCount": 0,
"object": {
"postedTime": "2013-11-20T18:13:12.000Z",
"summary": "Enjoyed our half price chicken and wawfuls today! @tamusportclubs @SullysGrill @TAMUWaterski #SCPartnerday http://t.co/XRsVqYy9Zo",
"link": "http://twitter.com/LandenEhlers/statuses/403224522679009280",
"id": "object:search.twitter.com,2005:403224522679009280",
"objectType": "note"
},
"actor": {
"preferredUsername": "LandenEhlers",
"displayName": "Landen Ehlers",
"links": [
{
"href": null,
"rel": "me"
}
],
"twitterTimeZone": null,
"image": "https://pbs.twimg.com/profile_images/378800000646150423/83090ccb95a60def923c674e7bd002a0_normal.jpeg",
"verified": false,
"location": {
"displayName": "College Station, TX",
"objectType": "place"
},
"statusesCount": 24,
"summary": "Senior Construction Science student at Texas A&M University. Barefoot Waterskiing National Champion. Vice President of the Texas A&M Waterski Team.",
"languages": [
"en"
],
"utcOffset": null,
"link": "http://www.twitter.com/LandenEhlers",
"followersCount": 15,
"favoritesCount": 2,
"friendsCount": 65,
"listedCount": 1,
"postedTime": "2013-09-11T23:52:03.000Z",
"id": "id:twitter.com:1855784545",
"objectType": "person"
},
"twitter_lang": "en",
"twitter_entities": {
"symbols": [],
"user_mentions": [
{
"id": 338528272,
"indices": [
50,
65
],
"id_str": "338528272",
"screen_name": "tamusportclubs",
"name": "TAMU Sport Clubs"
},
{
"id": 325152462,
"indices": [
66,
78
],
"id_str": "325152462",
"screen_name": "SullysGrill",
"name": "Sully's Sports Grill"
},
{
"id": 432729855,
"indices": [
79,
92
],
"id_str": "432729855",
"screen_name": "TAMUWaterski",
"name": "TAMU Waterski"
}
],
"hashtags": [
{
"indices": [
93,
106
],
"text": "SCPartnerday"
}
],
"urls": [],
"media": [
{
"expanded_url": "http://twitter.com/LandenEhlers/status/403224522679009280/photo/1",
"display_url": "pic.twitter.com/XRsVqYy9Zo",
"url": "http://t.co/XRsVqYy9Zo",
"media_url_https": "https://pbs.twimg.com/media/BZiKkRdCEAAFC-j.jpg",
"id_str": "403224522414755840",
"sizes": {
"small": {
"h": 192,
"resize": "fit",
"w": 340
},
"large": {
"h": 579,
"resize": "fit",
"w": 1023
},
"medium": {
"h": 339,
"resize": "fit",
"w": 600
},
"thumb": {
"h": 150,
"resize": "crop",
"w": 150
}
},
"indices": [
107,
129
],
"type": "photo",
"id": 4.0322452241476e+17,
"media_url": "http://pbs.twimg.com/media/BZiKkRdCEAAFC-j.jpg"
}
]
},
"verb": "post",
"link": "http://twitter.com/LandenEhlers/statuses/403224522679009280",
"location": {
"displayName": "College Station, TX",
"name": "College Station",
"link": "https://api.twitter.com/1.1/geo/id/85128f80a57c03ad.json",
"twitter_country_code": "US",
"country_code": "United States",
"geo": {
"type": "Polygon",
"coordinates": [
[
[
-96.386719,
30.534473
],
[
-96.386719,
30.658246
],
[
-96.204688,
30.658246
],
[
-96.204688,
30.534473
]
]
]
},
"objectType": "place"
},
"provider": {
"link": "http://www.twitter.com",
"displayName": "Twitter",
"objectType": "service"
},
"postedTime": "2013-11-20T18:13:12.000Z",
"id": "tag:search.twitter.com,2005:403224522679009280",
"gnip": {
"matching_rules": [
{
"tag": null,
"value": "has:geo has:mentions has:links has:hashtags has:profile_geo"
}
],
"profileLocations": [
{
"displayName": "College Station, Texas, United States",
"address": {
"country": "United States",
"region": "Texas",
"subRegion": "Brazos County",
"countryCode": "US",
"locality": "College Station"
},
"geo": {
"type": "point",
"coordinates": [
-96.33441,
30.62798
]
},
"objectType": "place"
}
],
"language": {
"value": "en"
},
"klout_score": 21,
"urls": [
{
"url": "http://t.co/XRsVqYy9Zo",
"expanded_status": 200,
"expanded_url": "http://twitter.com/LandenEhlers/status/403224522679009280/photo/1"
}
],
"klout_profile": {
"link": "http://klout.com/user/id/289637767579957066",
"topics": [
{
"link": "http://klout.com/topic/id/10000000000000010000",
"displayName": "Ricky Carmichael",
"klout_topic_id": "10000000000000010000"
},
{
"link": "http://klout.com/topic/id/1297",
"displayName": "Rock Music",
"klout_topic_id": "1297"
}
],
"klout_user_id": "289637767579957066"
}
},
"objectType": "activity"
}

Message with string payload


Object where the field names are codes


Only fields of interest

{
"body": "Enjoyed our half price chicken and wawfuls today! @tamusportclubs @SullysGrill @TAMUWaterski #SCPartnerday http://t.co/XRsVqYy9Zo",
"retweetCount": 0,
"generator": "Twitter for Android",
"geo": [
30.622496,
-96.3283527
],
"favoritesCount": 0,
"actor": {
"preferredUsername": "LandenEhlers",
"friendsCount": 65,
"followersCount": 15,
"image": "378800000646150423/83090ccb95a60def923c674e7bd002a0_normal.jpeg",
"verified": false,
"statusesCount": 24,
"summary": "Senior Construction Science student at Texas A&M University. Barefoot Waterskiing National Champion. Vice President of the Texas A&M Waterski Team.",
"languages": [
"en"
],
"location": {
"displayName": "College Station, TX"
},
"favoritesCount": 2,
"displayName": "Landen Ehlers",
"listedCount": 1,
"id": "1855784545"
},
"twitter_lang": "en",
"twitter_entities": {
"user_mentions": [
{
"id": 338528272,
"screen_name": "tamusportclubs",
"name": "TAMU Sport Clubs"
},
{
"id": 325152462,
"screen_name": "SullysGrill",
"name": "Sully's Sports Grill"
},
{
"id": 432729855,
"screen_name": "TAMUWaterski",
"name": "TAMU Waterski"
}
],
"hashtags": [
"SCPartnerday"
]
},
"provider": "Twitter",
"postedTime": "2013-11-20T18:13:12.000Z",
"id": "403224522679009280",
"gnip": {
"klout_profile": {
"topics": [
{
"displayName": "Ricky Carmichael",
"klout_topic_id": "10000000000000010000"
},
{
"displayName": "Rock Music",
"klout_topic_id": "1297"
}
],
"klout_user_id": "289637767579957066",
"address": {
"displayName": "College Station, Texas, United States",
"countryCode": "US",
"locality": "College Station",
"country": "United States",
"region": "Texas",
"subRegion": "Brazos County"
}
},
"klout_score": 21,
"matching_rules": [
{
"tag": null,
"value": "has:geo has:mentions has:links has:hashtags has:profile_geo"
}
],
"urls": [
{
"url": "http://t.co/XRsVqYy9Zo",
"expanded_url": "http://twitter.com/LandenEhlers/status/403224522679009280/photo/1"
}
]
}
}

Denormalized Tweet


Denormalized Tweet with only needed fields


Denormalized Tweet with only needed fields, keys encoded


Sunday, December 15, 2013

Replacing ARRI lights

My aim is to replace my very hot ARRI lights. I've bought on eBay a Bi-color 1500W LED Fresnel Video Spotlight Light from Xiamen Came Photographic Equipment Co., Ltd. (http://stores.ebay.com/PhotoLight).

It came with two problems:

1) It didn't have a fresnel lens as advertised. Okay, I could live with this if it weren't for problem number two:

2) The focus adjust button didn't work. It feels as if it has popped out of its thread.

Now I'm talking to Came and hopefully they will resolve these problems. If so, it will be a great light!