I have some code that processes about 5 million tweets a day in real time. The tweets are currently stored in MongoDB and also posted to various Celery/RabbitMQ work queues. The average message size is 5524, so the cost of encoding and decoding these messages matters.
I took the measurements below using the test code at the end of this post.
Standard tweet message encode/decode with Python's built-in json package.
Test | Msg Size | De-serialize (cjson, bson, ujson) | Obj Size | Serialize (cjson, bson, ujson) | Storage | Storage Cost | |
---|---|---|---|---|---|---|---|
Empty object | 41 | 14.790, 37.565, 0.970 | 54 | 6.341, 41.856, 1.249 | 2050Mb | 0.21 | {} |
Empty list | 41 | 15.069, 38.021, 1.005 | 54 | 6.675, 41.475, 1.400 | 2050Mb | 0.21 | [] |
Object of objects | 843 | 107.750, 145.440, 25.525 | 3226 | 63.555, 828.235, 28.051 | 42150Mb | 4.21 | |
List of lists | 563 | 58.805, 81.950, 16.960 | 104 | 43.426, 815.965, 18.311 | 28150Mb | 2.81 | |
Object with only tweet id | 93 | 25.030, 53.360, 2.280 | 422 | 23.570, 83.445, 3.295 | 4650Mb | 0.47 | |
Full tweet message | 4386 | 697.221, 867.780, 188.560 | 12606 | 360.290, 5847.335, 201.610 | 219300Mb | 21.93 | https://gist.github.com/thanos/adf7e20b5f00551a38a8 |
Message with string payload | 4899 | 446.661, 489.455, 82.361 | 9170 | 104.254, 327.455, 92.460 | 244950Mb | 24.50 | |
Object where the field names are codes | 4396 | 422.220, 456.184, 75.035 | 7880 | 73.345, 231.071, 81.570 | 219800Mb | 21.98 | |
Only fields of interest | 1911 | 285.840, 375.440, 74.100 | 7290 | 147.895, 2252.359, 78.495 | 95550Mb | 9.55 | https://gist.github.com/thanos/adf7e20b5f00551a38a8 |
Only fields of interest, keys encoded | 1660 | 288.585, 378.330, 67.400 | 7290 | 143.045, 2246.330, 70.940 | 83000Mb | 8.30 | |
Denormalized Tweet | 4707 | 603.030, 716.414, 186.300 | 9246 | 350.465, 5802.315, 198.150 | 235350Mb | 23.54 | |
Denormalized Tweet with only needed fields | 2158 | 255.200, 318.004, 77.040 | 6378 | 142.004, 2165.425, 71.850 | 107900Mb | 10.79 | |
Denormalized Tweet with only needed fields, keys encoded | 1734 | 242.375, 305.970, 65.689 | 6378 | 133.420, 2165.465, 68.730 | 86700Mb | 8.67 | |
Possible Candidate | 1911 | 291.691, 380.090, 74.741 | 7290 | 156.535, 2258.445, 77.696 | 95550Mb | 9.55 | |
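The Storage and Storage Cost columns appear to be derived from Msg Size by simple arithmetic. The sketch below reproduces that relationship. The factor of 50 and the rate of 0.0001 are inferred from the table rows, not stated anywhere in the post, and the names `STORAGE_FACTOR`, `COST_RATE`, `storage_for` and `cost_for` are made up for illustration. The results agree with the table to within rounding.

```python
# Assumed relationships, inferred from the table rows (not stated in the post):
#   storage = msg_size * 50
#   cost    = storage * 0.0001
STORAGE_FACTOR = 50   # hypothetical name; value inferred from the table
COST_RATE = 0.0001    # hypothetical name; value inferred from the table


def storage_for(msg_size):
    """Storage figure for a given serialized message size."""
    return msg_size * STORAGE_FACTOR


def cost_for(msg_size):
    """Storage cost figure for a given serialized message size."""
    return storage_for(msg_size) * COST_RATE


# Msg Size values copied from three rows of the table.
rows = {
    "Empty object": 41,
    "Only fields of interest": 1911,
    "Full tweet message": 4386,
}

for name, size in rows.items():
    print(f"{name}: storage={storage_for(size)} cost={cost_for(size):.3f}")
```

Seen this way, the cost scales linearly with message size, so trimming a tweet to only the fields of interest (4386 down to 1911) cuts the storage figure by the same proportion.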
Source code of test script
Empty object
{}
Empty list
[]
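For readers who want to take similar measurements, here is a minimal sketch using only the standard library (`json`, `sys`, `timeit`). It is not the original test script: the `measure` function, its `repeats` parameter and the result keys are assumptions made for illustration, and it times the built-in `json` module rather than cjson, bson or ujson. The figures it prints will not match the table, which was presumably produced with different payload wrapping and timing units.

```python
import json
import sys
import timeit


def measure(payload, repeats=1000):
    """Return size and timing figures for one payload using the stdlib json module."""
    encoded = json.dumps(payload)
    return {
        # Size of the serialized message in bytes.
        "msg_size": len(encoded.encode("utf-8")),
        # Shallow in-memory size of the Python object (nested items not counted).
        "obj_size": sys.getsizeof(payload),
        # Total seconds for `repeats` serializations.
        "serialize_s": timeit.timeit(lambda: json.dumps(payload), number=repeats),
        # Total seconds for `repeats` deserializations.
        "deserialize_s": timeit.timeit(lambda: json.loads(encoded), number=repeats),
    }


# The two sample payloads listed above.
samples = {"Empty object": {}, "Empty list": []}

for name, payload in samples.items():
    print(name, measure(payload))
```

To compare encoders, the two lambdas could be swapped for the equivalent calls in another library, keeping the payload and repeat count fixed so the timings stay comparable.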