I have some code that processes tweets in real time, about 5 million a day. They are currently stored in MongoDB and also posted on various Celery/RabbitMQ work queues. The average message size is 5,524 bytes, so encoding and decoding these messages is an issue. I measured the options with the test code below.

Standard tweet message encode/decode versus Python's built-in json package:

| Test | Msg Size | De-serialize (cjson, bson, ujson) | Obj Size | Serialize (cjson, bson, ujson) | Storage | Storage Cost |
|---|---|---|---|---|---|---|
| Empty object ({}) | 41 | 14.790, 37.565, 0.970 | 54 | 6.341, 41.856, 1.249 | 2050Mb | 0.21 |
| Empty list ([]) | 41 | 15.069, 38.021, 1.005 | 54 | 6.675, 41.475, 1.400 | 2050Mb | 0.21 |
| Object of objects | 843 | 107.750, 145.440, 25.525 | 3226 | 63.555, 828.235, 28.051 | 42150Mb | 4.21 |
| List of lists | 563 | 58.805, 81.950, 16.960 | 104 | 43.426, 815.965, 18.311 | 28150Mb | 2.81 |
| Object with only tweet id | 93 | 25.030, 53.360, 2.280 | 422 | 23.570, 83.445, 3.295 | 4650Mb | 0.47 |
| Full tweet message | 4386 | 697.221, 867.780, 188.560 | 12606 | 360.290, 5847.335, 201.610 | 219300Mb | 21... |
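The test code itself did not survive in this copy of the post, so the following is only a sketch of that kind of benchmark. The payloads, iteration count, and library calls are my assumptions; the original also measured cjson, which is Python 2-only and omitted here.

```python
# Sketch of a serialize/deserialize micro-benchmark, assuming ujson and
# pymongo (for bson) are installed. Not the original test code.
import json
import timeit

try:
    import ujson          # pip install ujson
except ImportError:
    ujson = None
try:
    import bson           # ships with pymongo >= 3.9 (bson.encode/bson.decode)
except ImportError:
    bson = None

ITERATIONS = 100_000  # assumed; the post does not state the iteration count

# Representative payloads; the full tweet payload is omitted for brevity.
TEST_MESSAGES = {
    "Empty object": {},
    "Empty list": [],
    "Object with only tweet id": {"id": 1234567890123456789},
}

def bench(label, encode, decode, obj):
    """Time repeated encode and decode of `obj` and print the totals."""
    encoded = encode(obj)
    enc_t = timeit.timeit(lambda: encode(obj), number=ITERATIONS)
    dec_t = timeit.timeit(lambda: decode(encoded), number=ITERATIONS)
    print(f"  {label:6s} size={len(encoded):6d}  "
          f"encode={enc_t:8.3f}s  decode={dec_t:8.3f}s")

for name, msg in TEST_MESSAGES.items():
    print(name)
    bench("json", json.dumps, json.loads, msg)
    if ujson is not None:
        bench("ujson", ujson.dumps, ujson.loads, msg)
    # BSON can only encode top-level documents (dicts), so skip list payloads.
    if bson is not None and isinstance(msg, dict):
        bench("bson", bson.encode, bson.decode, msg)
```

Run as an ordinary script; it prints one block per payload with the serialized size and the total encode/decode time for each library.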