Experiments in pickling
pickle is a standard library module to serialize and deserialize Python objects. Being written in pure Python, it's fairly slow, so the standard library also provides a C implementation, called cPickle, with the limitation that it cannot be subclassed.
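As an aside, the usual way to prefer the C implementation when it's available is the classic import fallback (a minimal idiom, not part of the experiments below):

```python
try:
    import cPickle as pickle  # C implementation (Python 2 only)
except ImportError:
    import pickle  # pure-Python fallback
```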
What's interesting about cPickle are two little-known settings that can speed up serialization quite substantially:
- cPickle.HIGHEST_PROTOCOL basically dumps a Python object using a binary protocol, rather than the default ASCII-based, more portable protocol 0. If portability and backward compatibility are not an issue for you, you should use it: it's documented and probably here to stay.
- Pickler's undocumented fast flag. It turns out that the cPickle implementation has a "fast mode" that is enabled by setting this fast flag to True. It's undocumented, so probably subject to change, but it makes dumping objects way faster (see the sketch below).
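A minimal sketch of both settings in use (Python 2, where cPickle lives; the payload here is made up):

```python
import cPickle
from cStringIO import StringIO

# Hypothetical payload: a plain, non-self-referential structure
data = [{"id": i, "name": "item-%d" % i} for i in range(100)]

# Default: protocol 0, ASCII-based and portable, but slower
vanilla = cPickle.dumps(data)

# Binary protocol: documented, usually much faster and more compact
binary = cPickle.dumps(data, cPickle.HIGHEST_PROTOCOL)

# "Fast mode": undocumented flag on the Pickler; it skips the memo,
# so only use it for data that does not reference itself
buf = StringIO()
pickler = cPickle.Pickler(buf, cPickle.HIGHEST_PROTOCOL)
pickler.fast = True
pickler.dump(data)
fast = buf.getvalue()
```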
Looking at the implementation of cPickle, the comments in the code have something to say about "fast mode":
The fast mode disable the usage of memo, therefore speeding the pickling process by not generating superfluous PUT opcodes. It should not be used if with self-referential objects.
The memo is basically a cache, within the pickler, that remembers which objects have already been processed; it's used mainly to avoid infinite loops when dumping self-referential data structures. But if you are dumping data structures that do not reference themselves, you can save some time by disabling this caching.
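To make the trade-off concrete, here's a small sketch of what the memo protects you from (the exact failure mode and error message may vary between versions):

```python
import cPickle
from cStringIO import StringIO

loop = []
loop.append(loop)  # a list that contains itself

# With the memo in place, the cycle is detected and encoded as a back-reference
cPickle.dumps(loop)

# With fast mode there is no memo to catch the cycle, so dumping a
# self-referential object fails (or recurses) instead of producing a pickle
buf = StringIO()
pickler = cPickle.Pickler(buf)
pickler.fast = True
try:
    pickler.dump(loop)
except Exception as exc:
    print exc  # cPickle complains about cyclic objects in fast mode
```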
To test these settings, I compared "vanilla" cPickle.dumps with "highest_protocol" and "fast mode", for 3 different objects: a small, a medium and a large list of dictionaries. The code is available in a gist.
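The core of the timing loop is roughly of this shape (a sketch with made-up payloads, not the exact contents of the gist):

```python
import cPickle
import timeit
from cStringIO import StringIO

# Hypothetical payload; the gist builds small, medium and large lists of dicts
SMALL = [{"id": i, "name": "item-%d" % i} for i in range(10)]

def dumps_vanilla():
    cPickle.dumps(SMALL)

def dumps_high_protocol():
    cPickle.dumps(SMALL, cPickle.HIGHEST_PROTOCOL)

def dumps_fast():
    buf = StringIO()
    pickler = cPickle.Pickler(buf, cPickle.HIGHEST_PROTOCOL)
    pickler.fast = True
    pickler.dump(SMALL)

for label, func in [("dumps", dumps_vanilla),
                    ("highp", dumps_high_protocol),
                    ("pickl", dumps_fast)]:
    best = min(timeit.repeat(func, number=10, repeat=3)) / 10
    print "%s: %.2f usec per loop" % (label, best * 1e6)
```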
Clone it and run it:
```
$> git clone https://gist.github.com/1bec1b70ef9c8e254b57.git pickle_experiments
$> cd pickle_experiments
$> sh run.sh
```
On my machine, I get these results:
```
SMALL
dumps 10 loops, best of 3: 7.39 usec per loop
highp 10 loops, best of 3: 2.5 usec per loop
pickl 10 loops, best of 3: 4.1 usec per loop

MEDIUM
dumps 10 loops, best of 3: 705 usec per loop
highp 10 loops, best of 3: 206 usec per loop
pickl 10 loops, best of 3: 111 usec per loop

LARGE
dumps 3 loops, best of 3: 1.34 sec per loop
highp 10 loops, best of 3: 823 msec per loop
pickl 10 loops, best of 3: 135 msec per loop
```
Note: I've reduced the number of iterations to 3 for "vanilla LARGE" because it was taking too long…
"high\protocol" is between 2 and 3 times faster than "vanilla" for all objects, and "fast mode" is 6 times faster than "high\protocol" and 10 times faster than "vanilla" for large objects!