Just for fun, I completed a pretty basic implementation of automatically storing Python lists of integers in an SQLite database. Python’s sqlite3 module provides hooks for extending a database connection with functions that automate the conversion of Python types on the way in and out. wirebin provided the serialization and deserialization.
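The hooks in question are sqlite3.register_adapter and sqlite3.register_converter. Here’s a minimal sketch of the setup; since wirebin itself isn’t shown here, struct stands in as the (de)serializer, packing each list as little-endian 64-bit integers:

```python
import sqlite3
import struct

# Stand-in for wirebin: pack/unpack a list of ints as little-endian
# 64-bit integers. Swap in wirebin's own calls here.
def adapt_intlist(lst):
    return struct.pack("<%dq" % len(lst), *lst)

def convert_intlist(blob):
    return list(struct.unpack("<%dq" % (len(blob) // 8), blob))

# Teach sqlite3 to convert lists on the way in and out.
sqlite3.register_adapter(list, adapt_intlist)
sqlite3.register_converter("intlist", convert_intlist)

# PARSE_DECLTYPES makes sqlite3 apply the converter based on the
# declared column type ("intlist" below).
conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
conn.execute("CREATE TABLE lists (vals intlist)")
conn.execute("INSERT INTO lists VALUES (?)", ([1, 2, 3],))
row = conn.execute("SELECT vals FROM lists").fetchone()[0]
# row comes back out as a plain Python list
```

With the adapter and converter registered, inserts and selects on that column handle lists transparently.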
For a braindead performance evaluation of deserialization, I created a million-row database with a single column of the list datatype. The stored lists ranged from 1 to 1000 elements, with element values from 1 to 1000. Using Python’s builtin timing functions, all rows can be retrieved, with automatic deserialization, in about 5 seconds. It’s a very coarse test, but it indicates that pulling a list out of the DB takes on the order of microseconds. Not bad, and I expect this to scale linearly, and very slowly, with the number of elements in the list. I still need to do a simple timing of serialization and insertion, but I anticipate it will be in the same order of magnitude.
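A scaled-down version of that benchmark (1,000 rows here rather than a million, and struct again standing in for wirebin) might look like this:

```python
import random
import sqlite3
import struct
import time

# Stand-in (de)serializer for wirebin, as in the setup sketch.
def adapt_intlist(lst):
    return struct.pack("<%dq" % len(lst), *lst)

def convert_intlist(blob):
    return list(struct.unpack("<%dq" % (len(blob) // 8), blob))

sqlite3.register_adapter(list, adapt_intlist)
sqlite3.register_converter("intlist", convert_intlist)

conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
conn.execute("CREATE TABLE lists (vals intlist)")

# Random lists of 1-1000 elements with values 1-1000, as in the post.
data = [([random.randint(1, 1000) for _ in range(random.randint(1, 1000))],)
        for _ in range(1000)]
conn.executemany("INSERT INTO lists VALUES (?)", data)
conn.commit()

# Time a full scan with automatic deserialization.
start = time.perf_counter()
rows = conn.execute("SELECT vals FROM lists").fetchall()
elapsed = time.perf_counter() - start
print("%d rows in %.3fs (%.1f us/row)" % (len(rows), elapsed,
                                          1e6 * elapsed / len(rows)))
```

The per-row figure this prints will vary with list length and machine, but it gives the same order-of-magnitude picture as the coarse test above.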
One gotcha that tripped me up: binary data, such as that generated by wirebin, has to be converted into an SQLite binary blob using sqlite3.Binary. You can’t just send any old string, and SQLite may silently drop your inserted value if you skip this conversion. I suspect embedded null characters are the culprit. So for a while my list objects were going missing, yet I wasn’t getting any errors or exceptions. On the way out, you get a buffer object, which can be cast to a string and then fed to wirebin for deserialization. It’s all good.
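Here’s the blob round trip in isolation. (Note this sketch is modern Python 3, where bytes are bound as blobs natively and come back as bytes rather than a buffer object; the sqlite3.Binary wrapper still works and was essential in the older API described above.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blobs (data BLOB)")

payload = b"\x00\x01\x02\xff"  # binary data with embedded nulls

# Wrap in sqlite3.Binary so the driver binds it as a blob,
# not as text that can be mangled or silently dropped.
conn.execute("INSERT INTO blobs VALUES (?)", (sqlite3.Binary(payload),))

out = conn.execute("SELECT data FROM blobs").fetchone()[0]
# The nulls survive the round trip; cast the result back to bytes
# before handing it to the deserializer.
assert bytes(out) == payload
```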
Slouching towards prefuse in Python. Now I just have to figure out the right representations for nodes, edges, and lists of each.