Wednesday, August 31, 2011

asynchonous execution for databases, using places

I added asynchronous execution to my database library yesterday using Racket's places. The coding part took about an afternoon and part of an evening. The new code is a bit less than 300 lines, most of which is boring serialization and deserialization code, some of which will go away soon.

My database library contains two wire-protocol connection implementations (for PostgreSQL and MySQL) and two FFI-based connection implementations (for SQLite and ODBC). The wire-protocol implementations are more work, but they just use I/O ports, and Racket handles I/O pretty well. On the other hand, the entire Racket VM stops during an FFI call, because Racket threads are green threads.

Having all threads stop execution for FFI calls isn't much of a problem if the FFI calls are all short. If the FFI call is "execute this SQL statement," on the other hand, that can cause serious problems with responsiveness. (Of course, it still depends on how long the SQL statement in question takes to execute.)

ODBC provides the ability to execute some operations asynchronously—in theory. In practice, of all the drivers I had available on my development machines, only the DB2 driver actually supported asynchronous execution. Furthermore, the way one performs an asynchronous call—repeatedly calling a function with identical arguments until it returns something different—plays poorly with GC'd languages, where keeping memory locations identical from call to call requires more effort than it does in, say, C. In short, ODBC's asynchronous execution doesn't solve the interactivity problem.

Racket actually has multiple kinds of concurrency. In addition to (green) threads, Racket also has "futures" (true concurrency if it's not too much trouble, everything shared) and "places" (true concurrency for sure, almost nothing shared, message passing). You can't send higher-order data (functions, objects, etc) between places (rather, you would have to be clever about it), but database connections traffic in mostly first-order data structures, so it's relatively easy to create a connection proxy that dispatches to a real database connection running in a difference place.

The one exception, the single kind of higher-order data used by connections, is the prepared statement object. But it's possible to proxy those using a hash table and finalizers. (Actually, prepared statements use finalizers already to clean up resources, and the clean-up code is in the connection class, so I didn't even need to create a new prepared statement class.)