Thursday, February 09, 2012

unprepared queries vs statement caching

Racket’s db library supports parameterized queries where the SQL is given as a string (that is, unprepared):

(query-list c "SELECT field1 FROM table WHERE field2 = ?" 17)

This is handled by an implicit call to prepare, which turns the string into a prepared statement; the prepared statement is then executed with the arguments.
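The string-based call above behaves roughly as if we had prepared explicitly and then executed the result (a sketch of the equivalence, not the library's actual code path):

(define pst (prepare c "SELECT field1 FROM table WHERE field2 = ?"))
(query-list c pst 17)

Here prepare returns a prepared statement, which the query functions accept in place of a string.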

The problem with this approach is that it involves two trips to the server: one to prepare and one to execute. One trip would be better. In this post I discuss two techniques for eliminating extraneous server trips. I’ve recently added one of them to Racket; the other turns out to be problematic.

The first approach, which I call “unprepared query,” is to avoid preparing the statement altogether. PostgreSQL has a way of sending both query string and arguments together in the same communication without needing to hear back from the server. It’s done by sending Parse, Bind, and Execute messages in a pipeline before sending Sync, which triggers a server response. In MySQL it’s only possible (if I remember correctly) if there are no arguments, in which case one can use COM_QUERY instead of COM_PREPARE and COM_EXECUTE.
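In protocol terms, the PostgreSQL exchange looks something like this sketch, where send-message is a hypothetical helper (the message names come from the PostgreSQL frontend/backend protocol, not from the db library's API):

(send-message 'Parse sql)     ; parse the query string into a statement
(send-message 'Bind params)   ; bind the parameter values, making a portal
(send-message 'Execute)       ; run the portal
(send-message 'Sync)          ; only now does the server send its responses

None of the first three messages waits for a reply, so the whole query costs a single round trip.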

The main problem with unprepared queries for PostgreSQL is that Racket no longer sees the statement’s argument and result types. So if the arguments are the wrong type, the server catches the error instead of Racket. That’s bad for two reasons: the error you get is different from the error you get in the prepared case (and worse, IMHO), and if you’re in a transaction, it invalidates the transaction (all errors invalidate a transaction in PostgreSQL). The db library also needs the result types so it can choose between the text and binary result encodings; for most types it speaks binary, but a few (e.g., TIMESTAMP and NUMERIC) have complicated enough binary forms that it’s easier to just parse the text form. So unprepared queries for PostgreSQL are out. For MySQL there’s a similar issue: the COM_QUERY message produces text results, whereas the COM_EXECUTE message produces binary results. And again, it doesn’t work for queries with parameters.

For both systems it would be fine to use unprepared queries for statements that neither take arguments nor return result rows. The db library does just that for MySQL connections, but for a different reason: only some types of statements can be prepared. So unless it’s a SELECT or SHOW statement (which return rows) or it has parameters, a statement is executed unprepared. But in PostgreSQL, every statement is preparable, and the statements that are safe to execute unprepared are the ones that aren’t likely to be executed too often, like CREATE TABLE.
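The MySQL-side decision can be sketched as a predicate (the name and the keyword test are mine; the db library’s real check is more thorough):

(define (must-prepare? sql has-params?)
  ;; prepare if the statement takes parameters or returns rows
  (or has-params?
      (regexp-match? #px"(?i:^\\s*(select|show))" sql)))

Everything else goes straight to COM_QUERY.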

The alternative, given that we usually want the prepared statement information for better error behavior, is to cache the implicit calls to prepare. The first query still involves two trips, but subsequent uses of the same query string take only one. Only DML statements are cached: SELECT, INSERT, etc.
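A minimal sketch of such a cache, assuming a mutable hash keyed by the SQL string (the names are hypothetical, not the db library’s internals):

(define statement-cache (make-hash))

(define (prepare/cached c sql)
  ;; reuse a cached prepared statement, or prepare and remember a new one
  (or (hash-ref statement-cache sql #f)
      (let ([pst (prepare c sql)])
        (hash-set! statement-cache sql pst)
        pst)))

The first call pays for the extra prepare trip; later calls with the same string skip it.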

Just one hitch: changes to the database schema may invalidate the information associated with prepared statements. A connection inspects each statement it executes to see if it is a potentially schema-changing statement such as ALTER TABLE; if so, it invalidates the statement cache. We also invalidate the cache on transactional statements; a ROLLBACK TO SAVEPOINT can undo a previous schema change in PostgreSQL.
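The inspection can be sketched as a predicate over the statement text (the keyword list here is illustrative; the db library examines statements more carefully):

(define (cache-invalidating-statement? sql)
  ;; DDL and transaction-control keywords force a cache flush
  (regexp-match? #px"(?i:^\\s*(alter|create|drop|begin|commit|rollback|savepoint|release))"
                 sql))

When this predicate holds, the connection discards every cached prepared statement.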

But a schema change could also originate from another connection, so by default we only enable the statement cache inside of a transaction. The transaction isolates us from surprise externally-originating schema changes (on PostgreSQL, anyway—getting this right on MySQL is probably hopeless). The programmer should probably be able to tune how careful the statement cache is, but I haven’t added that yet.
