While attending Hacker School in the fall, I found myself utilizing SQL on a much more regular basis than I'd expected. My only previous experience with SQL was querying against fictitious databases in my college classes, so it was refreshing to use a database for “real-world” stuff. I helped Sumana update her Missing From Wikipedia application so that it queried against live replicas of Wikipedia databases (something that anyone can do via Wikimedia Labs).

One of the first questions we had was whether we needed to use ORM. Object/relational mapping gives two incompatible type systems – an object-oriented model and a relational database – the ability to communicate more fluently. For example, SQLAlchemy's ORM component allows Python developers to directly map information pulled from database to classes. However, Missing From Wikipedia doesn't need to load entries into object-oriented data structures – it simply needs to check to see if a given entry exists. Using something like SQLAlchemy would have probably been overkill for our purposes.

Instead, Sumana and I used simple SQL queries to to determine whether a particular row a) exists and b) fulfills particular criteria. By using “SELECT exists” to wrap what would have been our base queries, we created calls that returned booleans:

select exists (select page\_id from page
where page\_title='MariaDB' AND page\_namespace=0)

>>1

(Results exist.)

select exists (select page\_id from page
where page\_title = 'MariaDBBBBBBBBBBBB' AND page\_namespace=0)

>>0

(No results exist.)

During this initial investigation, we found that Sumana's old method of hitting the web API and this new method of making a SQL query were comparable in speed. Some friendly folks in the IRC channel informed us that that “page_namespace” and “page_titles” were indexed via a composite index, but we found that we actually wanted to query on the “page” and “page_title” columns.

While the SQL calls were not optimized in any way, I was surprised that there was not at least a small jump in efficiency that came from hitting the SQL database directly. Regardless, the Wikimedia Labs replica databases still give developers an amazing amount of power and flexibility in creating applications that integrate Wikipedia data, and I'm certain I'll revisit them in the future.