ActiveRecord is one of the features that makes Ruby on Rails look So Darn Easy™. The ability to compose complex database queries without having to write any of that nasty SQL is a huge draw to new developers, most of whom have only a cursory knowledge of the database dark arts. You create a query using familiar Ruby syntax, and you get back Ruby model objects that you can iterate through in your code. Everyone’s happy.
Until the query is returning over half a million records, that is. Then things get a little more complex. And by “complex”, I mean “slower than molasses in January.”
I ran into this recently on a client project, where we were experiencing serious memory bloat when we tried to deliver messages to a large subscription list. As the list grew in size, the time to traverse the list ballooned. On average, when the list topped 700,000 members, it was taking us about an hour to iterate through the list - and the creation of 700,000 ActiveRecord model objects was causing all sorts of memory contention problem.
We knew that changing the query to return a nested array rather than model objects should speed things up, but we faced another problem. The business logic in the
message.fetch_recipients method was extremely complex - far too complex to build raw SQL. (Besides which, who wants to deal with raw SQL strings in their app? No thanks!) We could have used a tool like Arel to build the query for us, but it turns out that ActiveRecord - the source of our misery - was also the source of our deliverance.
One useful tool on
ActiveRecord::Relation is the
to_sql method, which returns the SQL statement that the database adapter will execute against the database to retrieve the results. (This is a great tool to “check your work” in the Rails console when you’re building a complex ActiveRecord query, by the way.). Normally, ActiveRecord will take the results of that SQL query and stuff the results into ActiveRecord objects, but that was exactly what we wanted to avoid here.
Fortunately, ActiveRecord gives us another tool that lets us run raw SQL against the database:
ActiveRecord::Base.connection.execute. This is usually discouraged because the SQL you run must be in the dialect of the database server you’re using (mySQL, PostgreSQL, etc.). But since we’re using
to_sql, that work is done for us automatically by the database adapter itself!
So by changing
recipients = message.fetch_recipients recipients.find_each do |msg| ...
sql = message.fetch_recipients.to_sql recipients = ActiveRecord::Base.connection.execute(sql) recipients.each do |msg| ...
we completely bypassed the process of instantiating and populating (and later garbage-collecting) over half a million objects. Yes, we had to change some code to deal with the fact that
msg is now an array of values1 instead of a nice model object, but we believed that the performance boost would be worth it.
And we were right. With this change alone, the time it took to deliver a message to all 700,000 subscribers dropped from an average of 60 minutes to an average of just over 12 minutes - a 5x performance boost.
This solution wouldn’t be worth it if the size of the resultset returned from the query is small. But if performance matters, and you are dealing with record counts in the hundreds of thousands or higher, you may want to consider this approach.
Originally posted on the Sql For Developers blog.
- Technically, what you get back from the adapter is not an array of arrays, but an object defined by the adapter that includes the Enumerable module. In this particular project, we were using a mySql database, so the SQL call returns an instance of Mysql2::Result. But since it’s basically an Enumerable, when you loop through it, you get an array of values for each row. The net effect is the same. [return]