--- title: "Use .pluck If You Only Need a Subset of Model Attributes" date: 2014-08-20T00:00:00+00:00 draft: false canonical_url: https://www.viget.com/articles/pluck-subset-rails-activerecord-model-attributes/ --- *Despite some exciting advances in the field, like [Node](http://nodejs.org/), [Redis](http://redis.io/), and [Go](https://golang.org/), a well-structured relational database fronted by a Rails or Sinatra (or Django, etc.) app is still one of the most effective toolsets for building things for the web. In the coming weeks, I'll be publishing a series of posts about how to be sure that you're taking advantage of all your RDBMS has to offer.* IF YOU ONLY REQUIRE a few attributes from a table, rather than instantiating a collection of models and then running a `.map` over them to get the data you need, it's much more efficient to use `.pluck` to pull back only the attributes you need as an array. The benefits are twofold: better SQL performance and less time and memory spent in Rubyland. To illustrate, let's use an app I've been working on that takes [Harvest](http://www.getharvest.com/) data and generates reports. As a baseline, here is the execution time and memory usage of `rails runner` with a blank instruction: $ time rails runner "" real 0m2.053s user 0m1.666s sys 0m0.379s $ memory_profiler.sh rails runner "" Peak: 109240 In other words, it takes about two seconds and 100MB to boot up the app. We calculate memory usage with a modified version of [this Unix script](http://stackoverflow.com/a/1269490). Now, consider a TimeEntry model in our time tracking application (of which there are 314,420 in my local database). Let's say we need a list of the dates of every single time entry in the system. A naïve approach would look something like this: ```ruby dates = TimeEntry.all.map { |entry| entry.logged_on } ``` It works, but seems a little slow: $ time rails runner "TimeEntry.all.map { |entry| entry.logged_on }" real 0m14.461s user 0m12.824s sys 0m0.994s Almost 14.5 seconds. Not exactly webscale. And how about RAM usage? $ memory_profiler.sh rails runner "TimeEntry.all.map { |entry| entry.logged_on }" Peak: 1252180 About 1.25 gigabytes of RAM. Now, what if we use `.pluck` instead? ```ruby dates = TimeEntry.pluck(:logged_on) ``` In terms of time, we see major improvements: $ time rails runner "TimeEntry.pluck(:logged_on)" real 0m4.123s user 0m3.418s sys 0m0.529s So from roughly 15 seconds to about four. Similarly, for memory usage: $ memory_profiler.sh bundle exec rails runner "TimeEntry.pluck(:logged_on)" Peak: 384636 From 1.25GB to less than 400MB. When we subtract the overhead we calculated earlier, we're going from 15 seconds of execution time to two, and 1.15GB of RAM to 300MB. ## Using SQL Fragments As you might imagine, there's a lot of duplication among the dates on which time entries are logged. What if we only want unique values? We'd update our naïve approach to look like this: ```ruby dates = TimeEntry.all.map { |entry| entry.logged_on }.uniq ```` When we profile this code, we see that it performs slightly worse than the non-unique version: $ time rails runner "TimeEntry.all.map { |entry| entry.logged_on }.uniq" real 0m15.337s user 0m13.621s sys 0m1.021s $ memory_profiler.sh rails runner "TimeEntry.all.map { |entry| entry.logged_on }.uniq" Peak: 1278784 Instead, let's take advantage of `.pluck`'s ability to take a SQL fragment rather than a symbolized column name: ```ruby dates = TimeEntry.pluck("DISTINCT logged_on") ``` Profiling this code yields surprising results: $ time rails runner "TimeEntry.pluck('DISTINCT logged_on')" real 0m2.133s user 0m1.678s sys 0m0.369s $ memory_profiler.sh rails runner "TimeEntry.pluck('DISTNCT logged_on')" Peak: 107984 Both running time and memory usage are virtually identical to executing the runner with a blank command, or, in other words, the result is calculated at an incredibly low cost. ## Using `.pluck` Across Tables Requirements have changed, and now, instead of an array of timestamps, we need an array of two-element arrays consisting of the timestamp and the employee's last name, stored in the "employees" table. Our naïve approach then becomes: ```ruby dates = TimeEntry.all.map { |entry| [entry.logged_on, entry.employee.last_name] } ``` Go grab a cup of coffee, because this is going to take awhile. $ time rails runner "TimeEntry.all.map { |entry| [entry.logged_on, entry.employee.last_name] }" real 7m29.245s user 6m52.136s sys 0m15.601s memory_profiler.sh rails runner "TimeEntry.all.map { |entry| [entry.logged_on, entry.employee.last_name] }" Peak: 3052592 Yes, you're reading that correctly: 7.5 minutes and 3 gigs of RAM. We can improve performance somewhat by taking advantage of ActiveRecord's [eager loading](http://guides.rubyonrails.org/active_record_querying.html#eager-loading-associations) capabilities. ```ruby dates = TimeEntry.includes(:employee).map { |entry| [entry.logged_on, entry.employee.last_name] } ``` Benchmarking this code, we see significant performance gains, since we're going from over 300,000 SQL queries to two. $ time rails runner "TimeEntry.includes(:employee).map { |entry| [entry.logged_on, entry.employee.last_name] }" real 0m21.270s user 0m19.396s sys 0m1.174s $ memory_profiler.sh rails runner "TimeEntry.includes(:employee).map { |entry| [entry.logged_on, entry.employee.last_name] }" Peak: 1606204 Faster (from 7.5 minutes to 21 seconds), but certainly not fast enough. Finally, with `.pluck`: ```ruby dates = TimeEntry.includes(:employee).pluck(:logged_on, :last_name) ``` Benchmarks: $ time rails runner "TimeEntry.includes(:employee).pluck(:logged_on, :last_name)" real 0m4.180s user 0m3.414s sys 0m0.543s $ memory_profiler.sh rails runner "TimeEntry.includes(:employee).pluck(:logged_on, :last_name)" Peak: 407912 A hair over 4 seconds execution time and 400MB RAM -- hardly any more expensive than without employee names. ## Conclusion - Prefer `.pluck` to instantiating a collection of ActiveRecord objects and then using `.map` to build an array of attributes. - `.pluck` can do more than simply pull back attributes on a single table: it can run SQL functions, pull attributes from joined tables, and tack on to any scope. - Whenever possible, let the database do the heavy lifting.