---
title: "Use .pluck If You Only Need a Subset of Model Attributes"
date: 2014-08-20T00:00:00+00:00
draft: false
canonical_url: https://www.viget.com/articles/pluck-subset-rails-activerecord-model-attributes/
---

*Despite some exciting advances in the field, like
[Node](http://nodejs.org/), [Redis](http://redis.io/), and
[Go](https://golang.org/), a well-structured relational database fronted
by a Rails or Sinatra (or Django, etc.) app is still one of the most
effective toolsets for building things for the web. In the coming weeks,
I'll be publishing a series of posts about how to be sure that you're
taking advantage of all your RDBMS has to offer.*

If you only require a few attributes from a table, rather than
instantiating a collection of models and then running a `.map` over them
to get the data you need, it's much more efficient to use `.pluck` to
pull back only the attributes you need as an array. The benefits are
twofold: better SQL performance and less time and memory spent in
Rubyland.
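
As a rough intuition for the "less time and memory spent in Rubyland" half of that claim, here is a toy sketch with no database involved. `Entry`, `ROWS`, and `allocations` are hypothetical names invented for illustration. `Entry` stands in for a model with several attributes, and `ROWS` stands in for the raw result set. It assumes CRuby, since it reads `GC.stat(:total_allocated_objects)`.

```ruby
# Toy model, no database: ROWS stands in for the raw result set and
# Entry stands in for an ActiveRecord model with several attributes.
Entry = Struct.new(:id, :logged_on, :hours, :notes)

ROWS = Array.new(10_000) { |i| [i, "2014-08-20", 1.5, "note"] }

# Count how many Ruby objects are allocated while the block runs (CRuby only).
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Analogous to TimeEntry.all.map { |entry| entry.logged_on }:
# wrap every row in an object, then read a single field from each.
via_models = allocations { ROWS.map { |row| Entry.new(*row) }.map(&:logged_on) }

# Analogous to TimeEntry.pluck(:logged_on): take the one column directly.
via_column = allocations { ROWS.map { |row| row[1] } }

puts "objects allocated via models: #{via_models}"
puts "objects allocated via column: #{via_column}"
```

The first path allocates at least one extra object per row before the wanted value is ever read. A real ActiveRecord model carries considerably more per-instance state than this `Struct`, so the sketch likely understates the difference.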

To illustrate, let's use an app I've been working on that takes
[Harvest](http://www.getharvest.com/) data and generates reports. As a
baseline, here is the execution time and memory usage of `rails runner`
with a blank instruction:

    $ time rails runner ""
    real 0m2.053s
    user 0m1.666s
    sys 0m0.379s

    $ memory_profiler.sh rails runner ""
    Peak: 109240

In other words, it takes about two seconds and a little over 100MB to
boot up the app (the profiler reports peak usage in kilobytes). We
calculate memory usage with a modified version of [this Unix
script](http://stackoverflow.com/a/1269490).

Now, consider a TimeEntry model in our time tracking application (of
which there are 314,420 in my local database). Let's say we need a list
of the dates of every single time entry in the system. A naïve approach
would look something like this:

```ruby
dates = TimeEntry.all.map { |entry| entry.logged_on }
```

It works, but seems a little slow:

    $ time rails runner "TimeEntry.all.map { |entry| entry.logged_on }"
    real 0m14.461s
    user 0m12.824s
    sys 0m0.994s

Almost 14.5 seconds. Not exactly webscale. And how about RAM usage?

    $ memory_profiler.sh rails runner "TimeEntry.all.map { |entry| entry.logged_on }"
    Peak: 1252180

About 1.25 gigabytes of RAM. Now, what if we use `.pluck` instead?

```ruby
dates = TimeEntry.pluck(:logged_on)
```

In terms of time, we see major improvements:

    $ time rails runner "TimeEntry.pluck(:logged_on)"
    real 0m4.123s
    user 0m3.418s
    sys 0m0.529s

So from roughly 14.5 seconds to about four. Similarly, for memory usage:

    $ memory_profiler.sh bundle exec rails runner "TimeEntry.pluck(:logged_on)"
    Peak: 384636

From 1.25GB to less than 400MB. When we subtract the overhead we
calculated earlier, we're going from about 12.4 seconds of execution
time to two, and 1.15GB of RAM to under 300MB.
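
The subtraction can be worked through with the figures reported above. This is only a sketch of the arithmetic. The constant and method names are made up for illustration, and it assumes the profiler's `Peak` values are in kilobytes, which matches the MB and GB figures quoted in the text.

```ruby
# Figures copied from the benchmark output above (seconds and kilobytes).
BASELINE = { seconds: 2.053,  peak_kb: 109_240 }
MAP      = { seconds: 14.461, peak_kb: 1_252_180 }
PLUCK    = { seconds: 4.123,  peak_kb: 384_636 }

# Subtract the cost of booting the app to isolate the work done by the query.
def net(run, baseline)
  {
    seconds: (run[:seconds] - baseline[:seconds]).round(3),
    peak_kb: run[:peak_kb] - baseline[:peak_kb]
  }
end

net_map   = net(MAP, BASELINE)   # 12.408 seconds, 1,142,940 KB
net_pluck = net(PLUCK, BASELINE) # 2.07 seconds, 275,396 KB

puts "map:   #{net_map[:seconds]}s, #{net_map[:peak_kb]} KB"
puts "pluck: #{net_pluck[:seconds]}s, #{net_pluck[:peak_kb]} KB"
```

Net of boot time, the `.map` version costs roughly 12.4 seconds and 1.14GB, against roughly 2.1 seconds and 275MB for `.pluck`.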

## Using SQL Fragments

As you might imagine, there's a lot of duplication among the dates on
which time entries are logged. What if we only want unique values? We'd
update our naïve approach to look like this:

```ruby
dates = TimeEntry.all.map { |entry| entry.logged_on }.uniq
```

When we profile this code, we see that it performs slightly worse than
the non-unique version:

    $ time rails runner "TimeEntry.all.map { |entry| entry.logged_on }.uniq"
    real 0m15.337s
    user 0m13.621s
    sys 0m1.021s

    $ memory_profiler.sh rails runner "TimeEntry.all.map { |entry| entry.logged_on }.uniq"
    Peak: 1278784

Instead, let's take advantage of `.pluck`'s ability to take a SQL
fragment rather than a symbolized column name:

```ruby
dates = TimeEntry.pluck("DISTINCT logged_on")
```

Profiling this code yields surprising results:

    $ time rails runner "TimeEntry.pluck('DISTINCT logged_on')"
    real 0m2.133s
    user 0m1.678s
    sys 0m0.369s

    $ memory_profiler.sh rails runner "TimeEntry.pluck('DISTINCT logged_on')"
    Peak: 107984

Both running time and memory usage are virtually identical to executing
the runner with a blank command, or, in other words, the result is
calculated at an incredibly low cost.

## Using `.pluck` Across Tables

Requirements have changed, and now, instead of an array of dates, we
need an array of two-element arrays, each consisting of the date and
the employee's last name, which is stored in the "employees" table. Our
naïve approach then becomes:

```ruby
dates = TimeEntry.all.map { |entry| [entry.logged_on, entry.employee.last_name] }
```

Go grab a cup of coffee, because this is going to take a while.

    $ time rails runner "TimeEntry.all.map { |entry| [entry.logged_on, entry.employee.last_name] }"
    real 7m29.245s
    user 6m52.136s
    sys 0m15.601s

    $ memory_profiler.sh rails runner "TimeEntry.all.map { |entry| [entry.logged_on, entry.employee.last_name] }"
    Peak: 3052592

Yes, you're reading that correctly: 7.5 minutes and 3 gigs of RAM. We
can improve performance somewhat by taking advantage of ActiveRecord's
[eager
loading](http://guides.rubyonrails.org/active_record_querying.html#eager-loading-associations)
capabilities.

```ruby
dates = TimeEntry.includes(:employee).map { |entry| [entry.logged_on, entry.employee.last_name] }
```

Benchmarking this code, we see significant performance gains, since
we're going from over 300,000 SQL queries to two.

    $ time rails runner "TimeEntry.includes(:employee).map { |entry| [entry.logged_on, entry.employee.last_name] }"
    real 0m21.270s
    user 0m19.396s
    sys 0m1.174s

    $ memory_profiler.sh rails runner "TimeEntry.includes(:employee).map { |entry| [entry.logged_on, entry.employee.last_name] }"
    Peak: 1606204
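
The drop from over 300,000 queries to two can be seen in miniature with a toy model. This sketch does not use ActiveRecord. `FakeDB` and its methods are hypothetical stand-ins that only count how many lookups they receive, and the employee names are made up.

```ruby
# Toy model, no ActiveRecord: FakeDB counts how many "queries" it receives.
class FakeDB
  attr_reader :query_count

  def initialize(entries, employees)
    @entries = entries      # array of { logged_on:, employee_id: }
    @employees = employees  # hash of id => { last_name: }
    @query_count = 0
  end

  # Stands in for: SELECT * FROM time_entries
  def all_entries
    @query_count += 1
    @entries
  end

  # Stands in for: SELECT * FROM employees WHERE id = ?
  def find_employee(id)
    @query_count += 1
    @employees.fetch(id)
  end

  # Stands in for: SELECT * FROM employees WHERE id IN (...)
  def find_employees(ids)
    @query_count += 1
    @employees.slice(*ids)
  end
end

def build_db
  employees = { 1 => { last_name: "Lovelace" }, 2 => { last_name: "Hopper" } }
  entries = Array.new(1_000) { |i| { logged_on: "2014-08-20", employee_id: (i % 2) + 1 } }
  FakeDB.new(entries, employees)
end

# Lazy loading: one query for the entries, then one more per entry.
lazy_db = build_db
lazy_pairs = lazy_db.all_entries.map do |entry|
  [entry[:logged_on], lazy_db.find_employee(entry[:employee_id])[:last_name]]
end

# Eager loading: one query for the entries, one for every referenced employee.
eager_db = build_db
entries = eager_db.all_entries
employees = eager_db.find_employees(entries.map { |e| e[:employee_id] }.uniq)
eager_pairs = entries.map do |entry|
  [entry[:logged_on], employees.fetch(entry[:employee_id])[:last_name]]
end

puts "lazy:  #{lazy_db.query_count} queries"
puts "eager: #{eager_db.query_count} queries"
```

Both paths build the same pairs. The lazy path issues one lookup per entry plus one for the entries themselves, while the eager path issues two lookups no matter how many entries there are.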

Faster (from 7.5 minutes to 21 seconds), but certainly not fast enough.
Finally, with `.pluck`:

```ruby
dates = TimeEntry.includes(:employee).pluck(:logged_on, :last_name)
```

Benchmarks:

    $ time rails runner "TimeEntry.includes(:employee).pluck(:logged_on, :last_name)"
    real 0m4.180s
    user 0m3.414s
    sys 0m0.543s

    $ memory_profiler.sh rails runner "TimeEntry.includes(:employee).pluck(:logged_on, :last_name)"
    Peak: 407912

A hair over 4 seconds of execution time and 400MB of RAM -- hardly any
more expensive than without employee names.

## Conclusion

-   Prefer `.pluck` to instantiating a collection of ActiveRecord
    objects and then using `.map` to build an array of attributes.

-   `.pluck` can do more than simply pull back attributes on a single
    table: it can take SQL fragments such as `DISTINCT`, pull attributes
    from joined tables, and tack on to any scope.

-   Whenever possible, let the database do the heavy lifting.