An analysis of memory bloat in Active Record 5.2
One of the very noble goals of the Ruby community, spearheaded by Matz, is the Ruby 3x3 plan. The idea is that by applying a large number of modern optimizations we can make the Ruby interpreter 3 times faster. It is an ambitious goal, which is both notable and inspiring. This “movement” has triggered quite a lot of interesting experiments in Ruby core, including a just-in-time compiler and work on reducing memory bloat out of the box. If Ruby gets faster and uses less memory, then everyone gets free performance, which is exactly what we all want.
A big problem, though, is that there is only so much magic a faster Ruby can achieve. A faster Ruby is not going to magically fix a “bubble sort” hiding deep in your code. Active Record has tons of internal waste that ought to be addressed, and addressing it could make the vast majority of Ruby applications in the wild a lot faster. Rails is the largest consumer of Ruby after all, and Rails is underpinned by Active Record.
Sadly, Active Record performance has not gotten much better since the days of Rails 2; in fact, in quite a few cases it has gotten slower, sometimes a lot slower.
Active Record is very wasteful
I would like to start off with a tiny example:
Say I have a typical 30 column table containing Topics.
If I run the following, how much will Active Record allocate?
a = []
Topic.limit(1000).each do |u|
  a << u.id
end
Total allocated: 3835288 bytes (26259 objects)
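As an aside, numbers in this shape are what the memory_profiler gem reports; here is a minimal sketch of how such a measurement can be taken (the block body is the snippet above):

require "memory_profiler"

report = MemoryProfiler.report do
  a = []
  Topic.limit(1000).each { |u| a << u.id }
end

# Prints "Total allocated: ... bytes (... objects)" among other details.
report.pretty_print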
Compare this to an equally inefficient “raw version”.
sql = -"select * from topics limit 1000"
ActiveRecord::Base.connection.raw_connection.async_exec(sql).column_values(0)
Total allocated: 8200 bytes (4 objects)
This amount of waste is staggering; it translates to a deadly combo:
- Extreme levels of memory usage
and
- Slower performance
But … that is really bad Active Record!
An immediate gut reaction here is that I am “cheating” and writing “slow” Active Record code, and comparing it to mega optimized raw code.
One could argue that I should write:
a = []
Topic.select(:id).limit(1000).each do |u|
  a << u.id
end
In which case you would get:
Total allocated: 1109357 bytes (11097 objects)
Or better still:
Topic.limit(1000).pluck(:id)
In which case I would get:
Total allocated: 221493 bytes (5098 objects)
Time for a quick recap.
- The “raw” version allocated 4 objects; it was able to return 1000 Integers directly, which are not allocated individually in the Ruby heaps and are not subject to garbage collection slots.
- The “naive” Active Record version allocates 26259 objects.
- The “slightly optimised” Active Record version allocates 11097 objects.
- The “very optimised” Active Record version allocates 5098 objects.
All of those numbers are orders of magnitude larger than 4.
How many objects does a “naive/lazy” implementation need to allocate?
One feature that Active Record touts as a huge advantage over Sequel is the “built-in” laziness.
ActiveRecord will not bother “casting” a column to a date until you try to use it, so if for any reason you over-select, ActiveRecord has your back. This deficiency in Sequel is acknowledged and deliberate.
This particular niggle makes it incredibly hard to move to Sequel from ActiveRecord without extremely careful review, despite Sequel being so incredibly fast and efficient.
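To illustrate what that laziness buys you, here is a small sketch (assuming the topics table has a timestamp column such as created_at):

# Over-select every column, as in the naive example above.
topic = Topic.limit(1).to_a.first

# Active Record keeps the raw value from the driver and only pays the
# cast when the attribute is actually read:
topic.created_at # the String -> Time cast happens here, on first access

# Columns we never touch are never cast, so careless over-selection
# hurts less than it would with eager casting.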
There is no “fastest possible” example of an efficient lazy selector out there to compare against. In our case we are consuming 1000 ids, so we would expect a mega-efficient implementation to allocate 1020 or so objects, because we cannot get away without allocating a Topic object per row. We do not expect 26 thousand.
Here is a quick attempt at such an implementation (note: this is just a proof of concept of the idea, not a production-level system):
$conn = ActiveRecord::Base.connection.raw_connection

class FastBase

  class Relation
    include Enumerable

    def initialize(table)
      @table = table
    end

    def limit(limit)
      @limit = limit
      self
    end

    def to_sql
      sql = +"SELECT #{@table.columns.join(',')} from #{@table.get_table_name}"
      if @limit
        sql << -" LIMIT #{@limit}"
      end
      sql
    end

    # Yields one row object per result row; the PG::Result is shared,
    # each row only remembers its own row number.
    def each
      @results = $conn.async_exec(to_sql)
      i = 0
      while i < @results.cmd_tuples
        row = @table.new
        row.attach(@results, i)
        yield row
        i += 1
      end
    end
  end

  def self.columns
    @columns
  end

  # Attach a row to the shared result set instead of copying values out of it.
  def attach(recordset, row_number)
    @recordset = recordset
    @row_number = row_number
  end

  def self.get_table_name
    @table_name
  end

  def self.table_name(val)
    @table_name = val
    load_columns
  end

  # Look up the column names once and define a lazy reader/writer pair per
  # column; values are only pulled out of the result set on first access.
  def self.load_columns
    @columns = $conn.async_exec(<<~SQL).column_values(0)
      SELECT COLUMN_NAME FROM information_schema.columns
      WHERE table_schema = 'public' AND
        table_name = '#{@table_name}'
    SQL

    @columns.each_with_index do |name, idx|
      class_eval <<~RUBY
        def #{name}
          if @recordset && !@loaded_#{name}
            @loaded_#{name} = true
            @#{name} = @recordset.getvalue(@row_number, #{idx})
          end
          @#{name}
        end

        def #{name}=(val)
          @loaded_#{name} = true
          @#{name} = val
        end
      RUBY
    end
  end

  def self.limit(number)
    Relation.new(self).limit(number)
  end
end

class Topic2 < FastBase
  table_name :topics
end
Then we can measure:
a = []
Topic2.limit(1000).each do |t|
  a << t.id
end
a
Total allocated: 84320 bytes (1012 objects)
So … we can manage a similar API with 1012 object allocations as opposed to 26 thousand objects.
Does this matter?
A quick benchmark shows us:
Calculating -------------------------------------
               magic    256.149 (± 2.3%) i/s -   1.300k in  5.078356s
                  ar     75.219 (± 2.7%) i/s - 378.000  in  5.030557s
           ar_select    196.601 (± 3.1%) i/s - 988.000  in  5.030515s
            ar_pluck      1.407k (± 4.5%) i/s -  7.050k in  5.020227s
                 raw      3.275k (± 6.2%) i/s - 16.450k in  5.043383s
             raw_all    284.419 (± 3.5%) i/s -   1.421k in  5.002106s
Our new implementation (which I call magic) does 256 iterations a second compared to Rails’ 75. It is a considerable improvement over the Rails implementation on multiple counts: it is both much faster and allocates significantly less memory, leading to reduced process memory usage. This is despite following the non-ideal practice of over-selection. In fact, our implementation is so fast that it even beats Rails when Rails is careful to select only 1 column!
This is the Rails 3x3 we could have today with no changes to Ruby!
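For completeness, results in this shape are what benchmark-ips prints; here is a rough sketch of such a harness (the labels match the table above; the exact bodies of every variant live in the script linked at the end):

require "benchmark/ips"

Benchmark.ips do |x|
  x.report("magic") do
    a = []
    Topic2.limit(1000).each { |t| a << t.id }
  end

  x.report("ar") do
    a = []
    Topic.limit(1000).each { |u| a << u.id }
  end

  x.report("ar_pluck") { Topic.limit(1000).pluck(:id) }

  x.compare!
end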
Another interesting data point is how much slower pluck, the turbo-boosted version Rails has to offer, is than raw SQL. In fact, at Discourse, we monkey patch pluck exactly for this reason. (I also have a Rails 5.2 version.)
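The patch itself is not reproduced here, but purely to illustrate the idea, here is a simplified sketch of a pluck-style helper that reads column values straight off the PG result (the name fast_pluck and the implementation are hypothetical, not the actual Discourse patch):

# Hypothetical illustration, not the real Discourse monkey patch.
class ActiveRecord::Relation
  def fast_pluck(*cols)
    sql = select(*cols).to_sql
    result = klass.connection.raw_connection.async_exec(sql)
    # Note: without a PG type map, values come back as Strings.
    cols.size == 1 ? result.column_values(0) : result.values
  ensure
    result&.clear
  end
end

Topic.limit(1000).fast_pluck(:id)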
Why is this bloat happening?
Looking at memory profiles, I can see multiple reasons why all this bloat happens:
- Rails is only sort-of lazy… I can see thousands of string allocations for columns we never look at. It is not “lazy-allocating”, it is only partial “lazy-casting”.
- Every row allocates 3 additional objects for bookkeeping and magic: ActiveModel::Attribute::FromDatabase, ActiveModel::AttributeSet and ActiveModel::LazyAttributeHash. None of this is required; instead, a single array could be passed around that holds indexes to columns in the result set (see the sketch after this list).
- Rails insists on dispatching casts to helper objects even if the data retrieved is already in “the right format” (e.g. a number); this work generates extra bookkeeping.
- Every column name we have is allocated twice per query. This could easily be cached and reused (if the query builder is aware of the column names it selected, it does not need to ask the result set for them).
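To make the second point concrete, here is a minimal sketch of the suggested alternative (my own illustration, not how Active Record is structured today):

# One shared { column_name => index } hash per query; each row holds only
# a reference to the shared PG::Result plus its own row number.
class LeanRow
  def initialize(result, row_number, column_index)
    @result = result            # shared PG::Result, never copied per row
    @row_number = row_number    # the only per-row state
    @column_index = column_index
  end

  def [](name)
    @result.getvalue(@row_number, @column_index.fetch(name))
  end
end

# column_index would be built once per query, e.g. { "id" => 0, "title" => 1 }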
What should be done?
I feel that we need to carefully review Active Record internals and consider an implementation that allocates significantly fewer objects per row. We should also start leveraging the PG gem’s native type casting to avoid pulling strings out of the database only to convert them back to numbers.
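As a taste of what leveraging the PG gem’s native type casting could look like, here is a minimal sketch using PG::BasicTypeMapForResults (an experiment against the raw connection, not something to blindly enable on a connection Active Record is also using):

require "pg"

conn = ActiveRecord::Base.connection.raw_connection

# Decode values on the pg gem side instead of handing strings to
# Ruby-land cast machinery.
conn.type_map_for_results = PG::BasicTypeMapForResults.new(conn)

result = conn.async_exec("select id, created_at from topics limit 10")
result.column_values(0).first.class # integer columns now arrive as Integer
result.clear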
You can see the script I used for this evaluation over here:
Thanks for a very informative blog post!
I’ve extended the benchmark with Sequel:
Sequel additions
And these were the results:
So, what Jeremy Evans said in this blog post – “Active Record optimizes for inefficient queries and Sequel optimizes for efficient” – really is true, both ways. Without a SELECT, Active Record is much faster and allocates much less memory than Sequel. But if we SELECT only the id, then Sequel allocates much less memory and performs much faster than Active Record.

If we also add the sequel_pg gem, that bumps up Sequel performance significantly:

In this case Sequel allocates less memory than Active Record even when all columns are selected, despite Active Record’s lazy loading. So memory-wise, maybe it’s not so dangerous to move to Sequel after all.
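For readers who have not used Sequel, the extended benchmark entries presumably look something along these lines (my guess at the shape, not the commenter’s actual code):

require "sequel"

DB = Sequel.connect(ENV["DATABASE_URL"])

# Over-selecting, like the naive Active Record version
a = []
DB[:topics].limit(1000).each { |row| a << row[:id] }

# Selecting only the id, like the select/pluck versions
DB[:topics].limit(1000).select_map(:id)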