Ruby 2.1 Garbage Collection: ready for production
over 10 years ago
The article “Ruby Garbage Collection: Still Not Ready for Production” has been making the rounds.
In it we learned that our GC algorithm is flawed and were prescribed some rather drastic and dangerous workarounds.
At the core it had one big demonstration:
Run this on Ruby 2.1.1 and you will be out of memory soon:
while true
  "a" * (1024 ** 2)
end
Malloc limits, Ruby and you
From very early versions of Ruby we always tracked memory allocation. This is why I found FUD comments such as this troubling:
the issue is that the Ruby GC is triggered on total number of objects, and not total amount of used memory
This clearly misunderstands Ruby. In fact, the aforementioned article never mentions that memory allocation may trigger a GC.
Historically Ruby was quite conservative about issuing GCs based on the amount of memory allocated. Ruby keeps track of all memory allocated (using malloc) outside of the Ruby heaps between GCs. In Ruby 2.0, out of the box, every 8MB of allocations will result in a full GC. This number is way too small for almost any Rails app, which is why increasing RUBY_GC_MALLOC_LIMIT is one of the most cargo-culted settings out there in the wild.
Matz picked this tiny number years ago when it was a reasonable default; however, it was not revised till Ruby 2.1 landed.
For Ruby 2.1 Koichi decided to revamp this sub-system. The goal was to have defaults that work well for both scripts and web apps.
Instead of having a single malloc limit for our app, we now have a starting malloc limit that grows dynamically every time we trigger a GC by exceeding the limit. To stop unbounded growth of the limit we have max values set.
We track memory allocations from 2 points in time:
- memory allocated outside Ruby heaps since last minor GC
- memory allocated since last major GC.
At any point in time we can get a snapshot of the current situation with GC.stat:
> GC.stat
=> {:count=>25,
:heap_used=>263,
:heap_length=>406,
:heap_increment=>143,
:heap_live_slot=>106806,
:heap_free_slot=>398,
:heap_final_slot=>0,
:heap_swept_slot=>25258,
:heap_eden_page_length=>263,
:heap_tomb_page_length=>0,
:total_allocated_object=>620998,
:total_freed_object=>514192,
:malloc_increase=>1572992,
:malloc_limit=>16777216,
:minor_gc_count=>21,
:major_gc_count=>4,
:remembered_shady_object=>1233,
:remembered_shady_object_limit=>1694,
:old_object=>65229,
:old_object_limit=>93260,
:oldmalloc_increase=>2298872,
:oldmalloc_limit=>16777216}
malloc_increase denotes the amount of memory we allocated since the last minor GC; oldmalloc_increase denotes the amount since the last major GC.
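A minimal sketch of reading these counters programmatically. It assumes the Ruby 2.1 key names shown above and falls back to the names used by later releases (malloc_increase_bytes, malloc_increase_bytes_limit and so on). The malloc_pressure helper name is made up for illustration.

```ruby
# Hypothetical helper: pull the malloc counters out of a GC.stat hash.
# Ruby 2.1 exposes :malloc_increase / :malloc_limit; later releases renamed
# them to :malloc_increase_bytes / :malloc_increase_bytes_limit.
def malloc_pressure(stat = GC.stat)
  # Return the first of the given keys that is present in the hash.
  pick = lambda { |*keys| keys.map { |key| stat[key] }.compact.first }
  {
    malloc_increase:    pick.call(:malloc_increase, :malloc_increase_bytes),
    malloc_limit:       pick.call(:malloc_limit, :malloc_increase_bytes_limit),
    oldmalloc_increase: pick.call(:oldmalloc_increase, :oldmalloc_increase_bytes),
    oldmalloc_limit:    pick.call(:oldmalloc_limit, :oldmalloc_increase_bytes_limit)
  }
end

malloc_pressure.each do |name, bytes|
  puts "#{name}: #{bytes.inspect}"
end
```

A counter prints as nil when the running Ruby does not expose it under either name.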
We can tune our settings, from “Ruby 2.1 Out-of-Band GC”:
RUBY_GC_MALLOC_LIMIT: (default: 16MB)
RUBY_GC_MALLOC_LIMIT_MAX: (default: 32MB)
RUBY_GC_MALLOC_LIMIT_GROWTH_FACTOR: (default: 1.4x)
and
RUBY_GC_OLDMALLOC_LIMIT: (default: 16MB)
RUBY_GC_OLDMALLOC_LIMIT_MAX: (default: 128MB)
RUBY_GC_OLDMALLOC_LIMIT_GROWTH_FACTOR: (default: 1.2x)
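To make the intended behaviour concrete, here is a small model of how a limit should grow under these defaults. It only models the rule described above (multiply by the growth factor, never exceed the max). It is not the code in gc.c, and grow_limit is a name invented for this sketch.

```ruby
# Model of the intended behaviour described above, not the code in gc.c.
MEGABYTE = 1024 * 1024

# Next limit after a GC that was triggered by exceeding the current limit.
def grow_limit(limit, growth_factor, max)
  [(limit * growth_factor).to_i, max].min
end

# malloc limit: starts at 16MB, grows 1.4x, capped at 32MB.
malloc_limits = [16 * MEGABYTE]
5.times { malloc_limits << grow_limit(malloc_limits.last, 1.4, 32 * MEGABYTE) }

# oldmalloc limit: starts at 16MB, grows 1.2x, capped at 128MB.
oldmalloc_limits = [16 * MEGABYTE]
12.times { oldmalloc_limits << grow_limit(oldmalloc_limits.last, 1.2, 128 * MEGABYTE) }

puts malloc_limits.map { |bytes| (bytes / MEGABYTE.to_f).round(1) }.inspect
puts oldmalloc_limits.map { |bytes| (bytes / MEGABYTE.to_f).round(1) }.inspect
```

In this model both sequences flatten out at their max value and stay there, however many more GCs are triggered.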
So, in theory, this unbounded memory growth is not possible for the script above. The two MAX values should just cap the growth and force GCs.
However, this is not the case in Ruby 2.1.1.
Investigating the issue
We spent a lot of time ensuring we had extensive instrumentation built in to Ruby 2.1: we added memory profiling hooks, we added GC hooks, and we exposed a large amount of internal information. This has certainly paid off.
Analyzing the issue raised by this mini script is trivial using the gc_tracer gem. This gem allows us to get a very detailed snapshot of the system every time a GC is triggered and store it in a text file that is easily consumed by a spreadsheet.
We simply add this to the rogue script:
require 'gc_tracer'
GC::Tracer.start_logging("log.txt")
And get a very detailed trace back in the text file:
In the snippet above we can see minor GCs being triggered by exceeding malloc limits (where major_by is 0) and major GCs being triggered by exceeding malloc limits. We can see our malloc limit and old malloc limit growing. We can see when GC starts and ends, and lots more.
The trouble is that the limits for both oldmalloc and malloc grow well beyond the max values we have defined.
So the bottom line is that we have a straight-out bug: a one-line bug that will be patched in Ruby 2.1.2 and is already fixed in master.
Are you affected by this bug?
It is possible your production app on Ruby 2.1.1 is impacted by this. The simplest way to find out is to issue a GC.stat as soon as memory usage is really high.
The script above is very aggressive and triggers the pathological issue; it is quite possible you are not even pushing against malloc limits. The only way to find out is to measure.
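One way to turn that GC.stat check into something repeatable is sketched below. It assumes the Ruby 2.1 key names (malloc_limit, oldmalloc_limit) and the default max values listed earlier; limits_over_max is a hypothetical helper, and you would pass your own maxima if you override them through the environment.

```ruby
# Hypothetical check: which malloc limits in a GC.stat hash exceed their max?
# Uses the Ruby 2.1 key names; keys that are absent are simply skipped.
# Defaults are the 2.1 maxima listed in this article (32MB and 128MB).
def limits_over_max(stat, malloc_max = 32 * 1024 * 1024, oldmalloc_max = 128 * 1024 * 1024)
  checks = {
    malloc_limit: malloc_max,
    oldmalloc_limit: oldmalloc_max
  }
  checks.select { |key, max| stat[key] && stat[key] > max }.keys
end

over = limits_over_max(GC.stat)
puts over.empty? ? "limits are within their max values" : "over max: #{over.join(', ')}"
```

A non-empty result on 2.1.1 suggests the process has hit the bug described above.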
General memory growth under Ruby 2.1.1
A more complicated issue we need to tackle is the more common “memory doubling” issue under Ruby 2.1.1. The general complaint goes something along the lines of “I just upgraded Ruby and now my RSS has doubled”.
This issue is described in detail in Bug #9607: Change the full GC timing, on the Ruby issue tracker.
Memory usage growth is partly unavoidable when employing a generational GC. A certain section of the heap is getting scanned far less often. It’s a performance/memory trade-off. That said, the algorithm used in 2.1 is a bit too simplistic.
If an object ever survives a minor GC it is flagged as oldgen, and from then on it is only scanned during a major GC. This algorithm is particularly problematic for web applications.
Web applications perform a large amount of “medium” lived memory allocations. A large number of objects are needed for the lifetime of a web request. If a minor GC hits in the middle of a web request we will “promote” a bunch of objects to the “long lived” oldgen even though they will no longer be needed at the end of the request.
This has a few bad side effects:
- It forces major GC to run more often (growth of oldgen is a trigger for running a major GC)
- It forces the oldgen heaps to grow beyond what we need.
- A bunch of memory is retained when it is clearly not needed.
.NET and Java employ 3 generations to overcome this issue. Survivors in Gen 0 collections are promoted to Gen 1 and so on.
Koichi is planning on refining the current algorithm to employ a somewhat similar technique of deferred promotion. Instead of promoting objects to oldgen on the first minor GC, an object will have to survive two minor GCs to be promoted. This means that if no more than one minor GC runs during a request our heaps will be able to stay at optimal sizes. This work is already prototyped in Ruby 2.1, see RGENGC_THREEGEN in gc.c (note, the name is likely to change). It is slated to be released in Ruby 2.2.
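The effect of deferred promotion can be illustrated with a toy model. This is not how the collector is implemented. It assumes a request allocates its objects one by one, that minor GCs fire at evenly spaced points inside the request, and that every object is garbage once the request ends. The needlessly_promoted name and its parameters are invented for the sketch.

```ruby
# Toy model, not Ruby's collector: count request-scoped objects that get
# promoted to oldgen even though they are garbage once the request ends.
def needlessly_promoted(objects_per_request:, minor_gcs:, survivals_needed:)
  # Allocation counts at which a minor GC fires, evenly spaced in the request.
  gc_points = (1..minor_gcs).map { |k| k * objects_per_request / (minor_gcs + 1) }
  young = []     # one survival counter per live young object
  promoted = 0
  1.upto(objects_per_request) do |allocated|
    young << 0
    next unless gc_points.include?(allocated)
    # Every live young object survives this minor GC.
    young.map! { |survived| survived + 1 }
    old, young = young.partition { |survived| survived >= survivals_needed }
    promoted += old.size
  end
  promoted
end

[1, 2].each do |needed|
  count = needlessly_promoted(objects_per_request: 100_000, minor_gcs: 1, survivals_needed: needed)
  puts "promote after #{needed} survival(s): #{count} request objects reach oldgen"
end
```

With a single minor GC per request, the model promotes half of the request's objects under the current rule and none under the two-survival rule. A second minor GC in the same request would still promote some objects.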
We can see this problem in action using this somewhat simplistic test:
@retained = []
@rand = Random.new(999)
MAX_STRING_SIZE = 100
def stress(allocate_count, retain_count, chunk_size)
  chunk = []
  while retain_count > 0 || allocate_count > 0
    if retain_count == 0 || (@rand.rand < 0.5 && allocate_count > 0)
      chunk << " " * (@rand.rand * MAX_STRING_SIZE).to_i
      allocate_count -= 1
      if chunk.length > chunk_size
        chunk = []
      end
    else
      @retained << " " * (@rand.rand * MAX_STRING_SIZE).to_i
      retain_count -= 1
    end
  end
end
start = Time.now
# simulate rails boot, 2M objects allocated 600K retained in memory
stress(2_000_000, 600_000, 200_000)
# simulate 100 requests that allocate 100K objects
stress(10_000_000, 0, 100_000)
puts "Duration: #{(Time.now - start).to_f}"
puts "RSS: #{`ps -eo rss,pid | grep #{Process.pid} | grep -v grep | awk '{ print $1; }'`}"
In Ruby 2.0 we get:
% ruby stress.rb
Duration: 10.074556277
RSS: 122784
In Ruby 2.1.1 we get:
% ruby stress.rb
Duration: 7.031792076
RSS: 236244
Performance has improved, but memory almost doubled.
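The RSS figures above come from ps piped through grep and awk. If you want to take the same measurement from inside other scripts, a narrower variant is sketched here. It assumes a ps that accepts the POSIX -o and -p options, and current_rss_kb is a name made up for this example.

```ruby
# Resident set size of the current process in kilobytes, or nil if ps is
# unavailable or prints nothing. Asking ps for a single pid avoids grep
# accidentally matching unrelated lines.
def current_rss_kb
  output = `ps -o rss= -p #{Process.pid}`
  value = output.strip
  value.empty? ? nil : value.to_i
rescue SystemCallError
  nil
end

puts "RSS: #{current_rss_kb.inspect}"
```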
To mitigate the current pain point we can use the new RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR environment variable.
Out of the box we trigger a major GC if our old object count doubles. We can tune this down to, say, 1.3 times and see a significant improvement memory-wise:
% RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=1.3 ruby stress.rb
Duration: 6.85115156
RSS: 184928
On memory-constrained machines we can go even further and disable generational GC altogether:
% RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=0.9 ruby stress.rb
Duration: 6.759709765
RSS: 149728
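A back-of-the-envelope sketch of why a factor below 1 has that effect. It assumes the major GC threshold is simply the old object count after the last major GC multiplied by the factor, which is a simplification of what gc.c does, and it uses the 600K retained objects from the stress script as the old object count. The old_object_limit helper is hypothetical.

```ruby
# Simplified model: threshold = old objects after last major GC * factor.
def old_object_limit(old_objects, factor)
  (old_objects * factor).floor
end

retained = 600_000
[2.0, 1.3, 0.9].each do |factor|
  limit = old_object_limit(retained, factor)
  note = if limit < retained
           "already exceeded, so every GC is a major GC in this model"
         else
           "#{limit - retained} more old objects allowed before a major GC"
         end
  puts "factor #{factor}: limit #{limit} (#{note})"
end
```

In this model a lower factor leaves less room for oldgen to grow between major GCs, which trades more frequent major GCs for a smaller heap.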
We can always add jemalloc for good measure to shave off an extra 10% or so:
LD_PRELOAD=/home/sam/Source/jemalloc-3.5.0/lib/libjemalloc.so RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=0.9 ruby stress.rb
Duration: 6.204024629
RSS: 144440
If that is still not enough you can push the malloc limits down (and have more GCs run due to hitting them):
% RUBY_GC_MALLOC_LIMIT_MAX=8000000 RUBY_GC_OLDMALLOC_LIMIT_MAX=8000000 LD_PRELOAD=/home/sam/Source/jemalloc-3.5.0/lib/libjemalloc.so RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=0.9 ruby stress.rb
Duration: 9.02354988
RSS: 120668
That brings memory back to Ruby 2.0 numbers, but we have also lost a pile of performance.
Ruby 2.1 is ready for production
Ruby 2.1 has been running in production at GitHub for a few months with great success. The 2.1.0 release was a little rough; 2.1.1 addresses the majority of the big issues it had. 2.1.2 will address the malloc issue, which may or may not affect you.
If you are considering deploying Ruby 2.1 I would strongly urge giving GitHub Ruby a go, since it contains a fairly drastic performance boost due to funny-falcon's excellent method cache patch.
Performance has much improved at the cost of memory. That said, you can tune memory as needed and measure the impact of various settings effectively.
Summary
- If you discover any issues, please report them on https://bugs.ruby-lang.org/
- Use Ruby 2.1.1 in production, upgrade to 2.1.2 as soon as it is released
- Be sure to look at jemalloc and GC tuning for memory-constrained systems. See also Feature #9113: Ship Ruby for Linux with jemalloc out-of-the-box.
- Always be measuring. If you are seeing issues, run GC.stat. You can attach to the rogue process using rbtrace, a gem you should consider including on production systems.
Resources:
- gc_tracer gem to analyze your GC
- Aman’s blog: http://tmm1.net/
- Koichi Sasada’s slides: see his activity list, in particular “Memory Management Tuning in Ruby”
- The memory_profiler gem: SamSaffron/memory_profiler on GitHub
- Demystifying the Ruby GC
Is there a way to install GitHub’s Ruby (github/ruby, the development fork of ruby/ruby) via RVM?