The article "Ruby Garbage Collection: Still Not Ready for Production" has been making the rounds.

In it we learned that our GC algorithm is flawed and were prescribed some rather drastic and dangerous workarounds.

At the core it had one big demonstration:

Run this on Ruby 2.1.1 and you will be out of memory soon:

while true
  "a" * (1024 ** 2)
end

Malloc limits, Ruby and you

Ruby has tracked memory allocation since its very early versions. This is why I found FUD comments such as this troubling:

the issue is that the Ruby GC is triggered on total number of objects, and not total amount of used memory

This clearly misunderstands Ruby. In fact, the aforementioned article never mentions that memory allocation may trigger a GC.

Historically Ruby was quite conservative, issuing GCs based on the amount of memory allocated. Ruby keeps track of all memory allocated (using malloc) outside of the Ruby heaps between GCs. In Ruby 2.0, out of the box, every 8MB of allocations will result in a full GC. This number is way too small for almost any Rails app, which is why increasing RUBY_GC_MALLOC_LIMIT is one of the most cargo-culted settings out there in the wild.

Matz picked this tiny number years ago when it was a reasonable default; however, it was not revised until Ruby 2.1 landed.

For Ruby 2.1 Koichi decided to revamp this sub-system. The goal was to have defaults that work well for both scripts and web apps.

Instead of having a single malloc limit for our app, we now have a starting malloc limit that grows dynamically every time we trigger a GC by exceeding it. To stop unbounded growth of the limit we have max values set.

We track memory allocations from 2 points in time:

  • memory allocated outside Ruby heaps since the last minor GC
  • memory allocated since the last major GC

At any point in time we can get a snapshot of the current situation with GC.stat:

> GC.stat
=> {:count=>25,
 :heap_used=>263,
 :heap_length=>406,
 :heap_increment=>143,
 :heap_live_slot=>106806,
 :heap_free_slot=>398,
 :heap_final_slot=>0,
 :heap_swept_slot=>25258,
 :heap_eden_page_length=>263,
 :heap_tomb_page_length=>0,
 :total_allocated_object=>620998,
 :total_freed_object=>514192,
 :malloc_increase=>1572992,
 :malloc_limit=>16777216,
 :minor_gc_count=>21,
 :major_gc_count=>4,
 :remembered_shady_object=>1233,
 :remembered_shady_object_limit=>1694,
 :old_object=>65229,
 :old_object_limit=>93260,
 :oldmalloc_increase=>2298872,
 :oldmalloc_limit=>16777216}

malloc_increase denotes the amount of memory we allocated since the last minor GC. oldmalloc_increase denotes the amount allocated since the last major GC.
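As a rough illustration of reading these counters, the sketch below reports how close each counter is to its limit. The helper names malloc_pressure and fill_ratio are made up for this example, and the sample hash reuses the numbers printed above. The alternate key spellings are included because later Ruby versions renamed these keys (for example :malloc_increase became :malloc_increase_bytes).

```ruby
# Hypothetical helper: maps the Ruby 2.1 key names used in this post to the
# spellings used by later Rubies, and picks whichever one is present.
MALLOC_KEYS = {
  malloc_increase:    [:malloc_increase, :malloc_increase_bytes],
  malloc_limit:       [:malloc_limit, :malloc_increase_bytes_limit],
  oldmalloc_increase: [:oldmalloc_increase, :oldmalloc_increase_bytes],
  oldmalloc_limit:    [:oldmalloc_limit, :oldmalloc_increase_bytes_limit]
}.freeze

# Fraction of the limit already used, or nil when a value is unavailable.
def fill_ratio(increase, limit)
  return nil if increase.nil? || limit.nil? || limit.zero?
  increase.to_f / limit
end

def malloc_pressure(stat = GC.stat)
  values = MALLOC_KEYS.each_with_object({}) do |(name, candidates), out|
    key = candidates.find { |k| stat.key?(k) }
    out[name] = key ? stat[key] : nil
  end
  values.merge(
    minor_fill: fill_ratio(values[:malloc_increase], values[:malloc_limit]),
    major_fill: fill_ratio(values[:oldmalloc_increase], values[:oldmalloc_limit])
  )
end

# Numbers copied from the GC.stat output above.
sample = { malloc_increase: 1572992, malloc_limit: 16777216,
           oldmalloc_increase: 2298872, oldmalloc_limit: 16777216 }
p malloc_pressure(sample)

# Live values from the running process.
p malloc_pressure
```

A fill ratio that keeps approaching 1.0 suggests GCs are being triggered by malloc limits rather than by object counts.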

We can tune our settings, from "Ruby 2.1 Out-of-Band GC":

RUBY_GC_MALLOC_LIMIT: (default: 16MB)
RUBY_GC_MALLOC_LIMIT_MAX: (default: 32MB)
RUBY_GC_MALLOC_LIMIT_GROWTH_FACTOR: (default: 1.4x)

and

RUBY_GC_OLDMALLOC_LIMIT: (default: 16MB)
RUBY_GC_OLDMALLOC_LIMIT_MAX: (default: 128MB)
RUBY_GC_OLDMALLOC_LIMIT_GROWTH_FACTOR: (default: 1.2x)

So, in theory, this unbounded memory growth is not possible for the script above. The two MAX values should just cap the growth and force GCs.

However, this is not the case in Ruby 2.1.1.
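To make the intended behaviour concrete, here is a deliberately simplified model of a growing, capped limit. It is not the code in gc.c; the function name next_malloc_limit is invented for this sketch, and it only uses the RUBY_GC_MALLOC_LIMIT defaults listed above (16MB start, 32MB max, 1.4x growth).

```ruby
MB = 1024 * 1024

# Simplified model (an assumption, not the real implementation): when the
# memory allocated since the last GC exceeds the limit, grow the limit by the
# growth factor, but never past the configured max.
def next_malloc_limit(limit, increase, factor, max)
  return limit unless increase > limit
  [(limit * factor).to_i, max].min
end

limit = 16 * MB
limits = 5.times.map do
  # Pretend every cycle allocates well past the current limit,
  # the way the tight string-allocating loop does.
  limit = next_malloc_limit(limit, limit * 2, 1.4, 32 * MB)
end

puts limits.map { |l| (l / MB.to_f).round(2) }.inspect
```

Under this model the limit climbs for a couple of cycles and then sits at the max, which is the behaviour the MAX settings are meant to guarantee.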

Investigating the issue

We spent a lot of time ensuring we had extensive instrumentation built into Ruby 2.1. We added memory profiling hooks, we added GC hooks, and we exposed a large amount of internal information. This has certainly paid off.

Analyzing the issue raised by this mini script is trivial using the gc_tracer gem. This gem allows us to get a very detailed snapshot of the system every time a GC is triggered and store it in a text file that is easily consumed by a spreadsheet.

We simply add this to the rogue script:

require 'gc_tracer'
GC::Tracer.start_logging("log.txt")

And get a very detailed trace back in the text file:

In the snippet above we can see minor GCs being triggered by exceeding malloc limits (where major_by is 0) and major GCs being triggered by exceeding malloc limits. We can see our malloc limit and old malloc limit growing. We can see when GC starts and ends, and lots more.

Trouble is, the limits for both oldmalloc and malloc grow well beyond the max values we have defined:

So, the bottom line is that it looks like we have a straight-out bug.

https://bugs.ruby-lang.org/issues/9687

A one-line bug that will be patched in Ruby 2.1.2 and is already fixed in master.

Are you affected by this bug?

It is possible your production app on Ruby 2.1.1 is impacted by this. The simplest way to find out is to issue a GC.stat as soon as memory usage is really high.

The script above is very aggressive and triggers the pathological issue. It is quite possible you are not even pushing against malloc limits. The only way to find out is to measure.
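One way to turn that check into code is sketched below. The helper name limits_over_max is invented here, the key names are the Ruby 2.1 ones shown in the GC.stat output earlier, and the caps are the default MAX values listed earlier. If you have set RUBY_GC_MALLOC_LIMIT_MAX or RUBY_GC_OLDMALLOC_LIMIT_MAX yourself, pass your own caps instead.

```ruby
MB = 1024 * 1024

# Default caps as listed earlier in this post.
DEFAULT_MAX = {
  malloc_limit:    32 * MB,   # RUBY_GC_MALLOC_LIMIT_MAX
  oldmalloc_limit: 128 * MB   # RUBY_GC_OLDMALLOC_LIMIT_MAX
}.freeze

# Returns the limits that have grown past their cap. Keys missing from the
# stat hash (for example on Rubies that use different key names) are skipped.
def limits_over_max(stat, max = DEFAULT_MAX)
  max.select { |key, cap| stat[key] && stat[key] > cap }.keys
end

healthy = { malloc_limit: 16 * MB, oldmalloc_limit: 16 * MB }
runaway = { malloc_limit: 40 * MB, oldmalloc_limit: 16 * MB } # made-up values

p limits_over_max(healthy)
p limits_over_max(runaway)
p limits_over_max(GC.stat)
```

A non-empty result on a live process would suggest the limit has escaped its cap, which is the symptom described in the bug report.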

General memory growth under Ruby 2.1.1

A more complicated issue we need to tackle is the more common "memory doubling" issue under Ruby 2.1.1. The general complaint goes something along the lines of "I just upgraded Ruby and now my RSS has doubled".

This issue is described in detail here: https://bugs.ruby-lang.org/issues/9607

Memory usage growth is partly unavoidable when employing a generational GC. A certain section of the heap is getting scanned far less often. It's a performance/memory trade-off. That said, the algorithm used in 2.1 is a bit too simplistic.

If an object ever survives a minor GC it will be flagged as oldgen; these objects will only be scanned during a major GC. This algorithm is particularly problematic for web applications.

Web applications perform a large amount of "medium" lived memory allocations. A large number of objects are needed for the lifetime of a web request. If a minor GC hits in the middle of a web request we will "promote" a bunch of objects to the "long lived" oldgen even though they will no longer be needed at the end of the request.

This has a few bad side effects:

  1. It forces major GC to run more often (growth of oldgen is a trigger for running a major GC)
  2. It forces the oldgen heaps to grow beyond what we need.
  3. A bunch of memory is retained when it is clearly not needed.

.NET and Java employ 3 generations to overcome this issue. Survivors in Gen 0 collections are promoted to Gen 1 and so on.

Koichi is planning on refining the current algorithm to employ a somewhat similar technique of deferred promotion. Instead of promoting objects to oldgen on the first minor GC, an object will have to survive two minor GCs to be promoted. This means that if no more than one minor GC runs during a request, our heaps will be able to stay at optimal sizes. This work is already prototyped in Ruby 2.1; see RGENGC_THREEGEN in gc.c (note, the name is likely to change). It is slated for release in Ruby 2.2.
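The difference between promoting on the first survival and promoting on the second can be illustrated with a toy model. This is not how the GC is implemented; the function promoted_count, the abstract clock and the object lifetimes are all assumptions made for the sketch.

```ruby
# Toy model: each object is [born_at, dies_at] on an abstract clock.
# An object counts as promoted once it has survived `threshold` minor GCs,
# that is, once that many GCs ran strictly between its birth and its death.
def promoted_count(objects, gc_times, threshold)
  objects.count do |born_at, dies_at|
    survived = gc_times.count { |t| t > born_at && t < dies_at }
    survived >= threshold
  end
end

# One "request" spanning ticks 0...100. It allocates one object per tick and
# every object becomes garbage when the request ends at tick 100.
request_objects = (0...100).map { |tick| [tick, 100] }

one_gc  = [50]      # a single minor GC lands mid-request
two_gcs = [30, 70]  # two minor GCs land during the request

puts "promote on 1st survival, one GC:  #{promoted_count(request_objects, one_gc, 1)}"
puts "promote on 2nd survival, one GC:  #{promoted_count(request_objects, one_gc, 2)}"
puts "promote on 2nd survival, two GCs: #{promoted_count(request_objects, two_gcs, 2)}"
```

In this model, a single mid-request GC promotes half of the request's short-lived objects under the current rule and none under deferred promotion. Objects only leak into oldgen under the deferred rule when a second minor GC also runs during the request.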

We can see this problem in action using this somewhat simplistic test:

@retained = []
@rand = Random.new(999)

MAX_STRING_SIZE = 100

def stress(allocate_count, retain_count, chunk_size)
  chunk = []
  while retain_count > 0 || allocate_count > 0
    if retain_count == 0 || (@rand.rand < 0.5 && allocate_count > 0)
      chunk << " " * (@rand.rand * MAX_STRING_SIZE).to_i
      allocate_count -= 1
      if chunk.length > chunk_size
        chunk = []
      end
    else
      @retained << " " * (@rand.rand * MAX_STRING_SIZE).to_i
      retain_count -= 1
    end
  end
end

start = Time.now
# simulate rails boot, 2M objects allocated 600K retained in memory
stress(2_000_000, 600_000, 200_000)

# simulate 100 requests that allocate 100K objects
stress(10_000_000, 0, 100_000)


puts "Duration: #{(Time.now - start).to_f}"

# ask ps for the RSS of this process only, so other PIDs cannot match by accident
puts "RSS: #{`ps -o rss= -p #{Process.pid}`.strip}"

In Ruby 2.0 we get:

% ruby stress.rb
Duration: 10.074556277
RSS: 122784

In Ruby 2.1.1 we get:

% ruby stress.rb
Duration: 7.031792076
RSS: 236244

Performance has improved, but memory almost doubled.

To mitigate the current pain point we can use the new RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR environment variable.

Out of the box we trigger a major GC if our oldobject count doubles. We can tune this down to, say, 1.3 times and see a significant improvement in memory usage:

% RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=1.3 ruby stress.rb
Duration: 6.85115156
RSS: 184928

On memory-constrained machines we can go even further and disable generational GC altogether.

% RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=0.9 ruby stress.rb 
Duration: 6.759709765
RSS: 149728
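The arithmetic behind these factors can be sketched as follows. This assumes a simple model in which, after a major GC, the old-object limit is the surviving old-object count multiplied by the factor. The function names and the 50,000 starting count are hypothetical and are not taken from the runs above.

```ruby
# Assumed model: limit = old objects left after a major GC * factor.
# The next major GC is due once the old-object count reaches that limit.
def old_object_limit(old_objects_after_major_gc, factor)
  (old_objects_after_major_gc * factor).round
end

# How many more old objects can accumulate before the next major GC.
def headroom(old_objects_after_major_gc, factor)
  old_object_limit(old_objects_after_major_gc, factor) - old_objects_after_major_gc
end

old_objects = 50_000 # hypothetical count
[2.0, 1.3, 0.9].each do |factor|
  limit = old_object_limit(old_objects, factor)
  puts "factor #{factor}: limit #{limit}, headroom #{headroom(old_objects, factor)}"
end
```

A negative headroom under this model means the limit is already exceeded right after a major GC, which would be consistent with a factor below 1.0 behaving as if generational GC were switched off.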

We can always add jemalloc for good measure to shave off an extra 10% or so:

LD_PRELOAD=/home/sam/Source/jemalloc-3.5.0/lib/libjemalloc.so RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=0.9 ruby stress.rb 
Duration: 6.204024629
RSS: 144440

If that is still not enough you can push the malloc limits down (and have more GCs run due to hitting them):

% RUBY_GC_MALLOC_LIMIT_MAX=8000000 RUBY_GC_OLDMALLOC_LIMIT_MAX=8000000  LD_PRELOAD=/home/sam/Source/jemalloc-3.5.0/lib/libjemalloc.so RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=0.9 ruby stress.rb
Duration: 9.02354988
RSS: 120668

That gets us back to Ruby 2.0 memory numbers, but at the cost of a pile of performance.

Ruby 2.1 is ready for production

Ruby 2.1 has been running in production at GitHub for a few months with great success. The 2.1.0 release was a little rough; 2.1.1 addresses the majority of the big issues it had. 2.1.2 will address the malloc issue, which may or may not affect you.

If you are considering deploying Ruby 2.1 I would strongly urge giving GitHub Ruby a go since it contains a fairly drastic performance boost due to funny-falcon's excellent method cache patch.

Performance has much improved at the cost of memory. That said, you can tune memory as needed and effectively measure the impact of various settings.

Summary

  • If you discover any issues, please report them on https://bugs.ruby-lang.org/
  • Use Ruby 2.1.1 in production, upgrade to 2.1.2 as soon as it is released
  • Be sure to look at jemalloc and GC tuning for memory-constrained systems. See also: https://bugs.ruby-lang.org/issues/9113
  • Always be measuring. If you are seeing issues, run GC.stat. You can attach to the rogue process using rbtrace, a gem you should consider including on production systems.
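As a small starting point for that kind of measurement, the sketch below counts how many GCs run while a block executes. The helper name gc_delta is made up for this example. It relies only on the :count, :minor_gc_count and :major_gc_count keys shown in the GC.stat output earlier.

```ruby
# Report elapsed time and the number of GCs that ran while the block executed.
def gc_delta
  before  = GC.stat
  started = Time.now
  yield
  after = GC.stat
  {
    seconds:  Time.now - started,
    total_gc: after[:count] - before[:count],
    minor_gc: after[:minor_gc_count] - before[:minor_gc_count],
    major_gc: after[:major_gc_count] - before[:major_gc_count]
  }
end

# Allocate fifty 1MB strings, a bounded version of the loop from the top
# of this post.
report = gc_delta { 50.times { "a" * (1024 ** 2) } }
puts report
```

Wrapping a request handler or a background job in a helper like this gives a quick view of how often the minor and major collectors are firing under a given set of tuning variables.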


Comments

Paul Kmiec 88 days ago

Is there a way to install GitHub's Ruby (https://github.com/github/ruby) via RVM?

Erick Guan 88 days ago

Sam, what do you think of Rubinius? Is that a great alternative to MRI with better performance?

Sam Saffron 84 days ago

@paul_kmiec not really, but you could quite easily use rbenv with a custom definition file

@fantasticfears I think rubinius is awesome but have not had any luck running Discourse on it.

Paul Kmiec 84 days ago

Turns out you can install via RVM. See https://github.com/wayneeseguin/rvm/issues/2814.

Julien Palmas 59 days ago

@sam I can still reproduce the stress.rb bug in 2.1.2, but your article says the contrary.

Can you confirm it has not been fixed in 2.1.2?

Sam Saffron 59 days ago

Which issue, the memory leak? I am unable to reproduce it here:

irb
irb(main):001:0> while true; "a" * (1024 ** 2); end

Memory is stable:

sam@ubuntu discourse % ps aux | grep irb | grep -v grep
sam      26338 72.3  2.1 164672 131616 pts/3   Rl+  12:33   1:01 irb

Are you sure you have Ruby 2.1.2 installed and selected properly?

Julien Palmas 58 days ago

my mistake @sam, the memory leak is indeed solved with 2.1.2

I was referring to the "General memory growth under Ruby 2.1.1" part of your article.

This should be solved with 2.2 if I've understood your article correctly.

I am deploying a Rails API app on a 1X dyno on Heroku and have one big endpoint that generates many objects and is being hit too often. My memory goes over the 512 MB allowed, triggering many R14 Heroku errors and perf issues.

I've set RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR to 0.9 as you advised and this seems to be better, but I still outgrow the 512MB limit.

My tests with jemalloc on my Mac are pretty encouraging, but using jemalloc on Heroku requires some setup.

In any case, thank you for your article!

Sam Saffron 56 days ago

The main difference though is the malloc limit is way more restricted on 2.0, try pushing it down if you are super memory constrained.

Justin Gordon 18 days ago

Any idea if these settings for older rubies are still valid for 2.1.2?

export RUBY_GC_HEAP_INIT_SLOTS=1000000
export RUBY_HEAP_SLOTS_INCREMENT=1000000
export RUBY_HEAP_SLOTS_GROWTH_FACTOR=1
export RUBY_GC_MALLOC_LIMIT=1000000000
export RUBY_HEAP_FREE_MIN=500000
