Demystifying the Ruby GC

almost 11 years ago

This article is about the Ruby GC. In particular it is about the GC present in Ruby MRI 2.0.

The Ruby GC has been through quite a few iterations. In 1.9.3 we were introduced to the lazy sweeping algorithm, and in 2.0 we were introduced to bitmap marking. Ruby 2.1 is going to introduce many more concepts and is out of scope for this post.

Heaps of heaps

MRI (Matz’s Ruby Interpreter) stores objects, a.k.a. RVALUEs, in heaps; each heap is approximately 16KB. RVALUE structs consume different amounts of memory depending on the machine architecture. On x64 machines they consume 40 bytes; on 32-bit machines they consume 20 to 24 bytes depending on the sub-architecture (some optimizations shave off a few extra bytes on, say, cygwin using magic pragmas).

An RVALUE is a magical C struct that is a union of various “low level” C representations of Ruby objects. For example, in MRI, an RVALUE can be accessed as an RRegexp, an RString, an RObject and so on. I strongly recommend the excellent Ruby Under a Microscope to get a handle on this, GC algorithms and MRI in general.

Given this, on an x64 machine each heap is able to store about 409 Ruby objects, give or take a few for heap alignment and headers.

[1] pry(main)> require 'objspace'
=> true
[2] pry(main)> ObjectSpace.count_objects[:TOTAL] / GC.stat[:heap_used]
=> 406
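
The 409 figure is plain integer division of the heap size by the slot size. A quick sketch of that arithmetic, assuming exactly 16KB per heap and 40 bytes per RVALUE (the constant names are mine, not MRI's):

```ruby
# Assumed sizes, taken from the figures quoted above for x64.
HEAP_BYTES   = 16 * 1024 # approximate size of one heap
RVALUE_BYTES = 40        # size of one RVALUE slot

# Upper bound on slots per heap, before alignment and headers take a few.
SLOTS_PER_HEAP = HEAP_BYTES / RVALUE_BYTES

puts SLOTS_PER_HEAP
```

The measured 406, and the 408 used in later snippets, come out a little lower because of that alignment and header overhead.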

A typical Rails application (like, say, Discourse) will have about 400 thousand objects in 1100 or so heaps (heaps can get fragmented with empty space). We can see this by running:

$ RAILS_ENV=production rails c
> GC.start
> GC.stat
=> {:count=>102, :heap_used=>1160, :heap_length=>1648, :heap_increment=>488, :heap_live_num=>369669, :heap_free_num=>102447, :heap_final_num=>0, :total_allocated_object=>3365152, :total_freed_object=>2995483}

About GC.stat in Ruby 2.0

GC.stat is a goldmine of information; it is the first place you should go before doing any GC tuning. Unfortunately it is not documented, and some attempts at documenting it are not that accurate. Here is my go at an overview of what the fields mean, after reading the GC source:

count: the number of times a GC ran (both full GC and lazy sweep are included)

heap_used: the number of heaps that have more than 0 slots used in them. The larger this number, the slower your GC will be.

heap_length: the total number of heaps allocated in memory. For example, 1648 means about 25.75MB is allocated to Ruby heaps: (1648 * (2 << 13)).to_f / (2 << 19)

heap_increment: the number of extra heaps to be allocated the next time Ruby grows the number of heaps (as it does after it runs a GC and discovers it does not have enough free space). This number is updated each GC run to be 1.8 * heap_used. In later versions of Ruby this multiplier is configurable.

heap_live_num: the running number of objects in Ruby heaps; it will change every time you call GC.stat

heap_free_num: this is a slightly confusing number. It changes after a GC runs and tells you how many slots were left free in the heaps after the GC finished running. So, in this example we had 102447 slots empty after the last GC. (It also increases when objects are recycled internally, which can happen between GCs.)

heap_final_num: the count of objects that were not finalized during the last GC

total_allocated_object: The running total of allocated objects from the beginning of the process. This number will change every time you allocate objects. Note: in a corner case this value may overflow.

total_freed_object: The number of objects that were freed by the GC from the beginning of the process.
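
To make these fields a little more tangible, here is a rough sketch that derives a few friendlier numbers from the sample output above. The helper name and the result keys are my own invention, and the hash is a literal copy of the Ruby 2.0 output, since other Ruby versions may report different keys.

```ruby
# GC.stat output copied from the Discourse console session above (Ruby 2.0 keys).
SAMPLE_STAT = {
  count: 102, heap_used: 1160, heap_length: 1648, heap_increment: 488,
  heap_live_num: 369_669, heap_free_num: 102_447, heap_final_num: 0,
  total_allocated_object: 3_365_152, total_freed_object: 2_995_483
}

BYTES_PER_HEAP = 2 << 13 # 16KB, the same constant used in the heap_length formula

# Hypothetical helper: turn a stat hash into a few human-friendly numbers.
def summarize_gc_stat(stat)
  slots = stat[:heap_live_num] + stat[:heap_free_num]
  {
    heap_mb: (stat[:heap_length] * BYTES_PER_HEAP).to_f / (2 << 19),
    live_ratio: stat[:heap_live_num].to_f / slots,
    allocated_minus_freed: stat[:total_allocated_object] - stat[:total_freed_object]
  }
end

puts summarize_gc_stat(SAMPLE_STAT)
```

In this particular sample, allocated minus freed lands exactly on heap_live_num, and roughly 78% of the counted slots are live.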

When will the GC run

The GC in Ruby 2.0 comes in 2 different flavors. We have a “full” GC that runs after we allocate more than our malloc_limit and a lazy sweep (partial GC) that will run if we ever run out of free slots in our heaps.

The lazy sweep takes less time than a full GC, but it only performs a partial GC. Its goal is to perform a short GC more frequently, thus increasing overall throughput. The world stops, but for less time.

The malloc_limit is set to 8MB out of the box; you can raise it by setting the RUBY_GC_MALLOC_LIMIT environment variable higher.

Why a malloc limit?

Discourse at boot only takes up 25MB of heap space, however when we look at the RSS for the process we can see it is consuming way over 134MB. Where is all this extra memory?

sam@ubuntu:~/Source/discourse$ RAILS_ENV=production rails c
irb(main):008:0> `ps -o rss= -p #{Process.pid}`.to_i
=> 134036
irb(main):009:0> (GC.stat[:heap_length] * (2 << 13)).to_f / (2 << 19)
=> 26.15625

The Ruby heaps store RVALUE objects, and each of these can hold at most 40 bytes. For Strings, Arrays and Hashes this means that small objects can fit in the heap, but as soon as they reach a threshold, Ruby will malloc extra memory outside of the Ruby heaps. We can see an example here:

sam@ubuntu:~/Source/discourse$ irb
irb(main):001:0> require 'objspace'
=> true
irb(main):002:0> ObjectSpace.memsize_of("a")
=> 0
irb(main):005:0> ObjectSpace.memsize_of("a"*23)
=> 0
irb(main):006:0> ObjectSpace.memsize_of("a"*24)
=> 24
# peace comes at a high cost
irb(main):017:0> ObjectSpace.memsize_of("☮"*8)
=> 24

Note: in Ruby 2.0 memsize_of does not include the RVALUE size, hence 0. We are hoping to change this in Ruby 2.1, see: Bug #8984: ObjectSpace.memsize_of(obj) should return with sizeof(RVALUE) - Ruby master - Ruby Issue Tracking System
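
The 23 byte cut-off can be modelled with a little arithmetic. This is only a sketch of the Ruby 2.0 layout on x64, assuming an 8 byte VALUE and an embed area of three VALUE-sized words with one byte kept for the terminating NUL (the RSTRING_EMBED_LEN_MAX definition quoted in the comments below). The helper name is hypothetical, and newer Rubies may lay strings out differently.

```ruby
# Assumptions: 8-byte VALUE (x64), embed area of three words, one byte for NUL.
VALUE_BYTES   = 8
EMBED_LEN_MAX = VALUE_BYTES * 3 - 1

# Hypothetical helper: would a string of this byte length fit inside the RVALUE?
def fits_in_rvalue?(str)
  str.bytesize <= EMBED_LEN_MAX
end

puts fits_in_rvalue?("a" * 23)     # 23 bytes
puts fits_in_rvalue?("a" * 24)     # 24 bytes
puts fits_in_rvalue?("\u262E" * 8) # 8 peace signs, 3 bytes each in UTF-8
```

Only the first string fits. Note that the limit is in bytes, not characters, which is why eight peace signs already spill out of the heap.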

Turns out that for Rails apps the vast majority of the RSS consumption is not by Ruby heaps, but by information attached to objects that is allocated outside of the Ruby heap, and by general memory fragmentation.

$ RAILS_ENV=production rails c
irb(main):005:0> size=0; ObjectSpace.each_object{|o| size += ObjectSpace.memsize_of(o) }; puts size/1024
67265

This fact puts a bit of a damper on the GC Bitmap Marking algorithm introduced in Ruby 2.0. For a large Rails app, at best, it is optimising reuse of 20% or so of the memory. Furthermore, this 20% can get fragmented, which makes things worse.
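
The 20% figure can be reproduced from the numbers in the console sessions above. A rough sketch follows; the measurements come from separate sessions, so treat the percentages as ballpark only, and the helper name is mine.

```ruby
RSS_KB      = 134_036  # reported by ps, in KB
HEAP_MB     = 26.15625 # heap_length converted to MB
OFF_HEAP_KB = 67_265   # sum of memsize_of over every object, in KB

# Hypothetical helper: what percentage of the process RSS is `part_mb`?
def percent_of_rss(part_mb, rss_kb)
  (part_mb / (rss_kb / 1024.0) * 100).round
end

puts percent_of_rss(HEAP_MB, RSS_KB)              # Ruby heaps
puts percent_of_rss(OFF_HEAP_KB / 1024.0, RSS_KB) # memory reported by memsize_of
```

With these inputs the Ruby heaps come to about 20% of RSS and the memory reported by memsize_of to about 50%.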

We can explore the default malloc limit (it is 8MB out of the box). If we allocate 8 objects that are 1MB each we can trigger a GC:

$ irb
irb(main):001:0> GC.start
=> nil
irb(main):002:0> GC.count
=> 22
irb(main):003:0> 8.times { Array.new(1_000_000/8) } ; puts
=> nil
irb(main):004:0> GC.count
=> 23
irb(main):005:0> require 'objspace'
=> true
irb(main):006:0> ObjectSpace.memsize_of(Array.new(1_000_000/8))
=> 1000000
irb(main):007:0>

Ruby protects your processes from using up all the available memory on your computer when making throw away copies of large objects.

However, this setting is very outdated; it was introduced many years ago by Matz when memory was scarce.

As an added bonus, using very nasty hacks, we can even raise this number at runtime.

sam@ubuntu:~/Source/discourse$ irb
irb(main):001:0> 15.times { Array.new(16_000_000/8) }; puts
=> nil
irb(main):002:0> GC.start; GC.count
=> 38
irb(main):003:0> 15.times { Array.new(1_000_000/8) }; puts 
=> nil
irb(main):004:0> GC.count
=> 38

MRI will raise the GC limit if it is over-exhausted (by a percentage each time). However, in the real world, in a real Rails app, the GC limit is very unlikely to grow much during runtime; you just don’t allocate huge objects regularly. So, we usually use the environment var RUBY_GC_MALLOC_LIMIT to push this number up.

Every Rails app should have a higher malloc limit. The default is too small; this tiny default means that many Rails apps in the wild are getting zero benefit from the faster “lazy sweep” algorithm implemented in Ruby 1.9.3. Furthermore, low malloc limits mean that the GC runs way too often. Typical Rails requests will regularly allocate a couple of megs of RAM.

What should you set it to? It totally depends on the app. For Discourse we recommend 50MB. The downside of setting this too high is that you are increasing general memory fragmentation.
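
As a small convenience, here is a sketch that turns a size in megabytes into the environment assignment. It assumes the variable is read as a plain byte count (the 2GB value tried in the comments below is given that way) and treats a megabyte as 1,000,000 bytes; the helper name is hypothetical.

```ruby
# Assumption: RUBY_GC_MALLOC_LIMIT is a byte count; 1MB is taken as 1,000,000 bytes.
def malloc_limit_setting(megabytes)
  "RUBY_GC_MALLOC_LIMIT=#{megabytes * 1_000_000}"
end

puts malloc_limit_setting(50)
```

The resulting string goes in front of the command that boots the app, for example before `RAILS_ENV=production rails c`.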

How much memory is a page view allocating?

rack-mini-profiler (in master) contains a very handy report to get a handle on memory use in your various pages. Just append ?pp=profile-gc to the end of your url:

Overview
------------------------------------
Initial state: object count - 377099 , memory allocated outside heap (bytes) 76765247

GC Stats: count : 114, heap_used : 4283, heap_length : 4312, heap_increment : 0, heap_live_num : 459148, heap_free_num : 1283203, heap_final_num : 0, total_allocated_object : 6292870, total_freed_object : 5833722

New bytes allocated outside of Ruby heaps: 1458308
New objects: 38363

ObjectSpace delta caused by request:
--------------------------------------------
String : 18638
Array : 10053
Hash : 3229
ActiveRecord::AttributeMethods::TimeZoneConversion::Type : 1297
Rational : 790
Time : 615
MatchData : 364
RubyVM::Env : 330

Here we can see that the front page causes 1.45MB to be allocated, so out of the box, without any malloc tuning, we can only handle 5 requests before a GC is triggered. 5 requests only generate 190k or so objects in the heap, which is way below heap_free_num.
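
The arithmetic behind that claim, as a sketch. It assumes the out-of-the-box limit is 8,000,000 bytes and that every request allocates about as much as the front page did; the helper name is mine.

```ruby
DEFAULT_MALLOC_LIMIT = 8_000_000 # assumed value of "8MB out of the box"

# Hypothetical helper: how many similar requests fit under the malloc limit?
def requests_per_malloc_gc(bytes_per_request, malloc_limit = DEFAULT_MALLOC_LIMIT)
  malloc_limit / bytes_per_request
end

requests = requests_per_malloc_gc(1_458_308)
puts requests                                    # requests before the limit is hit
puts requests * 38_363                           # objects created by those requests
puts requests_per_malloc_gc(1_458_308, 50_000_000) # same page with a 50MB limit
```

With the report above this gives 5 requests and 191,815 new objects, far fewer than the 1,283,203 free slots, so it is the malloc limit and not heap space that forces the GC.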

We spent a lot of time tuning Rails 4 to cut down on allocations, before we started tuning this we were easily allocating double the amount for a front page request.

note: running this report is likely to cause your Ruby heaps to grow, due to iteration through ObjectSpace with GC disabled. It is recommended you cycle your processes in production after an analysis session.

The trouble with the heap growth algorithm

Ruby heaps will grow by a factor of 1.8 (times used heap size post GC) every time heap space hits a threshold. This is rather problematic for real world apps. The number of heaps available may increase during an app’s lifecycle, but it will never decrease. Say you have 1000 heaps in play; the next time heaps grow you will jump to 1,800 heaps. However, your app may have optimal performance with 1,400 heaps. Remember, the more used heaps you have, the longer a GC will take to run.
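
A tiny simulation of that growth pattern, as a sketch only: it assumes every heap is in use each time growth is triggered, and the method names are hypothetical.

```ruby
# Integer form of the 1.8 growth factor, to avoid floating point rounding.
def next_heap_count(heaps_used)
  heaps_used * 18 / 10
end

# Heap counts after each successive growth step, starting from `start`.
def growth_steps(start, steps)
  counts = [start]
  steps.times { counts << next_heap_count(counts.last) }
  counts
end

p growth_steps(1000, 3)
```

Starting from 1000 heaps the sequence is 1000, 1800, 3240, 5832. There is no way to land on something like 1,400.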

note: the Ruby heap growth factor is configurable and adaptable in Ruby 2.1.

We have some control over the heap count using RUBY_HEAP_MIN_SLOTS: we can tell Ruby to pre-allocate heap space. Unfortunately, in Ruby 2.0 p247 this is a bit buggy and will result in over-allocation. For example, here we ask for enough slots to fill 1000 heaps but get 1803 heaps in Ruby 2.0:

sam@ubuntu:~/Source$ rbenv shell ruby-head
sam@ubuntu:~/Source$ RUBY_HEAP_MIN_SLOTS=$(( 408*1000  )) ruby -e "puts GC.stat[:heap_length]"
1000
 
sam@ubuntu:~/Source$ rbenv shell 2.0.0-p247
sam@ubuntu:~/Source$ RUBY_HEAP_MIN_SLOTS=$(( 408*1000  )) ruby -e "puts GC.stat[:heap_length]"
1803

So, you can use this setting but be careful with it, it will over commit heap space, meaning, slower GC times. See also: Bug #9134: RUBY_HEAP_MIN_SLOTS does not work correctly in Ruby 2.0 - Ruby master - Ruby Issue Tracking System

We can also attempt to control heap space with RUBY_FREE_MIN. Unfortunately this setting does not work as expected.

sam@ubuntu:~/Source$ RUBY_FREE_MIN=$(( 408*10000  )) ruby -e " GC.start; p GC.stat[:heap_length]"
81
sam@ubuntu:~/Source$ RUBY_FREE_MIN=$(( 408*20000  )) ruby -e " GC.start; p GC.stat[:heap_length]"
81

All this setting does is force Ruby to evaluate more aggressively whether it needs to grow the heap.

Out of the box this is how the algorithm works, more or less:

GC sweep runs
Ruby checks if the free_num (the number of free objects in the used heaps) is smaller than free_min aka (RUBY_FREE_MIN)
Ruby runs set_heaps_increment and heaps_increment
set_heaps_increment checks to see if heaps_used * 1.8 is larger than heaps_length … if it is it will grow the heap by 0.8 * heaps_used.

The key here is that all free_num does is trigger a check. Out of the box free_min is dynamically adjusted to 20% of heaps_used. I can not think of any reason you would really play with this setting.
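
The steps above can be sketched as a small pure function. This is my paraphrase of the description in this post, not a port of MRI's gc.c; the method name and argument order are made up, and the 1.8 and 0.8 factors are written as integer math.

```ruby
# Returns how many extra heaps would be scheduled after a sweep,
# following the steps described above.
def heaps_to_add_after_sweep(heaps_used, heaps_length, free_num, free_min)
  # Plenty of free slots: the growth check is never even triggered.
  return 0 unless free_num < free_min
  # Equivalent to: heaps_used * 1.8 > heaps_length
  return 0 unless heaps_used * 18 > heaps_length * 10
  # Grow by 0.8 * heaps_used
  heaps_used * 8 / 10
end

puts heaps_to_add_after_sweep(1000, 1000, 10_000, 80_000)  # check triggered, room to grow
puts heaps_to_add_after_sweep(1000, 1000, 100_000, 80_000) # enough free slots
puts heaps_to_add_after_sweep(1000, 2000, 10_000, 80_000)  # heaps_length already large
```

In this model, raising free_min only makes the first guard pass more often. The amount of growth is still decided by the second check, which may be why the two RUBY_FREE_MIN runs above report the same heap_length.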

The implementation is much more intuitive in Ruby 2.1, see: Bug #9137: RUBY_FREE_MIN is not enforced correctly - Ruby master - Ruby Issue Tracking System

The holy grail of an out-of-band GC

A full GC can take a long time; in fact, on a droplet at Digital Ocean, this blog can spend upwards of 100ms performing a GC.

This GC stops the world and “stalls” your customers. In an ideal world you would be able to control the GC and run it between requests. As long as you have enough worker processes, this stall will be invisible to your customers.

The problem though is that it is very hard to predict when a GC will run, cause malloc information is totally invisible in Ruby 2.0. We are hoping to expose more information in Ruby 2.1.

This means that if RUBY_GC_MALLOC_LIMIT is set too low, you have no way of predicting when a GC will run.

There have been two attempts at an out-of-band-gc made public.

Unicorn OOBGC module Unicorn::OobGC
Passenger OOBGC https://github.com/phusion/passenger/blob/master/lib/phusion_passenger/rack/out_of_band_gc.rb

Both attempts are severely flawed. In modern web apps the amount of data a page can allocate varies wildly. Some pages may allocate a tiny amount of memory and objects, others lots.

You cannot deterministically guess when it is best to run the GC based on request count alone. This means these attempts often run the GC way too often.

Worse still, they often attempt to run GC.disable, which has an extreme possibility of creating rogue Ruby processes with massive heaps. Once you disable the GC all bets are off. A simple loop can create a very problematic process.

irb(main):008:0> GC.disable
=> false
irb(main):009:0> 100_000_000.times{ "" } ; p
=> nil
irb(main):010:0> GC.enable
=> true
irb(main):011:0> GC.stat
=> {:count=>4472, :heap_used=>246240, :heap_length=>286126, :heap_increment=>39886, :heap_live_num=>100082676, :heap_free_num=>42424, :heap_final_num=>0, :total_allocated_object=>289369670, :total_freed_object=>189286994}
irb(main):014:0> t=Time.now; GC.start; puts (Time.now - t)
0.15620451

There, we now have a process that takes 156ms to run the GC on bleeding edge hardware.
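
If you want to repeat this measurement in your own process, the one-liner can be wrapped in a small helper. A minimal sketch with a hypothetical method name, using the same Time.now approach as the session above:

```ruby
# Hypothetical helper: wall-clock milliseconds spent in one full GC.
def full_gc_ms
  started = Time.now
  GC.start
  ((Time.now - started) * 1000).round(2)
end

puts full_gc_ms
```

The number depends heavily on how many heaps are in use, so run it in a warmed-up process rather than a fresh irb session.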

And let’s not forget the obscene memory usage

sam@ubuntu:~/Source/discourse$ smem
  PID User     Command                         Swap      USS      PSS      RSS 
 8906 sam      irb                                0  3982736  3983692  3985700

Even with all the missing information, we can do better than a simple, flawed, request count. At Discourse I have been working on an out-of-band-GC that works quite successfully in production. Firstly we need to make sure malloc limit rarely affects us. We do so by raising it to 40MB.

Source is here: https://github.com/discourse/discourse/blob/master/lib/middleware/unicorn_oobgc.rb

It attempts to keep a running estimate of the live object count that will trigger a GC using:

  # the closer this is to the GC run the more accurate it is
  def estimate_live_num_at_gc(stat)
    stat[:heap_live_num] + stat[:heap_free_num]
  end

This is extremely conservative and not that accurate. We can also experiment with:

  # based on heap length
  def estimate_live_num_at_gc(stat)
    stat[:heap_length] * 408 # objects per heap
  end

The algorithm then tries to leave room for 2 “big” requests; if it notices there is not enough room, it will preempt a GC.
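
The decision itself boils down to a comparison. The following is a simplified sketch and not the Discourse middleware: `preempt_gc?` is a hypothetical name, and its arguments mirror the "expect at" and "max delta" fields of the verbose log shown further down.

```ruby
# First estimate from above: the live object count at which a GC is expected.
def estimate_live_num_at_gc(stat)
  stat[:heap_live_num] + stat[:heap_free_num]
end

# Hypothetical decision rule: run the GC now if fewer than two "big"
# requests worth of objects fit before the estimated trigger point.
def preempt_gc?(live_num, expect_at, max_delta)
  expect_at - live_num < 2 * max_delta
end

puts preempt_gc?(800_000, 893_328, 50_000) # 93,328 slots left, two big requests need 100,000
puts preempt_gc?(700_000, 893_328, 50_000) # 193,328 slots left, plenty of room
```

The first call says to run the GC between requests; the second says to keep serving.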

This worked very successfully for us at http://discourse.ubuntu.com as can be seen when running this in verbose mode.

OobGC hit pid: 28701 req: 56 max delta: 111782 expect at: 893328 67ms saved
OobGC hit pid: 28680 req: 57 max delta: 50000 expect at: 893328 64ms saved
OobGC hit pid: 28728 req: 45 max delta: 112105 expect at: 893328 61ms saved
OobGC hit pid: 28687 req: 49 max delta: 50000 expect at: 949063 74ms saved
OobGC hit pid: 28707 req: 66 max delta: 50000 expect at: 893328 71ms saved
OobGC hit pid: 28695 req: 89 max delta: 50000 expect at: 893328 67ms saved
OobGC hit pid: 28728 req: 20 max delta: 71807 expect at: 893328 61ms saved
OobGC hit pid: 28680 req: 43 max delta: 62992 expect at: 893328 68ms saved
OobGC hit pid: 28701 req: 75 max delta: 50000 expect at: 893328 73ms saved
OobGC hit pid: 28707 req: 52 max delta: 50000 expect at: 893328 68ms saved
OobGC hit pid: 28695 req: 34 max delta: 81301 expect at: 893328 61ms saved
OobGC hit pid: 28687 req: 68 max delta: 50000 expect at: 949063 74ms saved
OobGC hit pid: 28728 req: 69 max delta: 50000 expect at: 893358 69ms saved
OobGC hit pid: 28701 req: 39 max delta: 73273 expect at: 893328 61ms saved
OobGC hit pid: 28695 req: 47 max delta: 115067 expect at: 893328 65ms saved
OobGC hit pid: 28707 req: 48 max delta: 185909 expect at: 893328 68ms saved
OobGC hit pid: 28680 req: 85 max delta: 50000 expect at: 893328 68ms saved
OobGC hit pid: 28695 req: 20 max delta: 52118 expect at: 893328 62ms saved
OobGC hit pid: 28687 req: 63 max delta: 50000 expect at: 949063 73ms saved
OobGC hit pid: 28728 req: 42 max delta: 64944 expect at: 893328 63ms saved
OobGC hit pid: 28680 req: 41 max delta: 138184 expect at: 893328 65ms saved
OobGC hit pid: 28701 req: 50 max delta: 50000 expect at: 893328 70ms saved
OobGC miss pid: 28707 reqs: 50 max delta: 50000

Once in a while you get a miss, cause it is impossible to predict malloc and potentially massive requests, however, in general it helps a lot. You can see the out-of-band-gc kicking in at different request counts, sometimes we can handle 20 requests between GCs, other times 80. As an added bonus, you don’t need to run unicorn_killers and risk is very low.

Keep exploring

Given the built in tooling and Mini Profiler, you are not running blind, you can do quite a lot to investigate and understand your GC behavior.

Try running these snippets and tools, try exploring.

Many very exciting changes both to GC algorithms and tooling are forthcoming in Ruby 2.1 thanks to work by Koichi Sasada, Aman Gupta and others. I hope to blog about it.

Special thank you to Koichi for reviewing this article.

Posted by: Sam

Comments

Pavel Forkert almost 11 years ago

The key here is that all free_num does is trigger a check. Out of the box free_min is dynamically adjusted to 20% of heaps_used. I can not think of any reason you would really play with this setting.

Actually you can use it to force grow the heap which speeds up booting large applications (because of lower GC count), however the speed up is not so big.

Sam Saffron almost 11 years ago

I think you would use RUBY_HEAP_MIN_SLOTS to force grow the heap (even though it is buggy), it is actually quite an important setting.

Consider cases where you want to ensure there is enough heap space to serve N requests, since requests usually allocate a small limit of objects (compared to the current heap) odds are they will not trigger heap growth.

End result is that, depending on your app, you may have room for a very limited amount of requests in the heap.

RUBY_FREE_MIN as implemented in 2.0 is not that useful, however, we now have it fixed in 2.1 so it can be used very effectively to ensure heap space for requests.

Since we do not have it in 2.0 our only real option is using RUBY_HEAP_MIN_SLOTS to ensure heap space (or ugly allocation hacks)

*note: both these settings are being renamed in Ruby 2.1

I can confirm it does assist in bootup, feels quite random though:

sam@ubuntu:~/Source/discourse$ rails r 'p GC.stat'
{:count=>110, :heap_used=>1163, :heap_length=>1996, :heap_increment=>833, :heap_live_num=>472843, :heap_free_num=>71822, :heap_final_num=>0, :total_allocated_object=>3822542, :total_freed_object=>3349699}
sam@ubuntu:~/Source/discourse$ RUBY_FREE_MIN=100000000000 rails r 'p GC.stat'
{:count=>78, :heap_used=>1262, :heap_length=>2030, :heap_increment=>768, :heap_live_num=>513142, :heap_free_num=>86089, :heap_final_num=>0, :total_allocated_object=>3822607, :total_freed_object=>3309465}

GC during boot reduced from 110 times to 78

We can achieve the same with this, and it makes a bit more sense:

sam@ubuntu:~/Source/discourse$ RUBY_HEAP_MIN_SLOTS=$(( 408*2000  )) rails r 'p GC.stat'
{:count=>79, :heap_used=>1279, :heap_length=>2300, :heap_increment=>1021, :heap_live_num=>512983, :heap_free_num=>147494, :heap_final_num=>0, :total_allocated_object=>3822528, :total_freed_object=>3309545}

Pavel Forkert almost 11 years ago

Heh, actually you are right, RUBY_HEAP_MIN_SLOTS should be more appropriate and it is supported even in 1.9.3. Thanks.

Pat Shaughnessy almost 11 years ago

Thanks Sam for this great, hands-on exploration of how MRI GC works. I think it’s essential for Ruby developers to have some understanding of how things work “under the hood,” and this post really shows how important - and complex - GC is. (And of course thanks for mentioning and linking to Ruby Under a Microscope!)

Sam Saffron almost 11 years ago

Thanks @pat_shaughnessy really appreciate your work explaining Ruby internals, was seriously impressed with how fast your book shipped to Australia

Ruby 2.1 has so many GC changes you are going to need a massive appendix to cover it

George Armhold almost 11 years ago

I believe you have an extra ‘r’ in there. I needed to use ?pp=profile-gc to get the profile info.

Not trying to pick nits, but rather clarify for others who may copy/paste as I did, and then wonder why it doesn’t work. Thanks for doing this extremely useful writeup!

Sam Saffron almost 11 years ago

Thanks heaps for catching it, did you see anything interesting in the output? Curious to see what kind of numbers are out there, how many allocs do you get per request?

Kyrylo Silin almost 11 years ago

Why do we need to perform require 'objspace' (your first line of the first snippet)? Everything seems to be working without it.

George Armhold almost 11 years ago

did you see anything interesting in the output

I’m still very much in the “figuring out what it all means phase”. I initially came here trying to see if there was some GC setting I should be applying to my prod setup (Passenger and ruby-2.0.0-p353).

Unfortunately as with most things, there does not seem to be a “GO_FASTER=1” flag for GC, and much experimentation is needed to tune a particular environment. No surprises there.

However rack-mini-profiler quickly helped me realize that I had several N+1 Selects and a totally broken use of Rails.cache.fetch (I had a return statement from within a Proc which short-circuited the cache write!) So two big wins right there already.

how many allocs do you get per request?

Here’s a typical snippet:

------------------------------------
Initial state: object count - 208878 , memory allocated outside heap (bytes) 54724352

GC Stats: count : 62, heap_used : 1229, heap_length : 1229, heap_increment : 0, heap_live_num : 256854, heap_free_num : 243099, heap_final_num : 0, total_allocated_object : 3267264, total_freed_object : 3010410

New bytes allocated outside of Ruby heaps: 3833179
New objects: 50533

ObjectSpace delta caused by request:
--------------------------------------------
String : 24302
Array : 16453
Hash : 3973
MatchData : 895
ActiveSupport::SafeBuffer : 723
Nokogiri::XML::NodeSet : 643
Nokogiri::XML::Text : 441
RubyVM::Env : 434
Proc : 422
Nokogiri::XML::Element : 385
Regexp : 355
Nokogiri::XML::ParseOptions : 158
Nokogiri::HTML::DocumentFragment : 145
[lots more trimmed...]

Is that a lot? shrug… BTW that’s with Rails 3.2, RAILS_ENV=production, and running on my Mac, so perhaps the numbers are inflated due to 64 bit.

PS: might want to mention that folks will need the git master version (rather than the released one from RubyGems) if they want output that matches your examples (the “Initial state” header seems to be new?) Syntax for that is:

gem 'rack-mini-profiler', git: 'https://github.com/MiniProfiler/rack-mini-profiler.git'

Not sure it’s a good idea to deploy that to prod if master is your dev branch though…

Thanks heaps for catching it

Pun intended?

Sam Saffron almost 11 years ago

Good point, I guess it’s a habit thing. I always forget to require it and then I get surprised memsize_of is missing.

sam@ubuntu:~/Source/discourse$ pry
[1] pry(main)> ls ObjectSpace
constants: WeakMap
ObjectSpace.methods: 
  _id2ref  count_objects  define_finalizer  each_object  garbage_collect  undefine_finalizer
[2] pry(main)> require 'objspace'
=> true
[3] pry(main)> ls ObjectSpace
constants: InternalObjectWrapper  WeakMap
ObjectSpace.methods: 
  _id2ref        count_objects_size   each_object      memsize_of_all        
  count_nodes    count_tdata_objects  garbage_collect  reachable_objects_from
  count_objects  define_finalizer     memsize_of       undefine_finalizer 

Sam Saffron almost 11 years ago

@armhold when I look at your numbers the one that sticks out to me is: 3833179 bytes per request.

At this rate, with default GC settings you will be hitting the GC every 2 requests, that is a massive impact. I think typical Rails apps should be able to handle 10-60 reqs per GC so recommend you bump up your malloc limit. The numbers there are quite typical of Rails apps and I would expect it to go down by at least 20% if you upgraded to Rails 4.

With regards to rack-mini-profiler, we are running it live on http://meta.discourse.org I will bump up the version next week.

Really happy Mini Profiler is helping you, be sure to try ?pp=help and run through some of the other goodies like flamegraphs.

I will take a pun-t on that

George Armhold almost 11 years ago

Hmm, I tried running with a 2GB limit (RUBY_GC_MALLOC_LIMIT=2147483648 RAILS_ENV=production bundle exec rails s) but it seemed to have no impact- I still get 2-4 GCs per request according to GC Stats: count.

I only meant that I wasn’t sure if master was your “dev” branch or something more stable. Not meant as a knock on your code in any way. Glad to hear there’s a version bump on the way.

Sam Saffron almost 11 years ago

That strongly indicates you need more heaps. Try playing with RUBY_HEAP_MIN_SLOTS=$(( 408*1500 )) as imperfect as the setting is it still kind of works.

David Butler almost 11 years ago

Hi, I’m wondering if you have experimented with running Discourse under JRuby to take advantage of the JVM’s more mature GC? If so, what were your experiences? Thanks!

Brandon Mathis almost 11 years ago

Hey Sam, why the magic number of 23 for the max size of a heap string? Can you break down the structure of a ruby heap so I can better understand why this prime number 23 is the cut off for a string length?

Sam Saffron almost 11 years ago

@davogones I tried running Discourse on JRuby in the past and it was actually slower and required a fair amount of hacking, postgres support is raw and Charlie’s pg port is a work-in-progress as far as I can tell.

I would love to revisit this and would be happy to take in patches that give us JRuby support, I would even consider using it in production if it was faster.

Sam Saffron almost 11 years ago

@bemathis

This happens cause RString still needs room for RBasic:

struct RBasic {
    VALUE flags;
    const VALUE klass;
};

#define RSTRING_EMBED_LEN_MAX ((int)((sizeof(VALUE)*3)/sizeof(char)-1))
struct RString {
    struct RBasic basic;
    union {
	struct {
	    long len;
	    char *ptr;
	    union {
		long capa;
		VALUE shared;
	    } aux;
	} heap;
	char ary[RSTRING_EMBED_LEN_MAX + 1];
    } as;
};

On an x64 machine RBasic takes up 8+8 bytes (two longs), which leaves us with (8+8+8) bytes, minus 1 for null termination I guess.

David Butler almost 11 years ago

After an evening of hacking away at the Gemfile and commenting out C extensions, I got Discourse running in JRuby! It seems pretty snappy too. All that remains is to plug in the JRuby equivalents for the gems that rely on C extensions. Are there any benchmarks I can run?

Sam Saffron almost 11 years ago

Totally, there is the Discourse bench see:

http://meta.discourse.org/t/benchmarking-discourse-locally/9070
