I would love a long-running set of Ruby and Rails benchmarks. I talked about this at GoGaRuCo and would like to follow up.
For a very long time, Python has had the PyPy speed center.
Recently, Go added its own: http://goperfd.appspot.com/perf
Why is this so important?
Writing fast software requires data. We need to know right away when our framework or platform gets slower or faster. This information can be fed directly to the team, informing them of big wins and losses. Small changes can often lead to unexpected gains or losses.
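At its simplest, "knowing right away" means comparing each new timing against a baseline and flagging large moves. The sketch below assumes a timing metric where lower is better and uses an arbitrary 5% noise threshold; `classify_change` is a hypothetical helper, not part of any existing tool.

```ruby
# Hypothetical helper: classify a new timing against a baseline timing.
# Assumes lower is better (e.g. milliseconds per request) and treats
# changes within +/- threshold (5% by default, an arbitrary choice) as noise.
def classify_change(baseline_ms, current_ms, threshold: 0.05)
  raise ArgumentError, "baseline must be positive" unless baseline_ms > 0

  # Relative change: positive means slower, negative means faster.
  delta = (current_ms - baseline_ms) / baseline_ms.to_f
  if delta > threshold
    :regression
  elsif delta < -threshold
    :improvement
  else
    :unchanged
  end
end

p classify_change(100.0, 112.0) # => :regression
```

A real harness would pick the threshold per benchmark, based on how noisy that benchmark is on the dedicated hardware.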
Finding out about a regression months into the development cycle can incur a massive cost, while fixing it early on is cheap. Usually, the longer we wait, the more expensive the fix.
Imagine if, for every major Rails release, the team could announce not only that it is N% faster in certain areas but also attribute the improvements to particular commits.
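Attributing a change to a commit becomes straightforward once there is one measurement per commit: look for the largest jump between consecutive commits. This is a minimal sketch under that assumption; `CommitTiming`, `biggest_change` and the sample commit ids and timings are hypothetical, made up purely for illustration.

```ruby
# Hypothetical record: one median timing per commit, in commit order.
CommitTiming = Struct.new(:commit, :median_ms)

# Returns the commit whose median timing changed the most, relative to the
# commit before it, or nil if there are fewer than two measurements.
def biggest_change(timings)
  timings.each_cons(2).max_by do |prev, curr|
    ((curr.median_ms - prev.median_ms) / prev.median_ms).abs
  end&.last
end

# Made-up example data.
timings = [
  CommitTiming.new("commit-a", 50.0),
  CommitTiming.new("commit-b", 51.0),
  CommitTiming.new("commit-c", 40.0),
  CommitTiming.new("commit-d", 40.5)
]
puts biggest_change(timings).commit # => commit-c
```

With only per-release numbers, the same idea can be combined with `git bisect` over the commits between two releases.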
Imagine if we could get a broad picture of the performance gains and losses in a new Ruby version, with full context on why something slowed down or sped up.
What do we have today?
We already have a fair amount:
- The Discourse benchmarks: http://meta.discourse.org/t/benchmarking-discourse-locally/9070
- The benchmarks bundled with Ruby: https://github.com/ruby/ruby/tree/trunk/benchmark
- Other benchmarks: http://miguelcamba.com/blog/2013/10/05/benchmarking-the-ruby-2-dot-1-and-rubinius-2-dot-0/
- A server provisioned by Ninefold to run the long-running benchmarks and host a site
- A lot of Ruby implementations to test (MRI, JRuby, Rubinius)
- A place to start contributing code: https://github.com/SamSaffron/ruby-bench
- Discourse is very close to working on Rails master now, and works fine on Ruby head.
The Discourse benchmark can be used to benchmark a "real world" app, since it exercises the entire system. The smaller microbenchmarks can be used to bench specific features and areas.
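For contrast with the whole-app Discourse bench, a microbenchmark isolates one narrow operation. The sketch below uses Ruby's standard `Benchmark.realtime`; the workload (building and joining strings) is an arbitrary example and is not one of the benchmarks bundled with Ruby.

```ruby
require "benchmark"

ITERATIONS = 100_000

# Arbitrary example workload: build n small strings and join them.
def join_workload(n)
  Array.new(n) { |i| i.to_s }.join
end

# Benchmark.realtime returns elapsed wall-clock seconds as a Float.
elapsed = Benchmark.realtime { join_workload(ITERATIONS) }
puts format("%d iterations took %.3f s", ITERATIONS, elapsed)
```

A single run like this is noisy; a long-running harness would repeat it many times and record the distribution rather than one number.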
The importance of stable dedicated hardware
The server provisioned by Ninefold is a bare-metal server. Once we ensure power management is disabled, it can provide consistent results, unlike virtual hosts, which are often affected by external factors. We need to produce results and reproduce them on multiple sets of hardware to ensure we did not mess up our harness.
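One way to check that a machine really gives consistent results is to run the same benchmark several times and look at the coefficient of variation (standard deviation divided by mean). This is a sketch under assumptions: the 2% cutoff is arbitrary, and `coefficient_of_variation` and `stable?` are hypothetical helpers.

```ruby
# Coefficient of variation of repeated timings: sample standard deviation
# divided by the mean. Lower means more consistent runs.
def coefficient_of_variation(samples)
  n = samples.size
  raise ArgumentError, "need at least two samples" if n < 2

  mean = samples.sum / n.to_f
  # Sample variance uses n - 1 in the denominator.
  variance = samples.sum { |x| (x - mean)**2 } / (n - 1)
  Math.sqrt(variance) / mean
end

# Treat a set of runs as stable if its CV is at most max_cv (assumed 2%).
def stable?(samples, max_cv: 0.02)
  coefficient_of_variation(samples) <= max_cv
end

p stable?([100.0, 100.5, 99.5, 100.0]) # => true
```

Running the same check on a second machine is a cheap way to catch harness bugs that only show up on one set of hardware.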
Tracking all metrics
There is a sea of metrics we can gather: GC times, Rails boot times, memory usage, page load times, RSS, requests per second and so on. We don't need to be shoehorned into tiny microbenchmarks. When tracking performance, we need to focus on both the narrow and the wide.
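Several of these metrics can be sampled from inside the Ruby process itself. The sketch below records elapsed time, GC runs and allocations around a block using the built-in `GC.stat` and a monotonic clock. It assumes the `:count` and `:total_allocated_objects` keys, which exist in Ruby 2.2 and later (2.1 used the singular `total_allocated_object`); `measure` is a hypothetical helper name.

```ruby
# Collect a few metrics around a block of work. GC.stat keys vary between
# Ruby versions; :count and :total_allocated_objects are assumed (Ruby 2.2+).
# The allocation count is approximate: GC.stat itself allocates a hash.
def measure
  gc_before = GC.stat
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
  gc_after = GC.stat
  {
    elapsed_s: elapsed,
    gc_runs: gc_after[:count] - gc_before[:count],
    allocated_objects: gc_after[:total_allocated_objects] -
                       gc_before[:total_allocated_objects]
  }
end

stats = measure { 10_000.times.map { |i| "item #{i}" } }
p stats
```

Process-level metrics such as RSS or requests per second have to be gathered from outside the process, so a full harness would combine both views.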
Performance is often a game of trade-offs: you make some areas slower so that other, more important areas become faster.
Raw execution speed, memory usage and disk usage all matter. Performance can also depend on the memory allocator (such as jemalloc or tcmalloc) and on GC tuning, so when gathering stats we can also measure a few specific best-practice environments.
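Measuring several environments amounts to running the same benchmark under a matrix of allocator and GC settings. The sketch below only builds that matrix. It assumes `LD_PRELOAD` (Linux) for swapping in jemalloc and `RUBY_GC_HEAP_INIT_SLOTS` (Ruby 2.1+) for GC tuning; the library path, the slot count and the `bench.rb` script are placeholder assumptions to adjust for a real machine.

```ruby
# Placeholder allocator and GC settings; values are assumptions.
ALLOCATORS = {
  "system" => {},
  "jemalloc" => { "LD_PRELOAD" => "/usr/lib/libjemalloc.so" } # assumed path
}.freeze

GC_SETTINGS = {
  "default" => {},
  "tuned" => { "RUBY_GC_HEAP_INIT_SLOTS" => "600000" } # assumed value
}.freeze

# Every allocator combined with every GC setting.
def environment_matrix
  ALLOCATORS.flat_map do |alloc_name, alloc_env|
    GC_SETTINGS.map do |gc_name, gc_env|
      { name: "#{alloc_name}/#{gc_name}", env: alloc_env.merge(gc_env) }
    end
  end
end

environment_matrix.each do |config|
  # A harness could run: system(config[:env], "ruby", "bench.rb")
  # (bench.rb is a hypothetical benchmark script).
  puts "#{config[:name]}: #{config[:env].inspect}"
end
```

Keeping the matrix small matters: every extra dimension multiplies the time each build spends on the benchmark server.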
Graphing some imperfect data
I ran the Discourse bench and a few other tests on 15 or so builds. The data is imperfect and needs to be reproduced; that said, it is a starting point.
Here we can see a graph showing how the "long living" object count decreased from build to build; somewhere between the end of November and December there was a huge drop.
Here I am graphing the median time for a Discourse homepage request across a series of builds.
There are two interesting jumps. The first was a big slowdown when RGenGC was introduced. Later, in mid-November, we recovered the performance, but it has regressed since.
Here we can clearly see the massive improvement the generational GC provided at the 75th percentile. Ruby 2.1 is going to be a big win for those not doing any GC tuning.
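Medians and 75th percentiles like the ones graphed here can be computed from raw request timings. This sketch uses the nearest-rank method, which is only one of several percentile definitions (other tools interpolate, so their numbers can differ slightly); the timings are made-up example values.

```ruby
# Nearest-rank percentile: the smallest sample such that at least pct% of
# samples are <= it. For an even count, the 50th percentile is the lower
# of the two middle values rather than their average.
def percentile(samples, pct)
  raise ArgumentError, "no samples" if samples.empty?

  sorted = samples.sort
  rank = (pct / 100.0 * sorted.size).ceil
  sorted[[rank, 1].max - 1]
end

timings_ms = [12, 15, 11, 30, 14, 13, 90, 16] # made-up example timings
puts percentile(timings_ms, 50) # => 14
puts percentile(timings_ms, 75) # => 16
```

Tracking higher percentiles alongside the median shows GC pauses that a median alone hides, as with the outlier `90` above.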
Similarly, Rails boot time is much improved.
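Boot time is measured from outside the process: spawn a fresh Ruby, let it load the app, and time it until exit. In this sketch the child just evaluates `nil` so it runs anywhere; for a Rails app one could instead pass something like `-e 'require "./config/environment"'` from the app root (an assumption, not the exact command used for the graph). `time_boot` is a hypothetical helper.

```ruby
require "rbconfig"

# Spawn the current Ruby binary with the given arguments `runs` times and
# return the fastest wall-clock time in seconds.
def time_boot(*ruby_args, runs: 3)
  runs.times.map do
    t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    ok = system(RbConfig.ruby, *ruby_args)
    raise "boot command failed" unless ok
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
  end.min
end

puts format("boot: %.3f s", time_boot("-e", "nil"))
```

Taking the minimum of several runs is a common way to filter out interference from other activity on the machine; a full harness might also record the spread.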
What do we need?
A long-term benchmarking project will take quite a while to build, and personally I cannot dedicate more than a few hours per week to it.
First and foremost, we need people: developers to build a UI, integrate existing tools and add benchmarks, and designers to make a nice site.
Contributions to related projects could greatly improve the benchmarking experience; faster builds and faster gem installs would make a tremendous difference.
The project can be split quite cleanly into two sub-projects. The first is information gathering: writing the scripts needed to collect all the historical data into a database of sorts. The second is a web UI to present the results, potentially reusing or extending https://github.com/tobami/codespeed.
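If Codespeed were reused for the UI, the information-gathering scripts would mostly need to turn each result into a submission. This sketch assumes Codespeed's form-based `result/add/` endpoint and the field names from its README (verify them against the version deployed); the project, executable and environment names, the commit id and the value are placeholders.

```ruby
require "net/http"
require "uri"

# Build the form fields for one result. Field names follow Codespeed's
# README (assumed); all default names here are placeholders.
def codespeed_payload(commit:, benchmark:, value:, executable: "ruby-trunk",
                      environment: "bare-metal-1", branch: "default")
  {
    "commitid" => commit,
    "branch" => branch,
    "project" => "Ruby",
    "executable" => executable,
    "benchmark" => benchmark,
    "environment" => environment,
    "result_value" => value.to_s
  }
end

# Net::HTTP.post_form sends an application/x-www-form-urlencoded POST.
def submit(base_url, payload)
  Net::HTTP.post_form(URI.join(base_url, "result/add/"), payload)
end

payload = codespeed_payload(commit: "abc123", benchmark: "discourse_home_median",
                            value: 42.5)
p payload
# submit("http://localhost:8000/", payload) # needs a running Codespeed instance
```

Keeping payload construction separate from the HTTP call makes it easy to back-fill historical data into a local store first and submit it later.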
More hardware and a nice domain name would also be awesome.
If you want to work on this, or help build a UI or frameworks for running long-term benchmarks, contact me (either here or at
Building a long-term benchmark is critical for the longevity and health of Ruby and Rails. I really hope we can get started on this soon; I wish I had more time to invest in it.
We have started gathering info and people at http://community.miniprofiler.com/t/ruby-bench-intros/185. Feel free to post an intro there or create and