How we instrument Rails at Discourse and how you can, too.

People following me have occasionally seen me post graphs like this:

Usually people leave this type of instrumentation and graphing to NewRelic or Skylight. However, at our scale we find it extremely beneficial to have instrumentation, graphing and monitoring done locally: we are in the business of hosting, so this is a central part of our job.

Over the past few years Prometheus has emerged as one of the leading options for gathering metrics and alerting. However, sadly, people using Rails have had a very hard time extracting metrics.

Issue #9 on the official Prometheus client for Ruby has been open for 3 years now, and there is very little chance it will be “solved” any time soon.

The underlying fundamental issue is that Prometheus, unlike Graphite/StatsD, is centered around the concept of pulling metrics as opposed to pushing them.

This means you must provide a single HTTP endpoint that exposes all the metrics you want collected. This ends up being particularly complicated with Unicorn, Puma and Passenger, which usually run multiple forks of a process. If you simply implement a secured /metrics endpoint in your app, you have no guarantee over which forked process will handle the request; without “cross fork” aggregation you would just report metrics for a single, random process, which is less than useful.
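The fork problem is easy to demonstrate in a few lines of plain Ruby (a minimal sketch, no web server involved): each forked worker gets its own copy of any in-process counter, so whichever process answers /metrics only sees its own numbers.

```ruby
# Minimal sketch: after fork, the child increments its own copy of the
# counter, while the parent's copy stays untouched.
requests = 0

reader, writer = IO.pipe
pid = fork do
  reader.close
  requests += 1          # this increments the child's copy only
  writer.puts requests
  writer.close
end
writer.close
Process.wait(pid)

child_count = reader.read.to_i
# parent still sees 0; the child saw 1
puts "parent=#{requests} child=#{child_count}"
```

This is exactly why a naive per-process /metrics endpoint reports a random slice of reality, and why the gem funnels everything through one collector process.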

Additionally, knowing what to collect and how to collect it is a bit of an art; it can easily take multiple weeks just to figure out what you want.

Having solved this big problem for Discourse I spent some time extracting the patterns.

Introducing prometheus_exporter

The prometheus_exporter gem is a toolkit that provides all the facilities you need.

  1. It has an extensible collector that allows you to run a single process to aggregate metrics for multiple processes on one machine.

  2. It implements gauge, counter and summary metrics.

  3. It has default instrumentation that you can easily add to your app.

  4. It has a very efficient and robust transport channel between forked processes and the master collector. The master collector gathers metrics via HTTP but reduces overhead by using chunked encoding, so a single session can gather a very large amount of metrics.

  5. It exposes metrics to Prometheus over a dedicated port; the HTTP endpoint is compressed.

  6. It is completely extensible, you can pick as much or as little as you want.
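To make the pull side concrete, this is roughly the text exposition format the exporter ends up serving to Prometheus on its dedicated port. The renderer below is a pure-Ruby illustration of that format, not the gem’s actual code:

```ruby
# Illustrative only: build a gauge in the Prometheus text exposition
# format (# HELP line, # TYPE line, then "name{labels} value").
def render_gauge(name, help, value, labels = {})
  label_str = labels.map { |k, v| "#{k}=\"#{v}\"" }.join(",")
  label_str = "{#{label_str}}" unless label_str.empty?
  <<~TEXT
    # HELP #{name} #{help}
    # TYPE #{name} gauge
    #{name}#{label_str} #{value}
  TEXT
end

text = render_gauge("web_active", "Number of active web requests", 3, type: "web")
puts text
# # HELP web_active Number of active web requests
# # TYPE web_active gauge
# web_active{type="web"} 3
```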

A minimal example implementing metrics for your Rails app

In your Gemfile:

gem 'prometheus_exporter'
# in config/initializers/prometheus.rb
if Rails.env != "test"
  require 'prometheus_exporter/middleware'

  # This reports stats per request like HTTP status and timings
  Rails.application.middleware.unshift PrometheusExporter::Middleware
end

At this point, your web app is instrumented; every request will keep track of SQL/Redis/total time (provided you are using PG).

You may also be interested in per-process stats, like:


# in config/initializers/prometheus.rb
if Rails.env != "test"
  require 'prometheus_exporter/instrumentation'

  # this reports basic process stats like RSS and GC info, type master
  # means it is instrumenting the master process
  PrometheusExporter::Instrumentation::Process.start(type: "master")
end

# in unicorn/puma/passenger be sure to run a new process instrumenter after fork
after_fork do
  require 'prometheus_exporter/instrumentation'
  PrometheusExporter::Instrumentation::Process.start(type: "web")
end

Also you may be interested in some Sidekiq stats:

Sidekiq.configure_server do |config|
  config.server_middleware do |chain|
    require 'prometheus_exporter/instrumentation'
    chain.add PrometheusExporter::Instrumentation::Sidekiq
  end
end

Finally, you may want to collect some global stats across all processes, like the number of queued and active web requests.

To do so we can introduce a “type collector”:

# lib/global_type_collector.rb
unless defined? Rails
  require File.expand_path("../../config/environment", __FILE__)
end

require 'raindrops'

class GlobalPrometheusCollector < PrometheusExporter::Server::TypeCollector
  include PrometheusExporter::Metric

  def initialize
    @web_queued = Gauge.new("web_queued", "Number of queued web requests")
    @web_active = Gauge.new("web_active", "Number of active web requests")
  end

  def type
    # arbitrary name grouping the metrics this collector owns
    "web_global"
  end

  def observe(obj)
    # do nothing, we would only use this if metrics are transported from apps
  end

  def metrics
    path = "/var/www/my_app/tmp/sockets/unicorn.sock"
    info = Raindrops::Linux.unix_listener_stats([path])[path]

    @web_queued.observe(info.queued)
    @web_active.observe(info.active)

    [@web_queued, @web_active]
  end
end

After all of this is done you need to run the collector (in a monitored process in production) using runit, supervisord, systemd or whatever your poison is (mine is runit).

bundle exec prometheus_exporter -t /var/www/my_app/lib/global_type_collector.rb
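As an illustration, a runit run script for the collector might look like this (the service path and app path are hypothetical, adjust to your layout):

```shell
#!/bin/sh
# /etc/service/prometheus_exporter/run (hypothetical path)
cd /var/www/my_app
exec 2>&1
exec bundle exec prometheus_exporter -t lib/global_type_collector.rb
```

runit will restart the process if it dies, which is the “monitored process” part; supervisord and systemd units achieve the same thing.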

Then you follow the various guides online to set up Prometheus and the excellent Grafana, and you too can have wonderful graphs.
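For reference, pointing Prometheus at the collector is a small scrape config; the port below is prometheus_exporter’s default (9394) and the host name is made up:

```yaml
# prometheus.yml (fragment)
scrape_configs:
  - job_name: my_app
    static_configs:
      - targets: ['app-host:9394']   # prometheus_exporter's default port
```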

For those curious, here is a partial example of how the raw metric feed looks for an internal app we use that I instrumented yesterday:

I hope you find this helpful, good luck instrumenting all things!


Lukáš Zapletal 4 months ago

We are going the statsd_exporter way. Have you considered it? It lacks free-form tags, but it has a mapping that will do the job, I wrote a mapping config generator so it’s all automatic.

Sam Saffron 4 months ago

There is this that requires some expanding:

That said, I think using statsd_exporter is a completely reasonable way of solving this problem.

Prometheus Exporter has some advantages:

  • Ability to transport multiple metrics in a single payload (such as all of GC.stat) in one go.

  • Ability to define “global” collectors directly in-process, which means stuff like Raindrops::Linux.tcp_listener_stats(['']) can be collected directly from the exporter process as opposed to forwarded to it.

  • Simpler to configure because you don’t need to generate a yml file to teach it how to parse metrics.

statsd_exporter has some advantages:

  • Easier to fit in if you already have statsd in place.

  • Built in Go, so it probably consumes slightly less memory.

Sam Saffron 4 months ago

@lzap be sure to follow that issue on GitHub, because basically what they are saying is that what you are doing is long-term wrong™, which I disagree with. But there is only so much battling I can do there :blush:

Lukáš Zapletal 4 months ago

Yeah thanks, I am following that already, appreciated.
