Instrumenting Rails with Prometheus

over 6 years ago

People following me have occasionally seen me post graphs like this:

Usually people leave this type of instrumentation and graphing to New Relic and Skylight. However, at our scale we find it extremely beneficial to keep instrumentation, graphing and monitoring local. We are in the business of hosting, so this is a central part of our job.

Over the past few years Prometheus has emerged as one of the leading options for gathering metrics and alerting. However, sadly, people using Rails have had a very hard time extracting metrics.

Issue #9 on the official prometheus client for Ruby has been open 3 years now, and there is very little chance it will be “solved” any time soon.

The underlying fundamental issue is that Prometheus, unlike Graphite/Statsd, is centered around the concept of pulling metrics as opposed to pushing them.

This means you must provide a single HTTP endpoint that collects all the metrics you want exposed. This ends up being particularly complicated with Unicorn, Puma and Passenger, which usually run multiple forks of a process. If you simply implement a secured /metrics endpoint in your app, you have no guarantee over which forked process will handle the request. Without “cross fork” aggregation you would just report metrics for a single, random process, which is less than useful.
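
The fork problem can be reproduced without any web server. The following is a minimal sketch using only Ruby's standard library (the method name count_in_workers is made up for illustration, and it assumes a platform that supports fork). Each forked child counts in its own memory, and the parent only sees the full picture because every child reports its count over a pipe:

```ruby
# Each forked worker keeps a private copy of its counter. The parent learns
# the values only because every child writes its count to a pipe on exit.
def count_in_workers(requests_per_worker)
  requests_per_worker.map do |requests|
    reader, writer = IO.pipe

    pid = fork do
      reader.close
      local_count = 0
      requests.times { local_count += 1 } # stands in for handling requests
      writer.puts(local_count)
      writer.close
      exit!(0)
    end

    writer.close
    count = reader.gets.to_i
    reader.close
    Process.wait(pid)
    count
  end
end

per_worker = count_in_workers([3, 5, 2])
total = per_worker.sum
puts "per worker: #{per_worker.inspect}, aggregated: #{total}"
```

A /metrics endpoint served from inside one of these workers could only report its own local_count; some form of aggregation outside the workers is needed to report the total.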

Additionally, knowing what to collect and how to collect it is a bit of an art; it can easily take multiple weeks just to figure out what you want.

Having solved this big problem for Discourse I spent some time extracting the patterns.

Introducing prometheus_exporter

The prometheus_exporter gem is a toolkit that provides all the facilities you need.

It has an extensible collector that allows you to run a single process to aggregate metrics for multiple processes on one machine.
It implements gauge, counter and summary metrics.
It has default instrumentation that you can easily add to your app.
It has a very efficient and robust transport channel between forked processes and the master collector. The master collector gathers metrics via HTTP but reduces overhead by using chunked encoding, so a single session can gather a very large amount of metrics.
It exposes metrics to Prometheus over a dedicated port; the HTTP endpoint is compressed.
It is completely extensible; you can pick as much or as little as you want.
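
For readers who have not seen what Prometheus actually scrapes, here is a rough sketch of the text exposition format for a counter. The helper render_metric and the metric name are hypothetical and this is not how the gem renders its output; label escaping is left out to keep it short:

```ruby
# Hypothetical helper: render one metric in the Prometheus text format.
# `samples` maps a hash of labels to the current value for that series.
def render_metric(name, type, help, samples)
  lines = ["# HELP #{name} #{help}", "# TYPE #{name} #{type}"]

  samples.each do |labels, value|
    label_text = labels.map { |key, val| "#{key}=\"#{val}\"" }.join(",")
    series = label_text.empty? ? name : "#{name}{#{label_text}}"
    lines << "#{series} #{value}"
  end

  lines.join("\n")
end

text = render_metric(
  "http_requests_total", "counter", "Total HTTP requests",
  { { status: 200 } => 41, { status: 500 } => 1 }
)
puts text
```

Each distinct label combination becomes its own series line under a shared HELP and TYPE header.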

A minimal example implementing metrics for your Rails app

In your Gemfile:

gem 'prometheus_exporter'

# in config/initializers/prometheus.rb
if Rails.env != "test"
  require 'prometheus_exporter/middleware'

  # This reports stats per request like HTTP status and timings
  Rails.application.middleware.unshift PrometheusExporter::Middleware
end

At this point, your web app is instrumented: every request will keep track of SQL, Redis and total time (provided you are using PG).
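
To give a feel for what a request-timing middleware does, here is a stripped-down, Rack-style sketch. TimingMiddleware is a hypothetical class, not the gem's implementation: it only measures total time and keeps observations in a plain array (not thread-safe), where a real setup would ship them to the collector process:

```ruby
# Hypothetical Rack-style middleware: wraps the app, times each request and
# records status and duration. Not the gem's implementation.
class TimingMiddleware
  attr_reader :observations

  def initialize(app)
    @app = app
    @observations = []
  end

  def call(env)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    status, headers, body = @app.call(env)
    [status, headers, body]
  ensure
    # Runs even if the app raised; status is nil in that case.
    duration = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    @observations << { path: env["PATH_INFO"], status: status, duration: duration }
  end
end

# A stand-in app; any object responding to call(env) works.
app = ->(_env) { [200, { "content-type" => "text/plain" }, ["ok"]] }
stack = TimingMiddleware.new(app)
response = stack.call({ "PATH_INFO" => "/hello" })
puts stack.observations.inspect
```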

You may also be interested in per-process stats, such as RSS and GC info:

# in config/initializers/prometheus.rb
if Rails.env != "test"
  require 'prometheus_exporter/instrumentation'

  # this reports basic process stats like RSS and GC info, type master
  # means it is instrumenting the master process
  PrometheusExporter::Instrumentation::Process.start(type: "master")
end

# in unicorn/puma/passenger be sure to run a new process instrumenter after fork
after_fork do
  require 'prometheus_exporter/instrumentation'
  PrometheusExporter::Instrumentation::Process.start(type: "web")
end
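
As a rough idea of the kind of data a per-process instrumenter can gather, here is a standard-library sketch. The method process_stats and its field names are illustrative, not the gem's payload format, and the RSS lookup assumes a Linux /proc filesystem:

```ruby
# Hypothetical per-process snapshot; field names are illustrative only.
def process_stats(type)
  gc = GC.stat
  stats = {
    type: type,
    pid: Process.pid,
    heap_live_slots: gc[:heap_live_slots],
    major_gc_count: gc[:major_gc_count],
    minor_gc_count: gc[:minor_gc_count],
    total_allocated_objects: gc[:total_allocated_objects]
  }

  # RSS comes from /proc, which only exists on Linux; skipped elsewhere.
  status_file = "/proc/#{Process.pid}/status"
  if File.readable?(status_file)
    line = File.foreach(status_file).find { |l| l.start_with?("VmRSS:") }
    stats[:rss_kb] = line.split[1].to_i if line
  end

  stats
end

stats = process_stats("web")
puts stats.inspect
```

Because every fork has its own pid, heap and GC counters, a snapshot like this only describes the process that takes it, which is why the instrumenter has to be started again after fork.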

Also you may be interested in some Sidekiq stats:

Sidekiq.configure_server do |config|
  config.server_middleware do |chain|
    require 'prometheus_exporter/instrumentation'
    chain.add PrometheusExporter::Instrumentation::Sidekiq
  end
end
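
A Sidekiq server middleware is an object whose call(worker, msg, queue) wraps the job and yields to run it. The sketch below shows that shape with a hypothetical JobTimingMiddleware that records duration and success into any object responding to <<; it is not the gem's implementation, and HardJob is only a stand-in class:

```ruby
# Hypothetical middleware with the Sidekiq server middleware shape.
class JobTimingMiddleware
  def initialize(sink)
    @sink = sink
  end

  def call(worker, _msg, queue)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    success = false
    result = yield # runs the job (and any middleware further down the chain)
    success = true
    result
  ensure
    # Recorded for failed jobs too; the exception still propagates.
    @sink << {
      name: worker.class.name,
      queue: queue,
      success: success,
      duration: Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    }
  end
end

class HardJob; end # stand-in for a real job class

sink = []
middleware = JobTimingMiddleware.new(sink)
middleware.call(HardJob.new, { "class" => "HardJob" }, "default") { :done }

begin
  middleware.call(HardJob.new, { "class" => "HardJob" }, "default") { raise "boom" }
rescue RuntimeError
  # swallowed here only so the example keeps running
end

puts sink.inspect
```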

Finally, you may want to collect some global stats across all processes, such as the number of queued and active web requests.

To do so we can introduce a “type collector”:

# lib/global_type_collector.rb
unless defined? Rails
  require File.expand_path("../../config/environment", __FILE__)
end

require 'raindrops'

class GlobalPrometheusCollector < PrometheusExporter::Server::TypeCollector
  include PrometheusExporter::Metric

  def initialize
    @web_queued = Gauge.new("web_queued", "Number of queued web requests")
    @web_active = Gauge.new("web_active", "Number of active web requests")
  end

  def type
    "app_global"
  end

  def observe(obj)
    # do nothing, we would only use this if metrics are transported from apps
  end

  def metrics
    path = "/var/www/my_app/tmp/sockets/unicorn.sock"
    info = Raindrops::Linux.unix_listener_stats([path])[path]
    @web_active.observe(info.active)
    @web_queued.observe(info.queued)

    [
      @web_queued,
      @web_active
    ]
  end
end
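
The collector above ignores observe because it gathers its own data. When metrics are transported from app processes instead, each payload is routed to a type collector by its type. The following is a simplified, standard-library sketch of that idea; QueueDepthCollector and Dispatcher are hypothetical names and the payload fields are made up, so treat it as an illustration rather than the gem's internals:

```ruby
require "json"

# Hypothetical type collector: remembers the latest depth reported per pid.
class QueueDepthCollector
  def initialize
    @depth_by_pid = {}
  end

  def type
    "queue_depth"
  end

  def observe(obj)
    @depth_by_pid[obj["pid"]] = obj["depth"]
  end

  def metrics
    @depth_by_pid.map { |pid, depth| ["queue_depth{pid=\"#{pid}\"}", depth] }
  end
end

# Hypothetical dispatcher: routes each JSON payload to the collector whose
# type matches, and silently drops payloads of unknown types.
class Dispatcher
  def initialize(collectors)
    @by_type = collectors.to_h { |collector| [collector.type, collector] }
  end

  def process(json)
    obj = JSON.parse(json)
    @by_type[obj["type"]]&.observe(obj)
  end

  def metrics
    @by_type.values.flat_map(&:metrics)
  end
end

dispatcher = Dispatcher.new([QueueDepthCollector.new])
dispatcher.process({ type: "queue_depth", pid: 101, depth: 4 }.to_json)
dispatcher.process({ type: "queue_depth", pid: 102, depth: 1 }.to_json)
dispatcher.process({ type: "unknown", value: 1 }.to_json) # ignored
collected = dispatcher.metrics
puts collected.inspect
```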

After all of this is done you need to run the collector (in a monitored process in production) using runit, supervisord, systemd or whatever your poison is (mine is runit).

bundle exec prometheus_exporter -t /var/www/my_app/lib/global_type_collector.rb

Then you follow the various guides online to set up Prometheus and the excellent Grafana, and you too can have wonderful graphs.

For those curious, here is a partial example of how the raw metric feed looks for an internal app we use that I instrumented yesterday: https://gist.github.com/SamSaffron/e2e0c404ff0bacf5fbca80163b54f0a4

I hope you find this helpful, good luck instrumenting all things!

EDIT: @bbonamin has shared a dashboard here which is a good starting point!

Posted by: Sam

Comments

Lukáš Zapletal over 6 years ago

We are going the statsd_exporter way. Have you considered it? It lacks free-form tags, but it has a mapping that will do the job. I wrote a mapping config generator, so it’s all automatic.

Sam Saffron over 6 years ago

There is this that requires some expanding:

github.com/prometheus/statsd_exporter

Clarify comment about "native solution" long term in Readme

opened 12:01PM - 07 Feb 18 UTC

closed 02:37PM - 07 Feb 18 UTC

SamSaffron

Readme says: > We recommend this only as an intermediate solution and recommend switching to native Prometheus instrumentation in the long term. I would like this somewhat clarified. For forking web servers such as Passenger, Unicorn and Puma in the Ruby ecosystem, requests are round-robined between process forks. This means you have no simple way of sharing state between forks and thus have a hard time collating data for an exporter. In this kind of setup a solution like statsd_exporter seems reasonable, as the alternatives are all similarly messy, be it sharing a pipe between all forks or mmapping a file and so on. I think the Readme should clarify that this is a completely legitimate solution for specific use cases such as aggregation of data from forked processes that all share an HTTP listener.

That said, I think using statsd_exporter is a completely reasonable way of solving this problem.

Prometheus Exporter has some advantages:

Ability to transport multiple metrics (such as all of GC.stat) in a single payload.
Ability to define “global” collectors directly in-process, which means stuff like Raindrops::Linux.tcp_listener_stats(['127.0.0.1:3000']) can be collected directly from the exporter process as opposed to forwarded to it.
Simpler to configure because you don’t need to generate a yml file to teach it how to parse metrics.

statsd_exporter has some advantages:

Easier to fit in if you already have statsd in place.
Built in Go, so it probably consumes slightly less memory.

Sam Saffron over 6 years ago

@lzap be sure to follow that issue on GitHub, because basically what they are saying is that what you are doing is long-term wrong ™, which I disagree with. But there is only so much battling I can do there.

Lukáš Zapletal over 6 years ago

Yeah thanks, I am following that already, appreciated.

hachi8833 almost 6 years ago

Published the JP translation of this article RailsのパフォーマンスをPrometheusで測定する（翻訳）｜TechRacho by BPS株式会社
Thank you very much for your kindness!

Sam Saffron almost 6 years ago

Thank you so much for translating!!

Bruno B about 5 years ago

Hey Sam! This is excellent. Thanks for sharing.

I was wondering, are there any Grafana dashboards pre-built with the stat names exposed by prometheus_exporter? I found a discourse one https://grafana.com/dashboards/3539 but it doesn’t match the metric names of this gem. Thanks again

Sam Saffron about 5 years ago

Oh, sorry about that. We had to rename all our metrics to match Prometheus standards, so this dashboard is a bit out of date. Fixing the dashboard should be fairly straightforward; it’s just about renaming the metrics. Maybe give it a shot? Happy to link to the fixed dashboard from the blog post!

Bruno B about 5 years ago

No worries! I renamed most of the metrics here by changing the discourse_ prefix to ruby_ (and a couple of other changes like adding _total here and there), and it started picking up some metrics.
https://grafana.com/dashboards/10238

Still, I found that the dashboard needs to be custom tailored for each app, specifically for those metrics that show an average per controller action, etc., so further tweaks are necessary after installing it in Grafana. It should be a good start for most though! I’ll keep tweaking and share a more minimalistic version once I’m finished.

If you have any other suggestions, they’re welcome. Thank you, the gem has been a real time saver.

Sam Saffron about 5 years ago

Thanks heaps, I went ahead and linked your post from the blog post!

Pascal Zumkehr about 5 years ago

For those interested, we just published another Grafana Dashboard for Puma/Rails and Delayed Jobs based on prometheus_exporter:

https://grafana.com/dashboards/10306
