A few weeks ago we started noticing a fairly worrying trend at Stack Overflow. Question:Show, the most active of pages on Stack Overflow, started slowly trending up on render times. This did not happen at once, the performance leak was not huge.

We noticed this by looking at our HAProxy logs which are stored in a special Sql Server instance thanks to Jarrod. This database contains render times so we can quickly aggregate it, and catch worrying trends.

During the time we were browsing the site, occasionally, we would catch strange stalls before any of the code in our controllers fired.

gap

Getting big gaps like the one above was rare, however we consistently saw 5ms stalls before any code fired off on our side.

We thought of adding instrumentation to figure out more about these leaks, but could not figure out what needed instrumenting.

CPU Analyzer to the rescue

I blogged about CPU Analyzer in the past but failed to properly communicate its awesomeness.

It is a sampling profiler designed to work in production. In general, traditional profiling methods fall over when digging into production issues or high CPU issues. CPU Analyzer works around this.

It attaches itself to a running process using the ICorDebug interface and gathers stack traces in a predefined rate. Additionally, it gathers CPU time for each thread while doing so.

At the end of this process it sorts all of the information it has into a somewhat readable text representation, showing the most "expensive" stack traces at the top.

This allows you to get a crude view of the current state of affairs and find likely sources of various performance gaps.

I ran CPU Analyzer a few times on the production instance of w3wp gathering 40 snapshots at a time at every 110ms. Warning: if you gather snapshots too rapidly you risk crashing your process, ICorDebug was not designed for profiling, ICorProfiler is - the ability to attach though is recent, I may rewrite it one day.

cpu-analyzer-net4.exe 12345 /s 40 /i 110

I found some interesting snapshots:

System.Web.Routing.RouteCollection.GetRouteData
System.Web.Routing.UrlRoutingModule.PostResolveRequestCache
System.Web.HttpApplication+SyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute
[...]
System.Web.Hosting.PipelineRuntime.ProcessRequestNotification
===> Cost (11232072)

And:

System.Text.RegularExpressions.Regex.IsMatch
System.Web.Routing.Route.ProcessConstraint
System.Web.Routing.Route.ProcessConstraints
System.Web.Routing.Route.GetRouteData
[...]
System.Web.Hosting.PipelineRuntime.ProcessRequestNotification
===> Cost (7332047)

And

System.IO.File.FillAttributeInfo
System.IO.File.InternalExists
System.IO.File.Exists
System.Web.Hosting.MapPathBasedVirtualPathProvider.CacheLookupOrInsert
System.Web.Routing.RouteCollection.GetRouteData
[...]
System.Web.Hosting.PipelineRuntime.ProcessRequestNotification
===> Cost (1560010)

From the three snapshots we can see the GetRouteData is busy running RegEx matches. In fact, these regular expressions are taking up 65% of the time in that method. Another 13% is consumed by File.Exist calls.

Now, these times are just samples, they may not represent reality especially for short operations. However, I gathered enough samples for a consistent picture to emerge.

I pinged Marcin and Phil, and got a very prompt and detailed reply from Marcin.

A crash course on ASP.NET routing

In an MVC3 scenario the UrlRoutingModule is responsible for looking at the path you type in your address bar, parsing it, and figuring out what information to pass on to MVC so it can dispatch the request.

Routes are denoted by fairly trivial Url patterns. For example the path: hello{{}}/{world} contains one static segment called hello{} and one wildcard placeholder named world; it can contain any text except for /.

The default process works somewhat like this:

  1. RouteCollection checks with its MapPathBasedVirtualPathProvider to see if the existing path is an actual location on disk. If it is, routing is bypassed and the next module picks up. This check can be disabled with RouteTable.Routes.RouteExistingFiles = true;

  2. MapPathBasedVirtualPathProvider has an internal cache (that sits on HttpRuntime.Cache which ensures it only performs a File.Exists or Directory.Exists call once a minute per unique string. This is configurable using the UrlMetadataSlidingExpiration node under hostingEnvironment in web.config.

  3. Every route is tested for a match, in order, in a simple loop. Top to bottom. This test is performed in the ParsedRoute class. For every route tested a dictionary will be constructed and the virtual path split up into a list. If you have 600 routes and only the bottom one matches, 600 dictionaries and lists will be created and some other processing will happen per-route.

  4. If a match is found, constraints are checked. There are 2 types of constraints, string constraints that are treated as a RegEx and tested for a match. And IRouteConstraint that are tested using the custom Match implementation. Both this types constraints live in a collection that is passed in using a RouteValueDictionary that is an IDictionary<string,object>. If any constraint fails to match, the route is skipped.

  5. The router checks the Regex constraints using a call to Regex.IsMatch(rule,RegexOptions.CultureInvariant | RegexOptions.Compiled | RegexOptions.IgnoreCase), more about why you should avoid that soon.

  6. Once the match is found it is passed on to MvcRouteHandler which creates an MvcHandler and is privy to the information the Route parsed out. Such as Controller and Action names.

We are very fussy about routes at Stack Overflow so we have tons

When you look at a question on Stack Overflow, the url: questions/1/my-awesome-question you do not see the url: Question/Show/1.

We are fussy. All our paths need to look, just-so. We pick nice friendly urls for all our various actions, which results in a pretty huge list of routes. We have upwards of 650 routes registered. The fact that we use attribute based routing makes creating lots of routes trivial.

Not only that, we are very fussy about the paths we route and make use of many RegEx constraints.

For example: for Question:Show we register the route: questions/{id}/{title?} and add the constraint ["id"] = "\d{1,9}". This ensures that questions/1 hits our question route and questions/bob does not. Without that constraint, the router would route such requests to MVC, which in turn would exception out. (since id is an int and is not optional)

The Regex cache and why it was impacting us

The Regex class has an internal cache for Regexes that are created using static method calls on the Regex class. You can specify its size using the CacheSize property:

By default, the cache holds 15 compiled regular expressions. Your
application typically will not need to modify the size of the cache.
Use the CacheSize property in the rare situation when you need to turn
off caching or you have an unusually large cache.

This documentation on MSDN is confusing and misleading, the Regex cache has nothing to do with compiled queries as this snippet demonstrates, a compiled regex will take up a spot and a non compiled one will as well:

Action<string,Action> timeIt = (msg,action) =>
{
    var sw = Stopwatch.StartNew();
    action();
    sw.Stop();
    Console.WriteLine(msg + " took: {0:0.####}ms", sw.Elapsed.TotalMilliseconds);
};

Regex.CacheSize = 1;
// warm up engine
Regex.Match("12345", @"^(\d{0:9}...)$", RegexOptions.Compiled);   


timeIt("compiled first run", () => 
{
    Regex.Match("12345", @"^(\d{0:9})$", RegexOptions.Compiled);    
});

// Outputs: 0.9321ms

timeIt("compiled second run", () =>
{
    Regex.Match("12345", @"^(\d{0:9})$", RegexOptions.Compiled);
});

// Outputs: 0.077ms

timeIt("compiled third run, constructor checks cache", () =>
{
    new Regex( @"^(\d{0:9})$", RegexOptions.Compiled).Match("12345");
});

// Outputs: 0.0822ms

timeIt("not compiled first run", () =>
{
    Regex.Match("12345", @"^(\d{0:9})$");
});

// Outputs: 0.0856

timeIt("compiled third run", () =>
{
    Regex.Match("12345", @"^(\d{0:9})$", RegexOptions.Compiled);
});

// Outputs: 0.914ms - clearly dropped off cache

Console.ReadKey();

Static method calls on the Regex class are allowed to insert stuff into this cache. All regular expression construction (which happens once per static method call) checks the cache. The router makes a call to a static method on the Regex class when processing constraints.

Constructing a compiled regex can be up to 3 orders of magnitude slower than using a plain old interpreted Regex. However the cache can easily mask this, since it allows for reuse.

There is a big caveat.

We make heavy use of Regex.IsMatch and Regex.Replace calls in our code base, for non performance critical stuff. Performance critical regular expressions live in compiled static fields on various classes.

If you make heavy use of Regex static method calls, like we do, you are risking compiled regular expressions dropping off the cache at any time. The cost of recompiling a regular expression can be huge.

I strongly recommend you never pass the RegexOptions.Compiled to any of the static helpers, it is very risky. You have no visibility into the cache and do not know how various tool vendors will use it.

With all of this in mind, here is a list of optimizations we took. It eliminated the vast majority of the routing cost.


Firstly, we stopped using string based route constraints

The biggest optimization responsible for the majority of the performance gains was eliminating the use of string regular expression constraints.

var constraints = new Dictionary<string, IRouteConstraint>();

// instead of
// consraints.Add("Id", "\d{1,9}"); 

// we now do: 
constraints.Add("Id", new RegexConstraint("^(\d{1,9})$")

With trivially defined IRouteContraint:

public class RegexConstraint : IRouteConstraint, IEquatable<RegexConstraint>
{
   Regex regex;

   public RegexConstraint(string pattern, RegexOptions options = RegexOptions.CultureInvariant | RegexOptions.Compiled | RegexOptions.IgnoreCase)
   {
      regex = new Regex(pattern, options);
   }

   public bool Match(System.Web.HttpContextBase httpContext, Route route, string parameterName, RouteValueDictionary values, RouteDirection routeDirection)
   {
      object val;
      values.TryGetValue(parameterName, out val);
      string input = Convert.ToString(val, CultureInfo.InvariantCulture);
      return regex.IsMatch(input);
   }
}

This change alone eliminated all the "jittering". After this, finding routes took a consistent amount of time.

Next, we reordered our routes so the most used routes are on top

This is the second most important optimization. Routes that are only accessed 10 times a day should not be at the top of routes that are accessed millions of times a day. Every route check comes with a constant amount of overhead.

On startup, we query HAProxy and order our routing table so the most used routes are on the top. We have special rules that ensure the ordering is "safe" in cases where routes overlap, and ordering needs to be maintained.

Then, we turbo charged the GetRouteData call

Our routes, in general, follow a simple pattern. We have a static string followed by a dynamic catch all. So, for example we have: posts/{postid}/comments and q/{id}/{userid}. This pattern allows us to super charge GetRouteData.

class LeftMatchingRoute : Route
{
    private readonly string neededOnTheLeft;
    public LeftMatchingRoute(string url, IRouteHandler handler)
        : base(url, handler)
    {
        int idx = url.IndexOf('{');
        neededOnTheLeft = "~/" + (idx >= 0 ? url.Substring(0, idx) : url).TrimEnd('/');
    }
    public override RouteData GetRouteData(System.Web.HttpContextBase httpContext)
    {
        if (!httpContext.Request.AppRelativeCurrentExecutionFilePath.StartsWith(neededOnTheLeft, true, CultureInfo.InvariantCulture)) return null;
        return base.GetRouteData(httpContext);
    }
}

This means that we can bypass some of the internal parsing. For perspective, we are talking about a reduction from 5ms to match the last route in the table down to 0.4ms, for a routing table with 650 routes.

Lastly, we Disabled the file checks for static content

You can easily disable the file checks the router performs:

// in Global.asax.cs - Application_Start 
RouteTable.Routes.RouteExistingFiles = true;

Trouble is, that all static content now will have do a full router sweep and most likely result in a 404.

To work around this issue we perform the matching earlier.

In web.config we register our special router:

<httpModules>
  <remove name="UrlRoutingModule-4.0"/>
  <add name="UrlRoutingModule-4.0" type="StackOverflow.Helpers.ProfiledUrlRoutingModule"/>
</httpModules>

and

<system.webServer>
<modules runAllManagedModulesForAllRequests="true">
      <remove name="UrlRoutingModule-4.0"/>
      <add name="UrlRoutingModule-4.0" type="StackOverflow.Helpers.ProfiledUrlRoutingModule"/>
</modules> 
</system.webServer>

Then we define:

public class ProfiledUrlRoutingModule : UrlRoutingModule
{
    public override void PostResolveRequestCache(HttpContextBase context)
    {
        if (context.Request.Path.StartsWith("/content/", StringComparison.OrdinalIgnoreCase)) return;

        using (MiniProfiler.Current.Step("Resolve route"))
        {
            base.PostResolveRequestCache(context);
        }
    }
}

This means that any path starting with /content/ bypasses the module. It also allows us to hook up MiniProfiler so we can track how fast our routing is.

This also increases security. We no longer need to worry about any leftover files that are not in the content directory. For content to be served statically, it must live in the content directory. We serve robots.txt from a dynamic route.

Final Notes

These optimizations reduced routing overhead for our hottest path, showing questions, from 5ms with 30ms spikes to 0.1 ms. A huge win.

That said, it is our use of the framework that caused many of these issues to arise. In general the router is not a bottleneck. If you think it may be - start by profiling your routing using MiniProfiler. See what is slow, then implement any or none of these optimizations.

Big thank you to Marc Gravell for coming up with the ProfiledUrlRoutingModule the LeftMatchingRoute and the clever hack to reorder routes by real world usage. Big thank you to Marcin for being helpful and pointing us at the right direction.

Note I am assured by Marcin that the Regex constraint issue is fixed in System.Web in .NET 4.5

Comments

Stephen_Adams about 3 years ago
Stephen_Adams

Sam,

Love reading your performance posts. Whilst we don't currently do anything sophisticated in regards to routing, it's nice to know of some decent steps to go down if something seems amiss in routing land.

Thanks for this!

Sam Saffron about 3 years ago
Sam Saffron

Thank you!

Feedback greatly appreciated.

Jubs almost 3 years ago
Jubs

Sam, quick question. Why did you implement IEquatable on RegexConstraint? Merely good practice or is it used internally somehow?

Great article by the way, awesome to learn what you guys have in the performance toolbox for stackoverflow.

Brent_Arias over 2 years ago
Brent_Arias

In your article, you said:

// in Global.asax.cs - Application_Start 


RouteTable.Routes.RouteExistingFiles = false;

The above is unnecessary, as it is the default setting for MVC. Based on the surrounding comments you made, I think you merely have a typo. I think you meant:

// in Global.asax.cs - Application_Start 


RouteTable.Routes.RouteExistingFiles = true;

Thus your HttpModule intercepts necessary file-system checks, and otherwise MVC only attempts to match routes – no further file system check.
Of course, instead of a custom HttpModule, you could also have re-used your custom RegexConstraint inside an routes.IgnoreRoute() statement.

Sam Saffron over 2 years ago
Sam Saffron

good catch, it was a typo indeed

Keith over 2 years ago
Keith

I found the same issue to be the [cause of a massive 500ms slowdown][http://stackoverflow.com/questions/11900134/first-call-to-url-action-on-a-page-is-slow] when calling Url.Action and Html.BeginForm.

I had 25 routes, about half of which have regex based constraints, but while repeated regex constraints are slow I was still amazed that it added up to half a second.

Mitch 11 months ago
Mitch

Am I missing something... or instead of creating a custom http module, can't you just set runAllManagedModulesForAllRequests="false". At the IIS level in the applicationHost.config the UrlRoutingModule-4.0 module is added with preCondition="managedHandler", so if all you have to do to keep the module from running for static files (unmanaged/non-asp.net requests) is set runAllManagedModulesForAllRequests="false". If I'm understanding this correctly, it's not only less work and less overhead, but it will implicitly work for all static files, so you don't have to explicitly tell it which folders/files to ignore.


comments powered by Discourse