Diagnosing runaway CPU in a .Net production application

over 3 years ago

So, you have this .Net app in production. Somewhere someone made some sort of mistake and it appears the CPU is pegged for large periods of time.

…and you ask yourself, how can I debug this:

  • There is no copy of Visual Studio installed.
  • There is a strict no-installer policy on these machines.
  • Performance is already messed up and you do not want to make stuff worse by diagnosing it.

To date the only answer I am aware of is the magic voodoo art of WinDbg and the SOS extension.

Sure, you can run [Process Explorer](http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx) and isolate the evil thread:

image

But … you have no idea how that thread relates to your .Net application or where that evil thread is running.

Enter my cpu-analyzer tool.

Here is a quick demo:

static void MakeBadStuffHappen() {
    ThreadPool.QueueUserWorkItem(_ => { MisterEvil(); });
}

static void MisterEvil() {
    double d = double.MaxValue;
    while (true) {
        d = Math.Sqrt(d);
        if (d < 1.1) {
            d = double.MaxValue;
        }
    }
}

static void Main(string[] args) {
    MakeBadStuffHappen();
    Console.WriteLine("Hello world!");
    Console.ReadKey();
}

We run:

cpu-analyzer.exe evilapp
------------------------------------
ThreadId: 4948
Kernel: 0 User: 89856576
EvilApp.Program.MisterEvil
EvilApp.Program.b__0
System.Threading.ExecutionContext.Run
System.Threading._ThreadPoolWaitCallback.PerformWaitCallbackInternal
System.Threading._ThreadPoolWaitCallback.PerformWaitCallback
... more lines omitted ... 

Aha, the thread running MisterEvil has burned roughly 9 seconds of CPU time in user mode (the User figure above).
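To see where the 9 seconds comes from, here is a minimal sketch that converts the User figure into seconds. It assumes the Kernel and User values are reported in 100-nanosecond units, the same length as one .NET TimeSpan tick; the post does not state the unit, so treat that as an assumption, although it agrees with the figure quoted above. The class and method names are made up for illustration.

```csharp
using System;
using System.Globalization;

static class ThreadTimeDemo
{
    // Assumption: cpu-analyzer prints thread times in 100-nanosecond units,
    // which is also the length of one TimeSpan tick.
    public static double TicksToSeconds(long ticks)
    {
        return TimeSpan.FromTicks(ticks).TotalSeconds;
    }

    static void Main()
    {
        long userTicks = 89856576; // "User" value from the sample output above
        double seconds = TicksToSeconds(userTicks);

        // Invariant culture keeps the decimal separator a dot on every machine.
        Console.WriteLine(seconds.ToString("F2", CultureInfo.InvariantCulture)); // prints 8.99
    }
}
```

Under the same assumption, a value of 10000000 would be exactly one second, which makes the long numbers easier to read at a glance.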

Of course this trivial sample is kind of boring, but once you apply this tool to bigger and more complex applications it can be a life saver.

…and did I mention, no installer is required.

You can download a demo that works in .Net 2.0 and have a play. Hope you find it helpful. Of course no warranties are provided, and it's not my fault if it crashes your app.

Update: the source is on GitHub at https://github.com/SamSaffron/cpu-analyzer

Comments

Eddy over 3 years ago

Very useful tool Sam – thanks! Just gave it a spin, and it works a treat.

If you’re making any updates, one small request: could you show the percentage of the sample time used by Kernel / User mode on each thread, e.g.

Kernel: 156250 (0.15%) User: 1718750 (1.7%)

(I find numbers with more than 6 digits or so hard to decipher at a glance.)

Sure, will do something similar in the next update.
— Sam

Max about 2 years ago

How would I use this to diagnose a problem with a website running on IIS 7?

This would be HUGE for me as I have a site with a random problem I cannot figure out!

well we usually attach it to w3wp or whatever the asp.net worker process is — Sam

Jake about 2 years ago

This looks like a perfect simple utility. Thank you!

When I try to run this against our 64bit IIS7.5 worker process, it fails to attach to the process. This occurs even when I bring up my command prompt as Administrator.

Any ideas what could be wrong? I'd love to use this to debug some issues we’re seeing lately.

Eric Hauser almost 2 years ago

Any chance of open sourcing this? I’m not able to get it to work against w3wp.exe on my local machine (Windows 7).

I guess it makes sense, I'll see if I can push this to github in the next few weeks … the exe here is .net 3.5 specific … I need to update it with the .net 4 version - Sam

Bryan Livingston almost 2 years ago

Hello Sam,

I desperately need this for a huge .net 4 site. How hard would it be to update for 4? Does it just need a recompile? Could you email me the source?

Thanks for the awesome tool. It’s exactly what I need.

Bryan

roger over 1 year ago

When I run it with target type “C++” and Executable: D:\dev\screen-capture-recorder-program\configuration_setup_utility\vendor\ffmpeg\bin\ffmpeg.exe , inline memory db,

it says “the system cannot find the file specified” is this expected?

Christian Bjerre over 1 year ago

Cool tool :) Works more or less as described in the options.

Get this crash on my application when it loads the .NET framework into the Win32 exe:

Unhandled Exception: System.InvalidOperationException: Reading old stack frames
   at Microsoft.Samples.Debugging.MdbgEngine.FrameCache.CheckUsability()
   at Microsoft.Samples.Debugging.MdbgEngine.FrameCache.GetFrame(Int32 index)
   at Microsoft.Samples.Debugging.MdbgEngine.FrameCache.d__1.MoveNext()
   at cpu_analyzer.ThreadSnapshot.GetThreadSnapshot(MDbgThread thread)
   at cpu_analyzer.Program.Main(String[] args)

Christian Bjerre over 1 year ago

Once the framework is loaded, cpu-analyzer works as expected.

Christian Bjerre over 1 year ago

One small extension would be great – the ability to stop the profiling once you are done, either by pressing CTRL+C or by dropping a file somewhere to communicate “I’m done”. Just a basic way to detach, stop and report earlier than the count x interval would otherwise allow.

Nate Pearson over 1 year ago

Sorry for the basic question…but how do I attach this to IIS? I get an exe and I’m not familiar with how to attach it :(.

you need to run a command prompt as admin, figure out the pid for the w3wp you are interested in, and attach to it - Sam

Mark over 1 year ago

I’m getting this error

“Unhandled Exception: System.Runtime.InteropServices.COMException: The state of the thread is invalid. (Exception from HRESULT: 0x8013132D)
   at Microsoft.Samples.Debugging.CorDebug.NativeApi.ICorDebugThread3.CreateStackWalk(ICorDebugStackWalk& ppStackWalk)
   at Microsoft.Samples.Debugging.CorDebug.CorThread.CreateStackWalk(CorStackWalkType type)
   at Microsoft.Samples.Debugging.MdbgEngine.MDbgV3FrameFactory.<EnumerateFrames>d__0.MoveNext()
   at Microsoft.Samples.Debugging.MdbgEngine.FrameCache.IterateNextFrame()
   at Microsoft.Samples.Debugging.MdbgEngine.FrameCache.GetFrame(Int32 index)
   at Microsoft.Samples.Debugging.MdbgEngine.MDbgThread.EnsureCurrentStackWalker()
   at Microsoft.Samples.Debugging.MdbgEngine.MDbgThread.get_Frames()
   at cpu_analyzer.ThreadSnapshot.GetThreadSnapshot(MDbgThread thread)
   at cpu_analyzer.Program.Main(String[] args)”

Not sure if there is a workaround. Can we skip a thread on error, for example?

this can happen if you take too many stack traces … back off … take one every 200 ms — Sam

Jignesh R about 1 year ago

Thanks a lot- your tool was quite helpful.

dan n about 1 year ago

Hey – just tried your latest .net 4.0 version w/ a .net 4.0 app – got this exception… any idea why? Wonder if it's because of .net 4 framework hot fixes and such. Any chance you could upload the source code so we can re-compile on our own if necessary? thx for considering.

C:\Temp>cpu-analyzer-net4.exe MyApp
Error: failed to attach to process: System.Runtime.InteropServices.COMException (0x80131C30): The operation failed because debuggee and debugger are on incompatible platforms. (Exception from HRESULT: 0x80131C30)
   at Microsoft.Samples.Debugging.CorDebug.NativeApi.ICorDebug.DebugActiveProcess(UInt32 id, Int32 win32Attach, ICorDebugProcess& ppProcess)
   at Microsoft.Samples.Debugging.CorDebug.CorDebugger.DebugActiveProcess(Int32 processId, Boolean win32Attach, CorRemoteTarget target)
   at Microsoft.Samples.Debugging.MdbgEngine.MDbgProcess.Attach(Int32 processId, SafeWin32Handle attachContinuationEvent, CorRemoteTarget target)
   at Microsoft.Samples.Debugging.MdbgEngine.MDbgEngine.Attach(Int32 processId, SafeWin32Handle attachContinuationEvent, String version)
   at cpu_analyzer.Program.Main(String[] args)

np 12 months ago

trying to attach this to a w3wp.exe but getting a very generic “Error: failed to attach to process”. Any ideas? or is this just something it doesn’t do? Thanks

va 10 months ago

any chance for more explicit errors? I’m getting the generic “error: failed to attach to process”

running cpu-analyzer.exe 5555(PID) /s 60 /i 1500

Any help would be appreciated.

michael freidgeim 7 months ago

I’ve tried to attach to a production w3wp that is taking 100% CPU, but each time it crashed the app with the following error.

Unhandled Exception: System.Runtime.InteropServices.COMException: An IL variable is not available at the current native IP. (Exception from HRESULT: 0x80131304)
   at Microsoft.Samples.Debugging.CorDebug.NativeApi.ICorDebugStackWalk.GetFrame(ICorDebugFrame& ppFrame)
   at Microsoft.Samples.Debugging.CorDebug.CorStackWalkEx.MoveNextWorker()
   at Microsoft.Samples.Debugging.CorDebug.CorStackWalkEx.MoveNext()
   at Microsoft.Samples.Debugging.MdbgEngine.MDbgV3FrameFactory.<EnumerateFrames>d__0.MoveNext()
   at Microsoft.Samples.Debugging.MdbgEngine.FrameCache.GetFrame(Int32 index)
   at Microsoft.Samples.Debugging.MdbgEngine.FrameCache.d__1.MoveNext()
   at cpu_analyzer.ThreadSnapshot.GetThreadSnapshot(MDbgThread thread)
   at cpu_analyzer.Program.Main(String[] args)

Any ideas how to make it work? If you open the source on github, the community may improve it and make it more robust.