Diagnosing runaway CPU in a .NET production application
almost 14 years ago
So, you have this .NET app in production.
Somewhere, someone made some sort of mistake, and the CPU is now pegged for long stretches of time.
…and you ask yourself, how can I debug this:
- There is no copy of Visual Studio installed.
- There is a strict no-installer policy on these machines.
- Performance is already messed up and you do not want to make stuff worse by diagnosing it.
To date, the only answer I am aware of is magic voodoo art involving WinDbg and the SOS extension.
Sure, you can run Process Explorer (from Sysinternals) and isolate the evil thread.
But … you have no idea how that thread relates to your .NET application or what code that evil thread is running.
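To make that limitation concrete, here is a minimal sketch of the kind of per-thread view you can get from the OS. It uses `System.Diagnostics.Process` and `ProcessThread` to list thread IDs with their kernel and user CPU time, busiest first. The class name `ThreadCpuLister` and the default process name `evilapp` are assumptions for illustration, and this is not how cpu-analyzer itself works.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Diagnostics;
using System.Linq;

static class ThreadCpuLister
{
    // Returns (thread id, kernel time, user time) for each thread, busiest first.
    public static List<(int Id, TimeSpan Kernel, TimeSpan User)> Snapshot(Process process)
    {
        var rows = new List<(int Id, TimeSpan Kernel, TimeSpan User)>();
        foreach (ProcessThread thread in process.Threads)
        {
            try
            {
                rows.Add((thread.Id, thread.PrivilegedProcessorTime, thread.UserProcessorTime));
            }
            catch (Exception ex) when (ex is InvalidOperationException || ex is Win32Exception)
            {
                // The thread exited or its times could not be read; skip it.
            }
        }
        return rows.OrderByDescending(r => r.Kernel + r.User).ToList();
    }

    static void Main(string[] args)
    {
        // "evilapp" is a hypothetical default; pass another process name as the first argument.
        string name = args.Length > 0 ? args[0] : "evilapp";
        foreach (Process process in Process.GetProcessesByName(name))
        {
            foreach (var row in Snapshot(process))
            {
                // Ticks are 100-nanosecond units.
                Console.WriteLine($"ThreadId: {row.Id} Kernel: {row.Kernel.Ticks} User: {row.User.Ticks}");
            }
        }
    }
}
```

This tells you which OS thread is hot, but nothing about which managed method it is executing, which is exactly the gap described above.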
Enter my cpu-analyzer tool.
Here is a quick demo:
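using System;
using System.Threading;

namespace EvilApp {
    class Program {
        // Queue the runaway loop on a thread-pool thread.
        static void MakeBadStuffHappen() {
            ThreadPool.QueueUserWorkItem(_ => { MisterEvil(); });
        }

        // Spins forever: repeated square roots shrink d toward 1, then it resets.
        static void MisterEvil() {
            double d = double.MaxValue;
            while (true) {
                d = Math.Sqrt(d);
                if (d < 1.1) {
                    d = double.MaxValue;
                }
            }
        }

        static void Main(string[] args) {
            MakeBadStuffHappen();
            Console.WriteLine("Hello world!");
            Console.ReadKey(); // keep the process alive while the worker burns CPU
        }
    }
}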
We run:
cpu-analyzer.exe evilapp
------------------------------------
ThreadId: 4948
Kernel: 0 User: 89856576
EvilApp.Program.MisterEvil
EvilApp.Program.<MakeBadStuffHappen>b__0
System.Threading.ExecutionContext.Run
System.Threading._ThreadPoolWaitCallback.PerformWaitCallbackInternal
System.Threading._ThreadPoolWaitCallback.PerformWaitCallback
... more lines omitted ...
Aha, the thread running MisterEvil is responsible for roughly 9 seconds in user mode (the Kernel and User counters are in 100-nanosecond units).
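If the raw counters are hard to read at a glance, a small helper can turn them into seconds and a share of the sampling window. This is only a sketch: the class `CpuTimeFormat` and its methods are hypothetical, and the 10-second window is an assumed value, not something the tool reports.

```csharp
using System;
using System.Globalization;

static class CpuTimeFormat
{
    // Converts a raw counter in 100-nanosecond units to seconds.
    // TimeSpan ticks are also 100 ns, so the conversion is direct.
    public static double ToSeconds(long hundredNanoUnits) =>
        TimeSpan.FromTicks(hundredNanoUnits).TotalSeconds;

    // Formats kernel and user counters as seconds plus a percentage of the sampling window.
    public static string Describe(long kernel, long user, TimeSpan sampleWindow)
    {
        double kernelPct = 100.0 * kernel / sampleWindow.Ticks;
        double userPct = 100.0 * user / sampleWindow.Ticks;
        return string.Format(
            CultureInfo.InvariantCulture,
            "Kernel: {0:F2}s ({1:F2}%) User: {2:F2}s ({3:F2}%)",
            ToSeconds(kernel), kernelPct, ToSeconds(user), userPct);
    }

    static void Main()
    {
        // Counters from the output above; the 10-second window is an assumption.
        Console.WriteLine(Describe(0, 89856576, TimeSpan.FromSeconds(10)));
        // prints "Kernel: 0.00s (0.00%) User: 8.99s (89.86%)"
    }
}
```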
Of course, the EvilApp demo is trivial and kind of boring, but once you apply this tool to bigger and more complex applications it can be a lifesaver.
…and did I mention, no installer is required.
Update: the source is on GitHub at SamSaffron/cpu-analyzer.
Very useful tool, Sam, thanks! Just gave it a spin, and it works a treat.
If you're making any updates, one small request: could you show the percentage of the sample time used by Kernel / User mode on each thread, e.g.
Kernel: 156250 (0.15%) User: 1718750 (1.7%)
(I find numbers with more than 6 digits or so hard to decipher at a glance.)