{"id":45,"date":"2006-12-07T15:38:48","date_gmt":"2006-12-07T20:38:48","guid":{"rendered":"http:\/\/www.nynaeve.net\/?p=45"},"modified":"2019-12-13T17:39:35","modified_gmt":"2019-12-13T22:39:35","slug":"an-introduction-to-kernrate-the-windows-kernel-profiler","status":"publish","type":"post","link":"http:\/\/www.nynaeve.net\/?p=45","title":{"rendered":"An introduction to kernrate (the Windows kernel profiler)"},"content":{"rendered":"<p>One useful utility for tracking down performance problems that you might not have heard of until now is kernrate, the Windows kernel profiler.  This utility currently ships with the <a title=\"Windows Server 2003 Resource Kit Tools Download Link\" href=\"http:\/\/www.microsoft.com\/downloads\/details.aspx?displaylang=en&#038;familyid=9D467A69-57FF-4AE7-96EE-B18C4790CFFD\">Windows Server 2003 Resource Kit Tools<\/a> package (though you can use kernate on Windows XP is well) and is freely downloadable.  Currently, you&#8217;ll have to match the version of kernrate you want to use with your processor architecture, so if you are using your processor in x64 mode with an x64 Windows edition, then you&#8217;ll have to dig up an x64 version of kernrate (the one that ships with the Srv03 resource kit tools is x86); KrView (see below) ships with an x64 compatible version of kernrate.<\/p>\n<p>Kernrate requires that you have the SeProfilePrivilege assigned (which is typically only granted to administrators), so in most cases you will need to be a local administrator on your system in order to use it.  This privilege allows access to the (undocumented) profile object system services.  These APIs allow programmatic access to sample the instruction pointer at certain intervals (typically, a profiler program selects the timer interrupt for use with instruction pointer sampling).  This allows you to get a feel for what the system is doing over time, which is in turn useful for identifying the cause of performance issues where a particular operation appears to be processor bound and taking longer than you would like.<\/p>\n<p>There are a multitude of options that you can give kernrate (and you are probably best served by experimenting with them a bit on your own), so I&#8217;ll just cover the common ones that you&#8217;ll need to get started (use &#8220;kernrate -?&#8221; to get a list of all supported options).<\/p>\n<p>Kernrate can be used to profile both user mode and kernel mode performance issues.  By default, it operates only on kernel mode code, but you can override this via the -a (and -av) options, which cause kernrate to include user mode code in its profiling operations in addition to kernel mode code.  Additionally, by default, kernrate operates over the entire system at once; to get meaningful results with profiling user mode code, you&#8217;ll want to specify a process (or group of processes) to profile, with the &#8220;-p <em>pid<\/em>&#8221; and\/or &#8220;-n <em>process-name<\/em>&#8221; arguments.  (The process name is the first 8 characters of a process&#8217;s main executable filename.)<\/p>\n<p>To terminate collection of profiling data, use Ctrl-C.  (On pre-Windows-Vista systems where you might be running kernrate.exe via runas, remember that Ctrl-C does not work on console processes started via runas.)  Additionally, you can use the &#8220;-s <em>seconds<\/em>&#8221; argument to specify that profling should be automagically stopped after a given count of seconds have elapsed.<\/p>\n<p>If you run kernrate on kernel mode code only, or just specify a process (or group of processes) as described above, you&#8217;ll notice that you get a whole lot of general system-wide output (information about interrupt counts, global processor time usage, context switch counts, I\/O operation counts) in addition to output about which modules used a noteworthy amount of processor time.  Here&#8217;s an example output of running kernrate on just the kernel on my system, as described above (including just the module totals):<\/p>\n<pre>\r\nD:\\\\Programs\\\\Utilities&gt;kernrate\r\nKernrate User-Specified Command Line:\r\nkernrate\r\n\r\n\r\nKernel Profile (PID = 0): Source= Time,\r\nUsing Kernrate Default Rate of 25000 events\/hit\r\nStarting to collect profile data\r\n\r\n***> Press ctrl-c to finish collecting profile data\r\n===> Finished Collecting Data, Starting to Process Results\r\n\r\n------------Overall Summary:--------------\r\n\r\n[...]\r\n\r\nOutputResults: KernelModuleCount = 153\r\nPercentage in the following table is based on\r\nthe Total Hits for the Kernel\r\n\r\nTime   197 hits, 25000 events per hit --------\r\n Module    Hits   msec  %Total  Events\/Sec\r\nintelppm     67        980    34 %     1709183\r\nntkrnlpa     52        981    26 %     1325178\r\nwin32k       35        981    17 %      891946\r\nhal          19        981     9 %      484199\r\ndxgkrnl       6        980     3 %      153061\r\nnvlddmkm      6        980     3 %      153061\r\nfanio         3        981     1 %       76452\r\nbcm4sbxp      2        981     1 %       50968\r\nportcls       2        980     1 %       51020\r\nSTAC97        2        980     1 %       51020\r\nbthport       1        981     0 %       25484\r\nBTHUSB        1        981     0 %       25484\r\nNtfs          1        980     0 %       25510<\/pre>\n<p>Using kernrate in this fashion is a good first step towards profiling a performance problem (especially if you are working with someone else&#8217;s program), as it quickly allows you to narrow down a processor hog to a particular module.  While this is useful as a first step, however, it doesn&#8217;t really give you a whole lot of information about what specific code in a particular mode is taking a lot of processor time.<\/p>\n<p>To dig in deeper as to the cause of the problem (beyond just tracing it to a particular module), you&#8217;ll need to use the &#8220;-z <em>module-name<\/em>&#8221; option.  This option tells kernrate to &#8220;zoom in&#8221; on a particular module; that is, for the given module, kernrate will track instruction pointer locations within the module to individual functions.  This level of granularity is often what you&#8217;ll need for tracking down a performance issue (at least as far as profiling is concerned).  You can repeat the &#8220;-z&#8221; option multiple times to &#8220;zoom in&#8221; to multiple modules (useful if the problem you are tracking down involves high processor usage across multiple DLLs or binaries).<\/p>\n<p>Because kernrate is resolving instruction pointer sampling down to a more granular level than modules (with the &#8220;-z&#8221; option), you&#8217;ll need to tell it how to load symbols for all affected modules (otherwise, the granularity for profiler output will typically be very poor, often restricted to just exported functions).  There are two ways to do this.  First, you can use the &#8220;-j <em>symbol-path<\/em>&#8221; command line option; this option tells kernrate to pass a particular symbol path to DbgHelp for use with loading symbols.  I recommend the second option, however, which is to <a title=\"Setting a default symbol path for all DbgHelp programs\" href=\"http:\/\/www.nynaeve.net\/?p=28\">configure your _NT_SYMBOL_PATH<\/a> before-hand so that it points to a valid DbgHelp symbol path.  This relieves you of having to manually tell kernrate a symbol path every time you execute it.<\/p>\n<p>Continuing with the example I gave above, we might be interested in just what the &#8220;win32k&#8221; (the Win32 kernel mode support driver for USER\/GDI) module is doing that was taking up 17% of the processor time spent in kernel mode on my system (for the interval that I was profiling).  To do that, we can use the following command line (the output has been truncated only include information that we are interested in):<\/p>\n<pre>D:\\\\Programs\\\\Utilities&gt;kernrate -z win32k\r\n\r\nKernrate User-Specified Command Line:\r\nkernrate -z win32k\r\n\r\n\r\nKernel Profile (PID = 0): Source= Time,\r\nUsing Kernrate Default Rate of 25000 events\/hit\r\nCallBack: Finished Attempt to Load symbols for\r\n90a00000 \\\\SystemRoot\\\\System32\\\\win32k.sys\r\n\r\nStarting to collect profile data\r\n\r\n***> Press ctrl-c to finish collecting profile data\r\n===> Finished Collecting Data, Starting to Process Results\r\n\r\n------------Overall Summary:--------------\r\n\r\n[...]\r\n\r\nOutputResults: KernelModuleCount = 153\r\nPercentage in the following table is based on the\r\nTotal Hits for the Kernel\r\n\r\nTime   2465 hits, 25000 events per hit --------\r\n Module      Hits   msec  %Total  Events\/Sec\r\nntkrnlpa     1273      14799    51 %     2150483\r\nwin32k        388      14799    15 %      655449\r\nintelppm      263      14799    10 %      444286\r\nhal           236      14799     9 %      398675\r\nbcm4sbxp       66      14799     2 %      111494\r\nspsys          55      14799     2 %       92911\r\nnvlddmkm       48      14799     1 %       81086\r\nSTAC97         31      14799     1 %       52368\r\n\r\n[...]\r\n\r\n\r\n===> Processing Zoomed Module win32k.sys...\r\n\r\n\r\n----- Zoomed module win32k.sys (Bucket size = 16 bytes,\r\nRounding Down) --------\r\nPercentage in the following table is based on the\r\nTotal Hits for this Zoom Module\r\n\r\nTime   388 hits, 25000 events per hit --------\r\n Module                  Hits   msec  %Total  Events\/Sec\r\nxxxInternalDoPaint         44      14799    10 %       74329\r\nXDCOBJ::bSaveAttributes    20      14799     4 %       33786\r\nDelayedDestroyCacheDC      20      14799     4 %       33786\r\nHANDLELOCK::vLockHandle    15      14799     3 %       25339\r\nmmxAlphaPerPixelOnly       15      14799     3 %       25339\r\nXDCOBJ::RestoreAttributes  13      14799     2 %       21960\r\nDoTimer                    12      14799     2 %       20271\r\n_SEH_prolog4               11      14799     2 %       18582\r\nmemmove                     9      14799     2 %       15203\r\n_GetDCEx                    6      14799     1 %       10135\r\nHmgLockEx                   6      14799     1 %       10135\r\nXDCOBJ::bCleanDC            5      14799     1 %        8446\r\nXEPALOBJ::ulIndexToRGB      5      14799     1 %        8446\r\nHmgShareCheckLock           4      14799     0 %        6757\r\nRGNOBJ::bMerge              4      14799     0 %        6757\r\n\r\n[...]<\/pre>\n<p>This should give you a feel for the kind of information that you&#8217;ll get from kernrate.  Although the examples I gave were profiling kernel mode code, the whole process works the same way for user mode if you use the &#8220;-p&#8221; or &#8220;-n&#8221; options as I mentioned earlier.  In conjunction with a debugger, the information that kernrate gives you can often be a great help in narrowing down CPU usage performance problems (or at the very least point you in the general direction as to where you&#8217;ll need to do further research).<\/p>\n<p>There are also a variety of other options that are available in kernrate, such as features for gathering information about &#8220;hot&#8221; locks that have a high degree of contention, and support for launching new processes under the profiler.  There is also support for outputting the raw sampled profile data, which can be used to graph the output (such as you might see used with tools like <a title=\"KrView (the Kernrate Viewer) Download Link\" href=\"http:\/\/www.microsoft.com\/whdc\/system\/sysperf\/krview.mspx\">KrView<\/a>).<\/p>\n<p>Although kernrate doesn&#8217;t have all the &#8220;bells and whistles&#8221; of some of the high-end profiling tools (like Intel&#8217;s vTune), it&#8217;s often enough to get the job done, and it&#8217;s also available to you at no extra cost (and can be quickly and easily deployed to help find the source of a problem).  I&#8217;d highly recommend giving it a shot if you are trying to analyze a performance problem and don&#8217;t already have a profiling solution that you are using.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>One useful utility for tracking down performance problems that you might not have heard of until now is kernrate, the Windows kernel profiler. This utility currently ships with the Windows Server 2003 Resource Kit Tools package (though you can use kernate on Windows XP is well) and is freely downloadable. Currently, you&#8217;ll have to match [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[2,4,5],"tags":[],"_links":{"self":[{"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=\/wp\/v2\/posts\/45"}],"collection":[{"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=45"}],"version-history":[{"count":1,"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=\/wp\/v2\/posts\/45\/revisions"}],"predecessor-version":[{"id":624,"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=\/wp\/v2\/posts\/45\/revisions\/624"}],"wp:attachment":[{"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=45"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=45"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=45"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}