{"id":3,"date":"2006-07-05T00:55:37","date_gmt":"2006-07-05T05:55:37","guid":{"rendered":"http:\/\/www.nynaeve.net\/?p=3"},"modified":"2019-12-13T17:43:37","modified_gmt":"2019-12-13T22:43:37","slug":"vmware-server-troubleshooting","status":"publish","type":"post","link":"http:\/\/www.nynaeve.net\/?p=3","title":{"rendered":"VMware Server and RDP don&#8217;t always play nicely together."},"content":{"rendered":"<p>Steve already stole my thunder (well, if that makes sense, since it was my paper anyway) by posting my analysis of this earlier, but I figure that it is also worth discussing here.<\/p>\n<p>Recently, I finally* got a new development box at work &#8211; multiproc, x64 capable (with the ability to run 64-bit VMs too!), lots of RAM, generally everything you would want in a really nice development box. Needless to say, I was rather excited to see what I could do with it. The first thing I had in mind was setting up a dedicated set of VMs to run my test network and host various dedicated services, such as our symbol server here at the office.<\/p>\n<p>(*: There is a long, sad story behind this. For a long time, I had been running a VM on an ancient ~233MHz box that nobody else at the office wanted (for obvious reasons!). I had been trying to get a replacement box that didn&#8217;t suck so much to run this VM (and others) full time, but just about everything that could possibly go wrong with requesting a purchase from work did go wrong, delaying it by over half a year&#8230;).<\/p>\n<p>The box came with Windows XP Professional x64 Edition installed, so I figured that I might as well use that install instead of blowing it away and putting Windows Server 2003 on it for now. As it turned out, this came back to bite me later. After installing all of the usual things (service 
packs, hotfixes, and so forth), I went to grab the latest <a title=\"VMware Server Product Page\" href=\"http:\/\/www.vmware.com\/products\/server\/\" target=\"_blank\" rel=\"noopener noreferrer\">VMware Server<\/a> installer so that I could put the box to work running my library of VMs. Everything seemed to be going okay at the start, until I began to do things that were a bit outside the box, so to speak. Here, I wanted to have my XP x64 box route through a VM running on the same computer. Why on earth would I possibly want to do that, you ask? Well, I have an internal VPN-based network that overlays the office network here at work and connects all of the VMs I have running on various boxes at the office. I wanted to interconnect all of those VMs with various services (in particular, lots and lots of storage space) running on the beefy x64 box over this trusted VPN network instead of the public office network (which, for testing purposes, I have designated as the untrusted Internet network). If the x64 box routes through something that is connected to the entire overlay network, then I don&#8217;t need to worry about creating connections to every single other VM in existence to grant access to those resources. (At this point, our x64 support is still in beta, and XP doesn&#8217;t have a whole lot of useful support for dedicated VPN links.)<\/p>\n<p>Anyway, things started to get weird once I finally got this setup going. The first problem I ran into was that sometimes on boot, all of the VMs that I had configured to autostart would appear to hang on startup &#8211; I would have to go to Task Manager and kill the vmware-vmx.exe processes, then restart the vmserverdWin32 service before I could get them to come up properly. After a bit of poking around, I noticed a suspicious Application event log entry that seemed to 
correlate with the boots on which this problem happened:<\/p>\n<p>Event Type: Information<br \/>\nEvent Source: VMware Registration Service<br \/>\nEvent Category: None<br \/>\nEvent ID: 1000<br \/>\nDate: 6\/13\/2006<br \/>\nTime: 2:10:06 PM<br \/>\nUser: N\/A<br \/>\nComputer: MOGHEDIEN<br \/>\nDescription:<br \/>\nvmserverdWin32 took too much time to initialize.<\/p>\n<p>Hmm&#8230; that doesn&#8217;t look good. Digging a bit deeper, it turns out that VMware Server has several different service components, and apparently there are dependencies between them. However, the VMware Server developers neglected to properly declare dependencies between all of the services; instead, they appear to have just let the services start in whatever order and given them a timeout window in which they are supposed to establish communication with each other. Unfortunately, this tends to break randomly on some configurations (like mine, apparently).<\/p>\n<p>Fortunately, the fix for this problem turned out to be fairly easy. Using sc.exe, the command line service configuration app (which used to ship with the SDK, but now ships with Windows XP and later &#8211; a handy tool to remember), I added a Service Control Manager (SCM) dependency from the main VMware launcher service (&#8220;VMServerdwin32&#8221;) on the VMware authorization service (&#8220;VMAuthdService&#8221;), so that the SCM will not start the launcher until the authorization service is running. Note that `depend=&#8217; replaces the service&#8217;s entire dependency list rather than appending to it, so any existing dependencies have to be listed again alongside the new one:<\/p>\n<p>C:\\Documents and Settings\\Administrator>sc config vmserverdWin32 depend= RPCSS\/VMAuthdService<br \/>\n[SC] ChangeServiceConfig SUCCESS<\/p>\n<p>After fixing the service dependencies, everything seemed to be okay, but of course, that wasn&#8217;t really the case&#8230;<\/p>\n<p>When I went home later that day, I decided to VPN into the office and RDP into my new development box in order to change 
some hardware settings on one of my VMs. In this particular case, some of the VPN traffic from my apartment to the development box at the office happened to pass through the router VM that I had running on the development box. Whenever I tried to RDP into the development box, the session would freeze right after I entered valid logon credentials at the winlogon prompt, and the RDP connection would hang until TCP gave up and broke off the connection. This happened every single time I tried to RDP into my new box, but the office connection was fine (I could still connect to other things at the office while this was happening). Definitely not cool. So, I opened a session on our development server at the office and decided to try an experiment &#8211; ping my new dev box from there while I tried to RDP in. The initial results of this experiment were not at all what I expected; my dev box responded to those pings the whole time, even while it was apparently unreachable over RDP and the TCP connection was timing out. The next time I tried RDPing in, I also ran a ping from my home box to my dev box: the pings from home were dropped while I was trying to connect to the console session after providing valid logon credentials, and yet the box still responded to pings from the other box at the office.<\/p>\n<p>After poking around a bit more, I determined that <em>every single VM<\/em> on my brand new dev box would just freeze and stop responding whenever I tried to RDP into my dev box from home (but not from the office). To make matters even stranger, I could connect to a different box at the office, bounce RDP through that box to my new dev server, and things would work fine. Well, that sucks &#8211; what&#8217;s going on here? A partial explanation stems from how exactly I had 
set up the routing on my new dev box: the default gateway was set to my router VM (running on that box) through one of the VMnet virtual NICs, but I had left the physical NIC on the box still bound to TCP/IP (without a default gateway set, however). So traffic to and from the office subnet has no need to traverse the router VM, but traffic to and from my home over the VPN is routed through the router VM.<\/p>\n<p>Given this information, I had at least found part of the picture: whenever I tried to RDP into my new dev box over the VPN, all of the VMs on my new dev box would freeze. Because traffic through the VPN to my new dev box is routed through a VM on that same box, the RDP connection stalls and eventually times out once the router VM has hung.<\/p>\n<p>At this point, I had to turn to a debugger to understand what was going on. I opened the vmware-vmx.exe process corresponding to the router VM in the debugger and compared call stacks between when it was running normally and when it was frozen while I was trying to RDP in. This pointed to the thread that emulates the virtual CPU becoming blocked on an internal NtUser call into win32k.sys. At that point, I couldn&#8217;t really do a whole lot more without resorting to a kernel debugger, so that was my next step.<\/p>\n<p>With the help of kd, I was able to track the problem down a bit further: the VMware CPU emulation thread was blocked trying to acquire the global win32k NtUser lock that almost all NtUser calls acquire at the start of their implementation. With the `!locks&#8217; command, I was able to track down the owner of the lock &#8211; which happened to be (surprise!) 
a Terminal Server thread in CSRSS for the console session. This thread was waiting on a kernel event that, it turns out, is signalled when the RDP TCP transport driver receives data from the network. So we have a classic deadlock: the router VM is blocked waiting for win32k&#8217;s internal NtUser global lock, while a CSRSS thread holds that same lock and waits on network I\/O from the RDP client. Because the RDP client (me at home, connecting through the VPN) has to route its traffic through the router VM to reach the RDP TCP transport on my new dev box, the data that CSRSS is waiting for cannot arrive, and everything appears to freeze until the TCP connection times out.<\/p>\n<p>Unfortunately, there isn&#8217;t really a very good solution to this problem. Installing Windows Server 2003 would have helped in my case, because VMware Server and its services would then be running in session 0, and RDP connections would be diverted to new Terminal Server sessions (each with its own per-session instance of the win32k NtUser lock), thus avoiding the deadlock (unless you happened to connect to Terminal Server using the `\/console&#8217; option).<\/p>\n<p>So there you have it &#8211; why VMware Server and RDP can sometimes make a bad mix. This is a real shame, too, because RDPing into a box and running the VMware Server console client &#8220;locally&#8221; is <em>sooo<\/em> superior to running the VMware Server console client over the network (it updates <em>much<\/em> faster, even over a LAN).<\/p>\n<p>If you&#8217;re interested, I did a writeup of most of the technical details of the actual debugging (with WinDbg and kd) of this problem that you can look at <a title=\"Detailed analysis of debugging the problem.\" 
href=\"http:\/\/apartment.skywing.valhallalegends.com\/Skywing\/Papers\/Win32k%20RDP%20Network%20IO%20Deadlock.txt\">here<\/a> &#8211; take a look if you want to see some of the steps I took in the debugger to analyze the problem further.<\/p>\n<p>In the future, I&#8217;ll try not to gloss over the debugger steps so much in blog posts; this time, I had already written the writeup beforehand and didn&#8217;t want to reformat the whole thing into a blog post.<\/p>\n<p>Whew, that was a long second post &#8211; hopefully, future ones won&#8217;t be quite so long-winded (if you consider that a bad thing). Hopefully, future posts won&#8217;t be written at 1am just before I go to sleep, too&#8230;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Steve already stole my thunder (well, if that makes sense, since it was my paper anyway) by posting my analysis of this earlier, but I figure that it is also worth discussing here. 
Recently, I finally* got a new development box at work &#8211; multiproc, x64 capable (with the ability to run 64-bit [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[2,7,5],"tags":[],"_links":{"self":[{"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=\/wp\/v2\/posts\/3"}],"collection":[{"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=3"}],"version-history":[{"count":1,"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=\/wp\/v2\/posts\/3\/revisions"}],"predecessor-version":[{"id":708,"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=\/wp\/v2\/posts\/3\/revisions\/708"}],"wp:attachment":[{"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=3"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=3"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.nynaeve.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=3"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}