Windows 7/Server 2008 R2 SP1 Install Woes
I never thought I’d be one of those guys who gets excited about RAM upgrades or service packs but, well, there I was, counting down the days to February 16, when Microsoft put SP1 on TechNet. It goes live to the world next week. SP1 doesn’t offer much in the way of new features for most people, but one in particular was of great interest to me: Dynamic Memory for Hyper-V. I have a large number of Hyper-V machines, a few of which are clustered and a couple of which are heavily utilized. Since processor and HD utilization will adjust based on need, RAM becomes the issue: as soon as you allocate those 2 or 4 or 8 or more GB to your virtual machine and start it up, that RAM is gone, regardless of how much the VM is actually using. No more. Awesome.
Normally, I’d wait a little bit to install the update on production servers but one of my networks, a client with a cluster of virtual terminal servers, is using almost all 24GB of their RAM. This update is a big help and will save them quite a bit of money over adding more RAM. I first installed SP1 on my office desktop and it went in without a hitch. One guy in the office tried installing it on his desktop and it kept bombing out. Still, the need was there, so I moved all resources over to one node of the cluster, installed the SP, rebooted, and all was well. It was when I tried the second node that the problem started.
The SP seemed to install correctly. After the reboot, it said the update had failed, rolled back, rebooted again, and presented me with the most generic of errors. It said the service pack failed with “Element not found.” (it might have been ELEMENT_NOT_FOUND). A message saying “Error not found” and a link to an MS KB at the bottom led to a boring, nonspecific troubleshooting guide: running SFC /SCANNOW, running the Windows System Update Readiness Tool, and doing an in-place upgrade (AKA repair install). The first two didn’t help, and the third was not a viable option.
I started reading. Windows Vista introduced something called CBS, Component Based Servicing, which you can read about here. While certain notifications about Windows Updates are still held in windowsupdate.log, the details about what is happening are held in %SYSTEMROOT%\logs\CBS\cbs.log. This log is backed up and recreated every time you install a hotfix or any other update.
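If you want to see what is actually sitting in that folder before you start digging, here is a minimal Python sketch. It relies only on the folder location mentioned above. The function name, the `cbs_dir` override, and the `C:\Windows` fallback are my own assumptions, and it deliberately makes no guess about how the backed-up logs are named.

```python
import os
from pathlib import Path


def list_cbs_logs(cbs_dir=None):
    """Return the files in the CBS log folder, newest first.

    cbs_dir is a hypothetical override (handy for testing); by default
    the folder is built from the SYSTEMROOT environment variable.
    """
    if cbs_dir is None:
        # Assumption: fall back to C:\Windows if SYSTEMROOT is not set.
        system_root = os.environ.get("SYSTEMROOT", r"C:\Windows")
        cbs_dir = Path(system_root) / "Logs" / "CBS"
    cbs_dir = Path(cbs_dir)
    if not cbs_dir.is_dir():
        return []
    files = [p for p in cbs_dir.iterdir() if p.is_file()]
    # Sort by modification time so the current log comes first.
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    for path in list_cbs_logs():
        size_mb = path.stat().st_size / (1024 * 1024)
        print(f"{path.name}\t{size_mb:.1f} MB")
```

Run it from an elevated prompt on the affected machine; the file at the top of the list should be the log from the most recent install attempt.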
After hours (and hours and hours) of troubleshooting, trying to figure out what was different between my nodes, I posted to Microsoft’s official forum, uploaded my CBS folder, and they gave me my answer. Turns out, a specific hotfix was missing files: this one. The Microsoft rep quoted my log and explained:
2011-02-17 12:41:18, Info CBS Doqe: q-uninstall: Inf: wvid.inf , Ranking: 2, Device-Install: 0, Key: 113, Identity: wvid.inf, Culture=neutral, Type=driverUpdate , Version=6.1.7600.16475 , PublicKeyToken=31bf3856ad364e35, ProcessorArchitecture=amd64, versionScope=NonSxS
2011-02-17 12:41:18, Info CBS Perf: Doqe: Uninstall started.
2011-02-17 12:41:18, Info CBS Doqe: [Forward] Uninstalling driver updates, Count 54
2011-02-17 12:41:19, Info CBS DriverUpdateUninstallUpdates failed [HRESULT = 0x80070490 – ERROR_NOT_FOUND]
Files from the security update
MS10-010: Vulnerability in Windows Server 2008 Hyper-V could allow denial of service
are missing
That’s great and all but how did they come to that conclusion?
Identity: wvid.inf , Culture=neutral, Type=driverUpdate , Version=6.1.7600.16475
this tells me that the update causes the issue.
Yeah, but what from that says that the specific hotfix was the problem? How did they find that line in my 45MB+ log file?
google? look at the inf file version on the KB page:
http://support.microsoft.com/kb/977894
“ERROR_NOT_FOUND” must appear at least 100 times, and so do references to wvid.inf. I asked them to walk me through it, step by step.
I opened the Log file, and searched for Error so I found this line. I looked what the installer was doing and saw that it tries to remove the inf. I googled what the inf is and found this Hyper-V update.
Still doesn’t explain how they got to that specific error. I did look at it a bit and found that while there are hundreds of errors that look like this:
Failed to get session package state for package: Package_6_for_KB2482017~31bf3856ad364e35~amd64~~6.1.1.1 [HRESULT = 0x80070490 – ERROR_NOT_FOUND]
The one that he pointed out was unique and had no similar lines anywhere. It does make sense when you look at it: it tries to uninstall a hotfix, can’t find a file, and fails. But there are A LOT of errors in the file, and you’d think that somewhere it would identify a critical failure and point to this as the cause for the entire service pack bombing out. I asked again whether they had a way of filtering out the ERROR_NOT_FOUND messages that didn’t matter versus those that did, and I haven’t gotten a response yet. I get the feeling that they just know what to look for and used CTRL + F until they found an error that didn’t look like the others.
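For what it’s worth, that “find the error that doesn’t look like the others” approach can be roughly automated. The Python sketch below is my own heuristic, not anything Microsoft described: it collapses every token containing a digit (timestamps, KB numbers, package identities, HRESULT codes) into `#`, groups the `[HRESULT` lines by the resulting shape, and reports only the shapes that occur rarely, along with the few lines before each one so you can see what the installer was doing. The function names, the `max_count` and `context` parameters, and the UTF-8 encoding are all assumptions.

```python
import sys
from collections import defaultdict


def signature(line):
    """Reduce a log line to a rough shape so similar errors group together.

    Any whitespace-separated token containing a digit is replaced with '#'.
    """
    return " ".join(
        "#" if any(ch.isdigit() for ch in token) else token
        for token in line.split()
    )


def rare_errors(lines, max_count=1, context=3):
    """Return (line_number, line, preceding_lines) for each HRESULT line
    whose signature occurs at most max_count times in the log."""
    lines = [line.rstrip("\r\n") for line in lines]
    groups = defaultdict(list)
    for idx, line in enumerate(lines):
        if "[HRESULT" in line:
            groups[signature(line)].append(idx)
    results = []
    for indexes in groups.values():
        if len(indexes) <= max_count:
            for idx in indexes:
                before = lines[max(0, idx - context):idx]
                results.append((idx + 1, lines[idx], before))
    return sorted(results)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumption: the log is readable as UTF-8; bad bytes are replaced.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for number, line, before in rare_errors(fh):
            print(f"--- line {number} ---")
            for prior in before:
                print("   ", prior)
            print(">>>", line)
```

Save it under any name you like and pass the path to cbs.log as the only argument. If the same failure was logged on every install attempt, it will no longer be unique, so raise `max_count` to match the number of attempts. It is still a heuristic: it only tells you which errors are unusual, not which one is fatal.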
With the KB identified, I had to find a way to uninstall it. Going through Add/Remove Programs didn’t work: it tried to uninstall, rebooted, and failed. Using WUSA /UNINSTALL /KB: seemed to work but, as I later learned, is not supported or, well, working. In the end, I searched the registry for the KB, pulled out every key, reinstalled the hotfix, rebooted, installed the SP, and rebooted. Success. But frustrating.
Our company’s virtual host also failed its SP1 install, though some of our VMs did not. Two other guys in the office installed it on their Windows 7 machines without difficulty. When I have some time, I’ll check out our VH’s log and see if I can apply this experience.
Either way, I find it extremely frustrating that I had to do this much work to fix what should be a very simple process. If the Service Pack install made use of the Windows Event log, filtering errors would have been easy. If the error message given after the failure were clearer, I could have troubleshot effectively. If SFC /SCANNOW had found missing files (since there WERE missing files), that would have helped. If the Windows System Update Readiness Tool searched more thoroughly, that would have helped too. While the Microsoft rep was very knowledgeable and rather prompt, I shouldn’t have to resort to that in the first place.
This Service Pack was made available to TechNet not three days ago and it failed our first 3 of 6 tests; another poster on the MS forum said that one of their cluster nodes also wouldn’t install it, so he just wiped and reinstalled. This is unacceptable and does not bode well for Microsoft’s PR in the coming weeks.