Murky Waters
Another day, another rant about something that should work but does not. This time, I’m not exactly sure who to blame, so maybe you can help?
The backstory: I set up a lot of servers for a lot of companies. In the past year, the PowerEdge R610 has been our go-to server for small businesses. We always add in at least one Intel gigabit NIC because of problems, philosophical and practical, with the integrated Broadcom NICs. Dell has used the Intel 82576 as their dual-port NIC of choice for some time now.
The story: I am trying to set up what is essentially a new network for a company. They have a number of old servers and when I am done, they will work from a clustered virtual environment using Hyper-V with freshly-built VMs, preserving only data and Active Directory. We purchased two identical PowerEdge R610s and a PowerVault MD3200i. I downloaded Windows Server 2008 R2 with SP1 from Microsoft’s volume licensing page, burned the ISO, installed Windows and… the server hung. In a different spot, every time. On both servers. Most of the time, I would get to the desktop and it would immediately hang: the mouse worked, the Num/Caps/Scroll keys and lights were responsive, and I could drag any open windows around the screen, but no keyboard shortcuts worked, open programs hung, and the Start button and taskbar were frozen. Same deal in Safe Mode and Safe Mode with Networking. On some occasions, it would freeze during the installation process. Every reboot, it would make it a little further, making it a little more frustrating.
I tested it on a VM, same disk, quick install, no problems. Called Dell, troubleshot for a while, and the tech found a document listing similar problems that pointed back to the Intel gigabit NICs. I pulled the cards, booted, and everything worked great. Updated chipset and tried to update the NIC drivers but found that I could not do that unless the cards were physically in the systems. This was, of course, impossible since the systems hung immediately after making it to Windows, if I could even log in. How pleasant.
Since I have installed Server 2008 R2 dozens of times on R610s, I theorized that there was a bad driver with the new SP1-integrated build. 2.8GB later, I had a new non-SP1 disk, installed, and… same exact problem. Called Dell again and after a rather long time on hold, was told that, “This is a known issue, you have to install Windows without the cards physically in the machines. When you’re done, download the driver to your desktop, power down, install the card, power up, and then install the drivers really quickly before it hangs.” Yes, really.
And so I tried it. And so it worked. And so I wasted about two days on a problem that I had never seen that involved an OS I have installed more times than I can count on hardware I work with constantly… for a problem that someone at Dell, the OEM who sold me these two servers and offered the add-in card as a factory option, knew about.
I’d say I hate to sound like a broken record but I don’t hate it at all. What I hate is the need to complain about things like this with such regularity. Someone, either Dell, or Intel, or Microsoft, screwed up and I would love to know who that is. I’m sure some sagely reader will comment or email me that I should have known better than to install an OS with add-in cards, but seriously, dude, go to Hell — if Windows natively supports a piece of hardware, immediately after installation, I expect it to work. If Intel is going to mass produce hardware and Dell is going to offer said hardware as a factory-installed, premium option, I expect it to work. The end. Since Microsoft cannot be responsible for hardware manufacturers, I am inclined to point at Intel or Dell. On the other hand, this was not an OS preinstalled by Dell and they can’t really be responsible for the drivers built into the Windows Server 2008 R2 installer, so I’m in a bit of a quandary.
Regardless of who is to blame, I pound my war drum once again and issue a cry of, “Why is this acceptable?” The answer, it seems, is that we have no choice in the matter and no unified way to voice our complaints. With the expectation of quality so low, technology providers only have to put in a tiny bit more effort than their competitors, which amounts to ensuring their shit doesn’t spontaneously catch fire or electrocute you or something.
Don’t get me wrong, I am not an ogre. Dell’s tech support guys are always courteous, as helpful as possible, and knowledgeable, but they can only do so much. I realize there are bugs, I realize things need to be patched, and I realize that you just can’t make everything work perfectly when you are dealing with many variables. In cases like this, though, or cases like my recent issues with Symantec System Recovery’s management application, I take major issue with expensive equipment not working right out of the box; verifiably broken not just on one server but on many servers; not just for me but for many, many people who are using the same equipment.
I am now two days behind in my work. My client is going to be billed for my time because my company wants to be paid for its time and I want to be paid for my time. Dell got their money, Intel got their money, Microsoft got their money, but neither my client nor I messed up this time. We played this game with Dell before and they never want to assume blame or make any amends; after all, this is a problem with Windows and drivers, or so they will say. In the end, we will likely eat some time and the client will be stuck with a reduced bill, but I will look like an asshole no matter what.
Nothing, and I mean nothing, changes the fact that we, the people working with this equipment, are sailing in a sea of low standards. From the OS to the hardware to the software, lousy engineering is the norm. Microsoft owns the SMB computing world, meaning we have few options unless we want to somehow convince the decision makers to abandon ship and set sail in the relatively uncharted waters of Linux or the limiting Mac Server, which I do not see happening any time soon. It will go on and on, the same waves always visible over the horizon, land always a guarantee for the intrepid and dedicated adventurer, but the fact remains: I would not be navigating by the stars if my GPS would boot past the welcome screen.
Epilogue, updated 8 minutes after the post went live: I wrote this while Windows was installing on the second server. A few minutes after posting this, the installation finished. I copied the driver update EXE from the other server, powered down, popped in the cards, powered up, but it now seems that I was not fast enough with my driver upgrade. Device Manager is hung, the upgrade app is hung, and I am not pleased.