Virtualization Panic


After a few days of running smoothly on ESXi, I got my first scare.

Here is what happened:

First, I upgraded the Windows XP instance running within the VM to SP3.

Then, I created a new Virtual Machine from scratch on my running ESXi server to learn how to do it.

The idea was to create a new virtual file server and move all my multimedia data into it so that I could shrink the size of my website VM.

Everything worked great. This is the process:

  • Create new VM using the Virtual Infrastructure Client
  • Attach the ISO image of Windows XP to the VM and set it up so that it would boot from it
  • Configure the network to use an available IP address (192.168.1.4); I probably could have used DHCP.

Everything worked just fine. I did everything through the management UI and without reading the documentation. I just had to google how to make the VM boot from an ISO CD image to install Windows XP into it. I also had to mount a floppy disk image to install the VMware SCSI drivers for Windows XP. Very straightforward.

I started copying the “photos” directory (60 GB) to the new VM’s partition and let it run overnight.

In the morning I realized that the main VM had restarted during the night. I blamed Windows’ automatic update service, which I now realize I should have disabled (ah! these simple bakers turned naive system administrators…).

I blamed Microsoft and re-started the copy operation.

But after a short while……. NOOOOOOOOOO!!!

The blue screen of death

Panic

I guess at this point I experienced what I am sure some customers go through sometimes. After being on the virtualization high from getting everything virtualized smoothly, I had my first blue screen of death and panicked a little.

Note that I had very reasonable excuses to:

  1. blame Microsoft because they ‘made’ me upgrade to SP3
  2. blame myself, because I jammed another virtual machine on the same server

But both excuses were scary and lame because:

  1. Virtualization is not supposed to impact your ability to patch and update the hosted operating system
  2. Virtualization is DESIGNED to let you run multiple VMs on the same server. That’s the whole point!!! (Plus, I was still running at 15% CPU utilization at most…)

So, I armed myself with more patience and confidence in the product and dug a little deeper. I noticed that the error on the blue screen of death was “page_fault_in_nonpaged_area”, so I did some research and found out that this error is sometimes due to faulty memory chips… and I had my aha!!!! moment.

If you remember, when I changed the hard drive in the server, I also added a memory bank that I found in my hardware drawer…

So I removed it, went back to 1.5 GB of RAM, and the system is back up and running.

And so is my confidence in VMware products!!!! 🙂

The Lesson

When deploying a new infrastructure, one needs to be careful about all the moving parts, document everything and most importantly be committed to the change.

I wanted it to work, and I know it works. So I did not blame the virtualization layer and went looking for the real cause, which turned out to be a faulty piece of hardware.

But what if I were not a champion of the technology, and I did not know that it works reliably in tens of thousands of production installations?

This is a little glimpse into a dynamic that plays out all the time in the relationship between IT shops and their technology and solution providers.

Customers need:

  • Guidance
  • References
  • Commitment and Support

They want their vendors to have their back when they run into trouble.

Vendors need:

  • A technology champion and commitment from the top within the customer
  • Customer references that provide confidence but even more importantly prescriptive guidance on how to deploy the technology successfully
  • To be there for the customer

More on guidance and commitment later, as they are core to my first project at VMware.
