I had a nasty shock this week with ESX3.
I was going about expanding virtual disks and reallocating resources for one client. Now, I have done this MANY times, so I thought that “the 2 day old backup is sufficient” and did not wait 3-4 hours for a new backup right before what should have been a 10 minute task.
I went to expand the virtual disks from the COS and noticed that there were some “Virtual-Disk-000001-delta.vmdk” and “Virtual-Disk-000001.vmdk” files present.
“Oh, a snapshot is here for some reason..?”, I pondered. I then went into the VI3 management console, drilled down to said VPS and went to the snapshot manager, expecting to find a snapshot and then simply commit it to the main disk so I could get back to expanding.
What I found was “No Snapshots for this Virtual Server”.
Hmmm….. “maybe they are old snapshot files that should have been deleted, but weren’t”, I further mused. And =IF= it is a snapshot, surely vmkfstools will not let me run a dangerous or incompatible command. So off I went to expand this virtual disk by another 100GB.
– Expand disk:
“vmkfstools -X 220G Virtual-Disk.vmdk”
Expansion done. All looks good. Fire up VPS…..
“Sorry VPS can’t be started because one of the base files that a given snapshot is based on has been modified and thus can’t be mounted”.
” *^&#*^&@(*&!(@&(!*&(&! “
Ok, no harm no foul. The actual disk is not changed. Doing an expand with vmkfstools just adds a marker for more size… surely I can just remove the extra addition, ‘rollback the expansion’ so to speak and all will be spiffy?
Nup. Even though I knew in the back of my head that shrinking a VMDK was NOT POSSIBLE in ESX3 as it was in ESX2.5, I still went searching in the faint hope that I had overlooked some trick during past information gathering exercises when I was not under so much pressure and panic as I was this time.
No dice. What I knew was confirmed. I can’t shrink it. I can’t even load up Ghost and mirror the disk, because the main problem is that this Virtual-Disk-000001-delta.vmdk file still has to be committed to the main disk. And seeing as it was 25GB in size for what is a 100GB virtual disk, and the date stamp was some 3 months prior, there is A LOT of data and changes at risk now.
” *^&#*^&@(*&!(@&(!*&(&! “
OK, on to Google again. It took some searching and a lot of effort refining my query, because information was scant, as opposed to what I later found out (these being the #4 and #5 global support issues for VMWARE). I did manage to find a couple of blogs with very brief reviews of the recent VMWorld summit, lacking in all technical detail.
So with that hook, I started to search for detailed info from that summit and managed to get a PPT file from one of the developers. Inside were all the details that I needed. Or thought that I needed. Because with any system as complicated as VMWARE, definitions of words and correct semantics can make it very difficult to get a clear grasp of one problem versus a slight variation of it. Even a slight variation can call for a very different procedure, and using the wrong one could make a problem worse. First rule: do no more harm.
I then went to the page that was titled “Expanding the size of a VMDK with an existing Snapshot”. I did not know if this meant, “how to expand a VMDK with an existing snapshot and keep it intact”, or “How to recover from a monumental screw up that only an idiot would do, when expecting vmkfstools to do all due diligence for him and has fucked up the VMDK that happened to have a current and active snapshot that wasn’t committed to the main VMDK file first”
I assumed it meant the latter, being “tech support” and “high rating”… if it were documenting a feature or process it would have been, well, better documented.
The procedure is this:
– After I was an idiot and issued this command to cause all the problems:
“vmkfstools -X 220G Virtual-Disk.vmdk”
– Check the “Virtual-Disk.vmdk” file with vi and look for the following lines:
RW 482344960 VMFS “Virtual-Disk-000001-delta.vmdk”
– Now check the “Virtual-Disk-000001.vmdk” file and look for the following lines:
RW 209715200 VMFS “Virtual-Disk-000001-delta.vmdk”
What we now know is that the current RW value in the newly expanded “Virtual-Disk.vmdk” is 482344960. We want to ‘trick’ the system into thinking that the expand never happened. So in “Virtual-Disk.vmdk” we replace that value with the one we got from “Virtual-Disk-000001.vmdk”: 482344960 becomes 209715200.
– Now we need to commit all snapshots:
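If you would rather not hand-edit with vi, the check and the edit above can be sketched as a few shell functions. This is only a rough sketch under some assumptions: the function names are made up for illustration, the descriptor is assumed to have a single RW extent line, and sectors are assumed to be 512 bytes. It keeps a .bak copy of the descriptor before touching it.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any VMware tooling.

# Print the sector count from the first "RW" extent line of a descriptor.
extent_sectors() {
    awk '$1 == "RW" { print $2; exit }' "$1"
}

# Convert a count of 512-byte sectors to whole GiB (2097152 sectors per GiB).
sectors_to_gib() {
    echo $(( $1 / 2097152 ))
}

# Replace the sector count on the first "RW" line, saving FILE.bak first.
set_extent_sectors() {
    file=$1
    new=$2
    cp -p "$file" "$file.bak" || return 1
    awk -v n="$new" '
        $1 == "RW" && !done { sub(/^RW[ \t]+[0-9]+/, "RW " n); done = 1 }
        { print }
    ' "$file.bak" > "$file"
}

# Example use, with the file names from this post:
#   old=$(extent_sectors Virtual-Disk-000001.vmdk)
#   set_extent_sectors Virtual-Disk.vmdk "$old"
```

As a sanity check on the numbers quoted above, sectors_to_gib 209715200 gives 100 and sectors_to_gib 482344960 gives 230.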
“vmware-cmd /vmfs/volumes/VMFSVOLUME/VPS/VPS.vmx removesnapshots”
Unfortunately I was not done yet. The system reported back that the virtual machine “VPS.vmx” did not have any snapshots present! “Ah ha”, I thought. While this is not good, it is also the reason why vmkfstools went on and screwed everything up in the first place. There is a snapshot there, that is a fact, but the system does not believe so.
This is where global common VMWARE problem #5 comes in, “Corrupted .VMSD file”. In a nutshell this means that the file that tracks all this snapshot info (amongst other tidbits) is somehow compromised. So a new one is needed. This is also fairly simple once you know how:
– First rename the current VMSD file:
mv VPS.vmsd VPS.vmsd.old
– Now create a new snapshot to force the system to generate a new all-encompassing VMSD file:
“vmware-cmd VPS.vmx createsnapshot addedforrecovery 'You are an IDIOT'”
– Now commit all snapshots like we wanted to do before anyway. You have to commit them all:
“vmware-cmd VPS.vmx removesnapshots”
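The three steps above (rename the VMSD, create a throwaway snapshot, commit everything) could be wrapped up roughly like this. Treat it as a sketch, not a tested recovery tool: the function name and the VMWARE_CMD override are my own inventions for illustration, and it only uses the createsnapshot and removesnapshots calls exactly as shown above. Setting VMWARE_CMD=echo gives a dry run that prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical wrapper around the manual steps; rebuild_vmsd is not a VMware tool.
rebuild_vmsd() {
    vmx=$1
    vmsd="${vmx%.vmx}.vmsd"          # VPS.vmx -> VPS.vmsd
    run=${VMWARE_CMD:-vmware-cmd}    # assumption: override for dry runs

    # Move the suspect VMSD aside rather than deleting it.
    if [ -e "$vmsd" ]; then
        mv "$vmsd" "$vmsd.old" || return 1
    fi

    # A temporary snapshot forces a fresh VMSD to be generated.
    "$run" "$vmx" createsnapshot addedforrecovery "temporary snapshot to rebuild VMSD" || return 1

    # Commit every snapshot, including the temporary one.
    "$run" "$vmx" removesnapshots
}

# Dry run:  VMWARE_CMD=echo; rebuild_vmsd /vmfs/volumes/VMFSVOLUME/VPS/VPS.vmx
```

Stopping at the first failed step is deliberate: if the temporary snapshot cannot be created, committing is pointless and you want to look at the error first.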
Now that all the snapshots are committed (the original one and the temp one we made to help recreate the VMSD file) we can get back to fixing our disk expansion issue. This is as simple as re-running the vmkfstools expand command that caused all the problems in the first place. It is needed so that the correct RW value is set in “Virtual-Disk.vmdk”, because in the end, the virtual disk IS expanded already.
– So issue the command:
“vmkfstools -X 220G Virtual-Disk.vmdk”
In the end, I am NOT STUPID enough to try and expand a virtual disk with a snapshot. However if you DO SEE delta files in your file system, do not trust the VI3 client’s snapshot manager if it says “No Snapshots present”. As a matter of caution, I would follow the process above to recreate a new VMSD file to be sure and commit the temporary and any other snapshots that may exist. Then you can go on and expand your disks.
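A cheap way to make that check a habit is a tiny pre-flight test that refuses to go further while any delta file is lying around in the VM’s directory. The sketch below assumes snapshot deltas follow the “*-delta.vmdk” naming seen in this post; the function name is hypothetical.

```shell
#!/bin/sh
# Return 0 if the directory contains any *-delta.vmdk file, 1 otherwise.
has_delta_files() {
    for f in "$1"/*-delta.vmdk; do
        # If nothing matched, the pattern stays literal and -e is false.
        [ -e "$f" ] && return 0
    done
    return 1
}

# Example use before an expand:
#   if has_delta_files /vmfs/volumes/VMFSVOLUME/VPS; then
#       echo "Delta files found: commit snapshots before expanding" >&2
#   fi
```

It looks at the file system rather than asking the snapshot manager, which is the whole point: the files are the one thing that did not lie to me.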
Also, make sure that you have backups. I did, and while they weren’t totally fresh, the client was not too upset when briefed on the situation. It could have been much worse.
DON’T LET A JUNIOR TECH TOUCH THINGS!
TAKE THE TIME TO RELAX AND ASSESS THE SITUATION BEFORE YOU POSSIBLY MAKE IT WORSE!