
VM-sickbay
Welcome - looks like something is wrong with your virtual machine.
Before we go into details ...
relax ...
drink a coffee and follow the VM-sickbay-rules
1. DON'T PANIC
2. do not try to start the VM again
3. do not mount disks from other VMs
4. do not use vmware-mount or vdiskmanager
5. do not use vdk in read/write mode
6. make a copy of all vmware-log files
7. relax
Are you currently running a VM that seems to be in an older state than expected?
Shut it down now - do not run checkdisk and do not defragment.
If you need to ask for help at VMTN or the VMware-forum, it is important that you provide sufficient information.
Without exact details the person trying to help you may misjudge the cause of the disease and so prescribe inadequate treatment.
Anamnesis - describe symptoms and conditions - post at VMTN or vmware-forum.de
Therapy - fix or repair
Typical Symptoms
Post mortem - recover data
Plan B - for the desperate
DISCLAIMER:
The procedures I suggest for some typical problems often require advanced skills.
I can't take any responsibility if you mess up your data following these tips.
This just lists what I would do in such a case.
Please also note that official VMware support will very likely claim that the cases listed as "restore" are lost.
Don't give up too early - you may still be able to rescue some data.
For in-depth discussion of the listed procedures use the search function at VMTN, the German forum and the sanbarrow forum.
Search for username "continuum" and terms from the disaster description table.
Ulli Hankeln
Anamnesis
In most cases this data is sufficient for a first analysis:
- a detailed file-listing of all files in the VM-directory - this
list should mention date, size and permissions
- all files smaller than 100 kb in the VM-directory
- screenshot of error message - if possible
- screenshot of datastorebrowser - if ESX
- screenshot of snapshotmanager
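The detailed file-listing asked for above can be produced with a few lines of Python if no other tool is at hand. This is only a sketch: the function names `list_vm_directory` and `write_listing` are made up for this example, and it only reads metadata - it does not open or change any file of the VM.

```python
import os
import stat
import time


def list_vm_directory(vm_dir):
    """Return one text line per entry: permissions, size in bytes, date, name."""
    lines = []
    for name in sorted(os.listdir(vm_dir)):
        info = os.stat(os.path.join(vm_dir, name))
        perms = stat.filemode(info.st_mode)  # for example -rw-r--r--
        mtime = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(info.st_mtime))
        lines.append(f"{perms} {info.st_size:>14} {mtime} {name}")
    return lines


def write_listing(vm_dir, out_file):
    """Write the listing to a text file that can be added to the report."""
    with open(out_file, "w", encoding="utf-8") as handle:
        handle.write("\n".join(list_vm_directory(vm_dir)) + "\n")
```

Write the listing to a location outside the VM-directory so that the directory itself stays untouched.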
Please download the VM-sickbay-help-request.txt
and answer all further questions listed there.
Please simply edit the text-file.
Then pack all the files mentioned above along with your edited help-request into a single zip-archive.
Give that zip-file a reasonable name - I already have more than enough vmware.zips and log.zips ...
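Collecting the small files into one archive can be scripted as well. The following is a minimal sketch, assuming that "smaller than 100 kb" means less than 100 * 1024 bytes; `build_report_zip` and its parameters are hypothetical names chosen for this example. Screenshots and the edited help-request can be passed in as `extra_files`.

```python
import os
import zipfile

SIZE_LIMIT = 100 * 1024  # assumption: "100 kb" is read as 100 * 1024 bytes


def build_report_zip(vm_dir, zip_path, extra_files=()):
    """Pack all files below SIZE_LIMIT from vm_dir, plus extra_files, into zip_path."""
    packed = []
    zip_abs = os.path.abspath(zip_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in sorted(os.listdir(vm_dir)):
            path = os.path.join(vm_dir, name)
            if os.path.abspath(path) == zip_abs:
                continue  # never pack the archive into itself
            if os.path.isfile(path) and os.path.getsize(path) < SIZE_LIMIT:
                archive.write(path, arcname=name)
                packed.append(name)
        for path in extra_files:
            archive.write(path, arcname=os.path.basename(path))
            packed.append(os.path.basename(path))
    return packed
```

The function returns the names it packed, so you can check that the vmx-file, the descriptor vmdks and the logs really made it into the archive.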
IMPORTANT:
if you decide to ask someone for help first power down the VM.
Then create the report-zip and send it.
Do NOT power on the VM before you have a good theory of what went
wrong and a fix
Therapy
often used procedures ...
disktypes-table ...
the CID-chain ...
In case you want to try to fix the problem on your own - the next
table lists some common diseases and the therapy I would prescribe.
disease | Platform | chances | suggested procedure |
VMware complains - can not resume (Display-messages) | hosted | unknown | read vmware.log - find out expected screen resolution - change host resolution and try again; if that does not work remove *.vmss to force a fresh start |
VMware complains - can not resume (CPU-messages) | hosted | like hard reboot | disable cpu-compat-test in the vmx-file; if that does not work remove *.vmss to force a fresh start |
VMware complains - a file is in use, cannot start (in-use or locked messages) | | good | remove stale lockdirs/lockfiles |
VMware complains - file too large, "msg.disklib.tooBigForFS" in vmware.log | hosted | unknown | in case you tried to copy a vmdk to FAT32 try to add this line to the vmx-file: diskLib.sparseMaxFileSizeCheck = "FALSE" |
only *-flat.vmdk exists | all | good | use this windows-tool and read |
descriptor vmdk is lost | all | good | use sample from list; restore entries from last log |
starting a Linked Clone messed up original VM | all | bad | restore still usable branches |
vmx is lost or blank | all | good | restore from last log |
VMware complains - disk needs repair (monolithicSparse) | hosted | ??? | try -R parameter of recent vmware-vdiskmanager: vmware-vdiskmanager.exe -R corrupt.vmdk; if that does not work try to mount the vmdk with vdk; if that does not work either use Plan B |
VMware complains - file is not a virtual disk (monolithicSparse) | hosted | ??? | try -R parameter of recent vmware-vdiskmanager: vmware-vdiskmanager.exe -R corrupt.vmdk |
VMware complains - file is not a virtual disk (descriptor vmdk is blank) | all | good | use sample from list; restore entries from last log |
VMware complains - file is not a virtual disk (monolithicSparse, embedded descriptor looks ok) | hosted | ??? | try -R parameter of recent vmware-vdiskmanager: vmware-vdiskmanager.exe -R corrupt.vmdk |
VMware complains - file is not a virtual disk (monolithicSparse, embedded descriptor is blank or corrupt) | hosted | unknown | use sample from list; restore entries from log; see howto; inject with dsfi.exe |
VMware complains - parent has been changed | all | depends | fix CID-chain |
VMware complains - parent has been changed (embedded descriptor) | hosted | depends | extract with dsfo.exe; fix CID-chain in the embedded descriptor; inject with dsfi.exe |
parent has been expanded - monolithicFlat | all | depends on skills of admin | cut vmdk; fix geometry; fix CID-chain |
parent has been expanded - split vmdk | all | | delete chunks added during expand; fix geometry; fix CID-chain |
parent has been expanded - monolithicSparse, complex snapshot-tree | hosted | | extract descriptor of basedisk with dsfo.exe; fix geometry to the size before expand; fix CID-chain; inject with dsfi.exe |
parent has been expanded - monolithicSparse, single snapshot | hosted | | extract descriptor of snapshot with dsfo.exe; fix Extent description to the size after expand; fix CID-chain; inject with dsfi.exe |
Typical symptoms
In the following I list some typical error-messages along with
the suggested fix ...
Could not open virtual machine: N:\test\test.vmx.
"N:\test\test.vmx" is not a valid virtual machine configuration
file.
Check for missing files failed.
Snapshots are not allowed on this virtual machine.
Sometimes a crash of the host - no matter if Windows, Linux or ESX
- results in a blank vmx.
Not always the error message is useful.
Anyway - if possible restore the vmx-file from the last vmware.log
if that is available.
If you restore the vmx from scratch you may easily assign the wrong
snapshot and so do more harm than good.
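Pulling the old settings out of the log by hand is tedious, so here is a rough sketch of how it could be scripted. It rests on an assumption you must check against your own vmware.log first: that the log echoes the configuration as lines containing the marker `DICT`, grouped under a header line containing `DICT --- CONFIGURATION`. The function name `vmx_lines_from_log` is made up for this example.

```python
import re

# Assumed line shape: "<timestamp and prefix> DICT   key = value"
DICT_LINE = re.compile(r"DICT\s+(\S+)\s*=\s*(.*?)\s*$")


def vmx_lines_from_log(log_text):
    """Collect key = "value" lines from the CONFIGURATION block of a vmware.log."""
    in_config = False
    result = []
    for line in log_text.splitlines():
        if "DICT ---" in line:
            # A new DICT section starts; only the CONFIGURATION one is wanted.
            in_config = "CONFIGURATION" in line
            continue
        if not in_config:
            continue
        match = DICT_LINE.search(line)
        if match:
            key, value = match.group(1), match.group(2).strip('"')
            result.append(f'{key} = "{value}"')
    return result
```

Treat the result as a draft only. Compare every disk entry with the files that really exist in the VM-directory before you save it as the new vmx-file.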
Cannot open the disk N:\test\test-000001.vmdk
or one of the snapshot disks it depends on.
Reason: The specified virtual disk needs repair.
This may happen with sparse disks on hosted platforms.
In the vmware.log this may appear as "Grain #* @* is orphaned."
VMware complains - disk needs repair (monolithicSparse) | hosted | ??? | try -R parameter of recent vmware-vdiskmanager: vmware-vdiskmanager.exe -R corrupt.vmdk; if that does not work try to mount the vmdk with vdk; if that does not work either use Plan B |
Check for missing files failed.
The file specified is not a virtual disk.
Sometimes a crash of the host - no matter if Windows, Linux or
ESX - results in a blank vmdk descriptor.
VMware complains - file is not a virtual disk (monolithicSparse) | hosted | ??? | try -R parameter of recent vmware-vdiskmanager: vmware-vdiskmanager.exe -R corrupt.vmdk |
VMware complains - file is not a virtual disk (descriptor vmdk is blank) | all | good | use sample from list; restore entries from last log |
VMware complains - file is not a virtual disk (monolithicSparse, embedded descriptor looks ok) | hosted | ??? | try -R parameter of recent vmware-vdiskmanager: vmware-vdiskmanager.exe -R corrupt.vmdk |
VMware complains - file is not a virtual disk (monolithicSparse, embedded descriptor is blank or corrupt) | hosted | unknown | use sample from list; restore entries from log; see howto; inject with dsfi.exe |
Cannot open the disk N:\test\test-000001.vmdk
or one of the snapshot disks it depends on.
Reason: The parent virtual disk has been modified since the child
was created.
There are several possible reasons for this message:
- a vmdk was used with a second VM
- unwise manual edit of the vmx-file
- unwise operations with vmware-vdiskmanager or vmkfstools
- failed operations with snapshotmanager
- bugs, host crashes ....
In case a VM with snapshots does not start anymore because a basedisk
was expanded you have two options:
- fix the snapshots
- fix the basedisk
The first option only makes sense with VMs that only have one snapshot.
In all other cases it is very probably easier to undo the expand
of the basedisk.
parent has been expanded - monolithicFlat | all | depends on skills of admin | cut vmdk; fix geometry; fix CID-chain |
parent has been expanded - split vmdk | all | | delete chunks added during expand; fix geometry; fix CID-chain |
parent has been expanded - monolithicSparse | hosted | | extract descriptor of basedisk with dsfo.exe; fix geometry to the size before expand; fix CID-chain; inject with dsfi.exe - or - extract descriptor of snapshot with dsfo.exe; fix Extent description to the size after expand; fix CID-chain; inject with dsfi.exe |
The destination file system does not support large files
Did you move your VM to FAT32?
Are you using NTFS with Linux or EXT2 with Windows?
Obviously it is a bad idea to copy vmdk-files larger than 2 GB to FAT32.
If you think your filesystem should support the files you use
you may try this entry in the vmx-file.
diskLib.sparseMaxFileSizeCheck = "false"
The version of the virtual disk is newer than the version supported
by this program
Open the descriptor of the vmdk and comment out the offending lines.
The SVGA mode stored in the snapshot cannot be restored on this
display ...
Read vmware.log - find out expected screen resolution - change
host resolution and try again.
If that does not work remove *.vmss to force a fresh start.
The suspended image contains a virtual machine
that uses floating point features that do not match the supported
features on the real machine ...
Disable the cpu-compatibility-test
in the vmx-file by adding these lines for the next start.
checkpoint.overrideVersionCheck = "true"
checkpoint.disableCpuCheck = "true"
If that does not work remove *.vmss to force a fresh start.
Post mortem
In the following cases there is little hope to recover the vmdk
as is.
So first thing to try is to repair the vmdk so that it can be mounted
with vdk or a helper VM.
If that fails - and the data is important - try commercial tools
- I have had good results with UFS-explorer.
If that does not work there is always Plan B as the ultima ratio.
post mortem | Platform | chances | trickiness | suggested procedure |
basedisk is lost - only snapshots are left | ESX | recover single files | advanced | fake basedisk; fix CID-chain; restore data with LiveCD |
basedisk is lost - only snapshots are left | Windows | recover single files | advanced | fake basedisk; fix CID-chain; mount with vdk; restore data with LiveCD |
basedisk is lost - only snapshots are left | Linux | recover single files | advanced | fake basedisk; fix CID-chain; restore data with Helix LiveCD |
one or more flat chunks of a "twoGbMaxExtentFlat"-disk are missing | hosted | restore | average | fake chunks; mount with vdk |
one or more sparse chunks of a "twoGbMaxExtentSparse"-disk are missing | hosted | restore | advanced | fake chunks; mount with vdk |
first chunk of a "twoGbMaxExtentSparse"-disk is missing | hosted | very bad | advanced | Plan B |
"monolithicSparse" is too small after running out of disk-space | hosted | restore | advanced | copy to location with disk-space; Plan B; restore data |
one or more chunks of a "twoGbMaxExtentFlat" are too small after running out of disk-space | hosted | restore | advanced | copy to location with disk-space; expand; mount with vdk; restore data |
"monolithicFlat" is too small after running out of disk-space | all | restore | advanced | copy to location with disk-space; expand; mount with vdk; restore data |
one or more chunks of a "twoGbMaxExtentSparse" are too small after running out of disk-space | hosted | restore | advanced | copy to location with disk-space; Plan B; restore data |
disk has holes (needs repair) | hosted | restore | advanced | mount with vdk; restore data |
starting a Linked Clone messed up original VM | all | bad | very advanced | restore still usable branches |
Plan B
Cases:
Rescue files from a virtual disk that is corrupted beyond repair
- no matter what platform.
Rescue files from a standalone snapshot.
Rescue files from single chunks of split vmdks
Overview of the procedure:
create large new monolithicFlat vmdk
wipe disk with zeros using a helper VM or a LiveCD like Knoppix,
UBCD4Win or MOA
format with filesystem used by the corrupted chunks
copy corrupted chunks into the disk
analyse with helper VM or LiveCD - using tools like UFS-explorer,
GetDataBack ....
search for lost, deleted files
Problem:
Folks may tell you this can't work.
Ignore them. You are desperate - aren't you ?
I had good results rescuing files from NTFS-formatted corrupt vmdks.
The procedure is worth a try no matter what the original platform was.
I have used it with broken vmdks from ESX as well as from hosted platforms.
Limitations:
This technique can only recover files that were newly created and
saved during the timeframe the given chunk was active.
You can not restore files that were created in snapshot1 and last
saved in snapshot2.
the CID-chain
VMware uses CID-values to verify if the snapshot chain is clean
before starting a VM.
If a snapshot chain is corrupt the CID-chain is broken.
Snapshot chains may get corrupted when
- snapshots were deleted manually
- the host crashed
- the system ran out of disk space
- unwise manual edits of vmdk or vmx-files
- a virtual disk was attached to a different VM
- the basedisk was expanded
- ....
All the cases mentioned will be noticed by VMware at startup
because of a break in the CID-chain.
As starting the VM in these conditions would further corrupt the
virtual disks, the VM will not be started.
Good CID-chain
In this simplified listing of one basedisk and its two snapshots
the child references the CID-value of its parent correctly in all cases.
###################### Windows Vista.txt ########################
CID=9a1f1a1f
parentCID=ffffffff
RW 104857600 SPARSE "Windows Vista.vmdk"
ddb.geometry.cylinders = "6527"
###################### Windows Vista-000004.txt ########################
CID=5cdd6af0
parentCID=9a1f1a1f
parentFileNameHint="Windows Vista.vmdk"
RW 104857600 SPARSE "Windows Vista-000004.vmdk"
###################### Windows Vista-000002.txt ########################
CID=c750afeb
parentCID=5cdd6af0
parentFileNameHint="Windows Vista-000004.vmdk"
RW 104857600 SPARSE "Windows Vista-000002.vmdk"
Broken CID-chain
In this simplified listing of one basedisk and its two snapshots
the parentCID in the first snapshot does NOT reference the correct
CID-value of its parent.
This means that during its last use the "Windows Vista.vmdk"
was probably used by another VM, or it was expanded, or ...
###################### Windows Vista.txt ########################
CID=a123b123
parentCID=ffffffff
RW 104857600 SPARSE "Windows Vista.vmdk"
###################### Windows Vista-000004.txt ########################
CID=5cdd6af0
parentCID=9a1f1a1f
parentFileNameHint="Windows Vista.vmdk"
RW 104857600 SPARSE "Windows Vista-000004.vmdk"
###################### Windows Vista-000002.txt ########################
CID=c750afeb
parentCID=5cdd6af0
parentFileNameHint="Windows Vista-000004.vmdk"
RW 104857600 SPARSE "Windows Vista-000002.vmdk"
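The comparison shown in the two listings can be automated. The sketch below walks from the newest snapshot back to the basedisk and reports every child whose parentCID does not match the CID of its parent. It is an illustration only: `parse_descriptor` and `check_chain` are names invented for this example, the descriptors are passed in as plain text, and the parentFileNameHint is assumed to be a bare file name as in the listings above - not a path.

```python
import re

FIELD = re.compile(
    r'^[ \t]*(CID|parentCID|parentFileNameHint)[ \t]*=[ \t]*"?([^"\r\n]*)"?[ \t]*$',
    re.MULTILINE,
)


def parse_descriptor(text):
    """Return the CID-related fields of one descriptor as a dict."""
    return {key: value.strip() for key, value in FIELD.findall(text)}


def check_chain(descriptors, leaf):
    """Walk from the leaf vmdk to the basedisk and collect CID mismatches.

    descriptors maps a vmdk file name to the text of its descriptor.
    """
    problems = []
    current = leaf
    seen = set()
    while current not in seen:  # guards against a chain that loops
        seen.add(current)
        fields = parse_descriptor(descriptors[current])
        parent_cid = fields.get("parentCID", "").lower()
        if parent_cid == "ffffffff":
            break  # reached the basedisk
        parent = fields.get("parentFileNameHint")
        if parent not in descriptors:
            problems.append(f"{current}: parent {parent!r} not found")
            break
        actual = parse_descriptor(descriptors[parent]).get("CID", "").lower()
        if actual != parent_cid:
            problems.append(
                f"{current}: parentCID={parent_cid} but {parent} has CID={actual}"
            )
        current = parent
    return problems
```

An empty result means the chain looks consistent. It does not prove that the data inside the disks is healthy.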
Special CID-values
CID=fffffffe, parentCID=ffffffff | This vmdk is a newly created basedisk |
CID=********, parentCID=ffffffff | This vmdk is a basedisk |
CID=12345678, parentCID=12345678 | When the parentCID-value matches the CID-value this snapshot may be an orphan. There is something very wrong with your snapshot chain. |
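The special values listed above can be written down as a small lookup. This is just a restatement of the table in code; `classify_cid` is a hypothetical helper name and the returned strings are only labels for this example.

```python
def classify_cid(cid, parent_cid):
    """Label a vmdk by the special CID / parentCID combinations listed above."""
    cid, parent_cid = cid.lower(), parent_cid.lower()
    if parent_cid == "ffffffff":
        # No parent: this is a basedisk; fffffffe marks a newly created one.
        return "newly created basedisk" if cid == "fffffffe" else "basedisk"
    if cid == parent_cid:
        return "possible orphan"
    return "snapshot"
```

For example `classify_cid("5cdd6af0", "9a1f1a1f")` labels the first snapshot from the listings as a snapshot, while two identical values are flagged as a possible orphan.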
Therapy: fake basedisk
Cases:
Rescue files from a standalone snapshot.
Rescue files from a snapshot chain
Overview of the procedure
find out what type of vmdk you need
find out what nominal size you need
find out which name the fake basedisk must use
create new disk with vmware-vdiskmanager.
find out which filesystem you need
format vmdk using a helper VM or a LiveCD
copy the vmdk into right path
find out which CID value is needed
attach snapshot
mount snapshot with helper VM or LiveCD
use recovery tools like UFS-explorer, GetDataBack ...
Chances:
There is a good chance to recover files which were unfragmented
inside the snapshot
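For the step "find out what nominal size you need" the extent line of the snapshot descriptor already holds the answer: the number after RW is the size in sectors. A minimal sketch, assuming 512-byte sectors and extent lines shaped like the ones in the CID-chain listings; `nominal_size` is a name invented for this example.

```python
import re

SECTOR_SIZE = 512  # assumption: bytes per sector

# Extent line: access mode, size in sectors, extent type, file name
EXTENT = re.compile(r'^[ \t]*(RW|RDONLY|NOACCESS)\s+(\d+)\s+(\S+)\s+"([^"]+)"', re.MULTILINE)


def nominal_size(descriptor_text):
    """Sum the sector counts of all extent lines; return (sectors, bytes)."""
    sectors = sum(int(m.group(2)) for m in EXTENT.finditer(descriptor_text))
    return sectors, sectors * SECTOR_SIZE
```

For the line `RW 104857600 SPARSE "Windows Vista-000004.vmdk"` this gives 104857600 sectors or 53687091200 bytes, which is 50 GB counted in units of 1024. The fake basedisk has to be created with the same nominal size.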
Disktypes with external descriptor
vmfs
vmfsSparse
monolithicFlat
twoGbMaxExtentSparse
twoGbMaxExtentFlat
fullDevice
partitionedDevice
custom
Disktypes with embedded descriptor
monolithicSparse
streamOptimized
to extract the embedded descriptor run
dsfo.exe monolithicSparse.vmdk 512 800 descriptor.bin
to inject the descriptor again run
dsfi.exe monolithicSparse.vmdk 512 800 descriptor.bin
When working with embedded descriptors make sure you never inject
more than a full sector = 512 bytes.
The range 512 800 should be safe.
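If dsfo.exe is not at hand, the read-only half of the job can be sketched in Python. This assumes that the two numbers in the dsfo command line above mean "start at byte offset 512, copy 800 bytes" - check that against the dsfo documentation. The names `extract_range` and `descriptor_text` are made up for this example, and nothing here writes to the vmdk.

```python
def extract_range(vmdk_path, offset=512, length=800):
    """Read `length` bytes starting at `offset` without modifying the file."""
    with open(vmdk_path, "rb") as disk:
        disk.seek(offset)
        return disk.read(length)


def descriptor_text(raw):
    """Cut the raw bytes at the first NUL byte and decode what is left as text."""
    return raw.split(b"\x00", 1)[0].decode("ascii", errors="replace")
```

Use this for inspection only and work on a copy of the vmdk. Injecting an edited descriptor is better left to dsfi.exe as described above.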