My own datacenter had to go into ‘dark’ mode because of imminent maintenance on the power source.
This led me to shut down all VMs on my network, then the OpenStack compute nodes, and finally the physical OpenStack machine (in that order).
When the OpenStack nodes booted again, one isolated test compute/storage node named yggdrasil wouldn’t boot a single VM. To my horror, that was the node running the production IDM for all my hosts.
That meant none of my hosts were able to find each other, and slowly all VMs dropped into a root shell because iSCSI couldn’t find their hard drives. Woe is me!
It turns out I had run into two bugs at the same time.
Bug 1: The OpenStack cluster didn’t preserve the ACL configuration on its iSCSI device nodes.
Bug 2: My LVM devices were ‘discovered’ by the hardware node, so I couldn’t use targetcli to re-add them.
The path to discovering how this came to be.
First hint: I saw that all the ACLs were missing, and therefore I got permission denied errors when trying to use iscsiadm to log in to targets.
Since I had nothing to compare against, I first had to find out the iSCSI initiator name. Luckily, it is saved in the file /etc/iscsi/initiatorname.iscsi.
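For reference, a minimal sketch of pulling the initiator name out of that file. The function name get_initiator_name is made up for illustration, and it assumes the usual layout with a single InitiatorName=... line.

```shell
# Print the initiator IQN stored in an open-iscsi initiator name file.
# The path argument is optional and defaults to the file named above.
get_initiator_name() {
    file="${1:-/etc/iscsi/initiatorname.iscsi}"
    # Ignore comment lines and keep only the value after "InitiatorName=".
    sed -n 's/^[[:space:]]*InitiatorName=//p' "$file" | head -n 1
}
```

Calling get_initiator_name without arguments on the node should print the IQN that the target expects to see in its ACL.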
So I used that name and targetcli to re-add all the ACLs. Now iscsiadm happily discovered a few nodes, but not all of them yet. Unfortunately, the discovered iSCSI drives did not include the ones for my IDM server, so I had to continue debugging.
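Roughly, re-adding one ACL and checking it from the initiator side could look like the sketch below. All IQNs and the portal address are hypothetical placeholders, the tpg1 path assumes the default target portal group, and the run helper only prints the commands unless DRY_RUN=0 is set.

```shell
# Dry-run by default: print each command instead of executing it.
# Set DRY_RUN=0 to really run them (needs root, targetcli and open-iscsi).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical placeholders, replace with your own values.
TARGET_IQN="iqn.2010-10.org.openstack:volume-example"
INITIATOR_IQN="iqn.1994-05.com.redhat:example"
PORTAL="192.0.2.10:3260"

restore_acl_and_discover() {
    # Allow the initiator to log in to the target again.
    run targetcli "/iscsi/${TARGET_IQN}/tpg1/acls" create "${INITIATOR_IQN}"
    # Ask the portal which targets it exports.
    run iscsiadm -m discovery -t sendtargets -p "${PORTAL}"
    # Log in to the target that was just made accessible.
    run iscsiadm -m node -T "${TARGET_IQN}" -p "${PORTAL}" --login
}
```

Running restore_acl_and_discover as-is only prints the three commands, which makes it easy to review them before touching a production target.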
Then I noticed something horrible: one of my commands had failed because the logical volume (LVM) was “in use” and therefore could not be added as a backstore with targetcli.
It took me several days of googling until I finally hit the right answer.
lsblk … YEP
The logical volume was supposed to serve as a hard disk, which itself contained another LVM configuration, and stupidly enough, my hardware node had activated that inner LVM configuration.
So I deactivated the LV with lvchange -a n /dev/system_vg/swap_lv.
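A quick way to spot this kind of nesting is to look for logical volumes whose parent device is itself a device-mapper device. The helper below is a sketch, not a definitive check: find_nested_lvs is a made-up name, it only reads lsblk output from stdin, and a dm-* parent can also be something harmless such as LUKS or multipath.

```shell
# Read "NAME TYPE PKNAME" lines, as printed by
#   lsblk -rn -o NAME,TYPE,PKNAME
# and print every LV whose parent kernel device is a dm-* device.
find_nested_lvs() {
    awk '$2 == "lvm" && $3 ~ /^dm-/ { print $1 }'
}

# Usage on a real host:
#   lsblk -rn -o NAME,TYPE,PKNAME | find_nested_lvs
# Each name it prints is a candidate for deactivation, for example:
#   lvchange -a n /dev/system_vg/swap_lv
```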
I was lucky that my hardware node uses root_vg and not system_vg, or the mess might have been bigger!
Then I added a new filter entry, ‘r|/dev/cinder-volume.*|’, to my lvm.conf so this would not happen again.
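In lvm.conf the entry lives in the filter list of the devices section. The sketch below shows that fragment as a comment and adds a small, hypothetical helper (is_rejected) to try the pattern against device paths. It assumes LVM filter patterns behave like unanchored regular expressions, which grep -E only approximates.

```shell
# Fragment for /etc/lvm/lvm.conf (an "r" entry rejects matching devices):
#
#   devices {
#       filter = [ "r|/dev/cinder-volume.*|" ]
#   }
#
# The active value can be inspected with: lvmconfig devices/filter

# Return success if the given device path matches the reject pattern.
is_rejected() {
    printf '%s\n' "$1" | grep -Eq '/dev/cinder-volume.*'
}
```

For example, is_rejected /dev/cinder-volumes/volume-example (a hypothetical path) succeeds, while is_rejected /dev/root_vg/root_lv does not, so the hardware node's own volume group stays visible.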
After this I could add the backing store and create the LUN and the ACL. I added the thing to the portal, and now iscsiadm happily saw all my LUNs. The others that were still missing had solved themselves automagically.
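Put together, those steps could look something like the sketch below. It assumes targetcli-fb style paths (/backstores/block, tpg1) and a target that already exists. Every name, IQN and address is a hypothetical placeholder, and the run helper only prints the commands unless DRY_RUN=0 is set.

```shell
# Dry-run by default: print each command instead of executing it.
# Set DRY_RUN=0 to really run them (needs root and targetcli).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical placeholders, replace with your own values.
TARGET_IQN="iqn.2010-10.org.openstack:volume-example"
INITIATOR_IQN="iqn.1994-05.com.redhat:example"
BACKSTORE_NAME="volume-example"
BACKSTORE_DEV="/dev/cinder-volumes/volume-example"
PORTAL_IP="192.0.2.10"
PORTAL_PORT="3260"

readd_lun() {
    # 1. Register the logical volume as a block backstore.
    run targetcli /backstores/block create "name=${BACKSTORE_NAME}" "dev=${BACKSTORE_DEV}"
    # 2. Export the backstore as a LUN of the target.
    run targetcli "/iscsi/${TARGET_IQN}/tpg1/luns" create "/backstores/block/${BACKSTORE_NAME}"
    # 3. Allow the initiator to access it.
    run targetcli "/iscsi/${TARGET_IQN}/tpg1/acls" create "${INITIATOR_IQN}"
    # 4. Make the target reachable on the portal (may already exist).
    run targetcli "/iscsi/${TARGET_IQN}/tpg1/portals" create "${PORTAL_IP}" "${PORTAL_PORT}"
    # 5. Persist the configuration.
    run targetcli saveconfig
}
```

The backstore step is the one that fails with an “in use” error as long as the host still has the inner LVM activated.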
I booted my IDM and, lo and behold, everything came up again and my world started spinning.