Why Proper Virtual Network Labeling is a Must
Coming from a networking background, I always try my best to use descriptive and meaningful network labels. This best practice makes it much easier to tell external connections from internal ones when working with firewalls and routers. Labeling becomes even more important as those network appliances are increasingly deployed as virtual machines, since you cannot see any cables in your virtual environment.
With virtual networks, labeling is not just helpful; done improperly, it can break your infrastructure. A while ago, when we started experimenting with Microsoft Hyper-V, we had a funny experience that illustrates the point.
We installed a brand new two-node failover cluster and everything was working perfectly; we were able to live migrate machines from Node1 to Node2 and back, and we had successfully tested a simulated failure of one of the nodes to force the remaining node to run the workload. Everything was working fine.
The testing dragged on for a few months until this Hyper-V cluster went into production, hosting our SharePoint farm. At that point we decided to add two more nodes to the cluster. This should have been a straightforward task, as the two new nodes were “exactly” the same hardware and configuration as the two older nodes. But something odd was going on.
Live migration between Node1 and Node2 continued to work normally, but migration to either of the new nodes failed. The event viewer gave very accurate, but totally useless, information stating that “Virtual Machine live migration did not succeed at the source”, although Microsoft had plenty of space in that text box to write more.
I turned off the machine and moved it to Node4 to see if it could run there, but it could not. It tried to start, failed over to Node3, failed again, and finally ran on Node2. I thought the new nodes might not have been set up correctly to host virtual machines, so I created a few test machines on the two new nodes, and they all worked perfectly. Furthermore, live migrations between Node3 and Node4 succeeded, but not from the two new nodes to the old ones.
So I had a situation where each half of my cluster worked fine internally but refused to work with the other half (a generation gap of a few months, it seems). Something was clearly different between the new and the old nodes, but what?
After looking for a long time, it turned out that the external network on the old nodes was called “Public – Virtual Network” while on the new nodes it was called “Public Virtual network” – it was missing a dash!
For virtual machines, that means a different network. A virtual machine migrating from host to host needs the same virtual network to connect to, and any slight difference in the naming of the virtual network will prevent the virtual machine from knowing that this is indeed the same network, and so the machine cannot run.
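Because the match is a plain string comparison, even a single missing character is enough to make two switches look like different networks. As a minimal sketch (the node names and switch lists below are hypothetical, not the actual cluster inventory), a small script can flag any virtual network name that does not exist on every node before a migration is attempted:

```python
# Minimal sketch: detect virtual network name mismatches across cluster nodes.
# The node names and switch sets here are hypothetical examples.

def find_mismatches(switches_by_node):
    """Return, per node, the switch names present elsewhere but missing locally."""
    all_names = set().union(*switches_by_node.values())
    return {
        node: sorted(all_names - names)
        for node, names in switches_by_node.items()
        if all_names - names
    }

switches = {
    "Node1": {"Public - Virtual Network", "Cluster Heartbeat"},
    "Node2": {"Public - Virtual Network", "Cluster Heartbeat"},
    "Node3": {"Public Virtual network", "Cluster Heartbeat"},  # missing dash
    "Node4": {"Public Virtual network", "Cluster Heartbeat"},
}

for node, missing in sorted(find_mismatches(switches).items()):
    print(f"{node} is missing: {missing}")
```

Run against a healthy cluster, the check prints nothing; here it flags all four nodes, because “Public - Virtual Network” and “Public Virtual network” are treated as two unrelated networks.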
What I learned from this experience came in handy when I needed to rebuild a VMware ESXi host for an SMB. That host was literally hosting their entire infrastructure (Router, IP PBX, Firewall, Domain Controller, File Server, etc.).
In a hurry I reconfigured the WAN links as “WAN 1,” “WAN 2,” etc., and started browsing the storage, adding the VMs I found to the ESXi inventory, but the VMs did not see their networks.
For a minute, I thought I would have to go through thirty virtual machines, one by one, assigning each virtual machine its correct network. This would have been a waste of critical time, but it was doable. However, I would also have had to solve the puzzle of matching the internal configuration of the multi-homed virtual machines (like routers and firewalls) with the proper virtual networks. For a router with five interfaces, that would have been something of a challenge.
The solution was pretty simple and straightforward: recreate the virtual networks with their names exactly as they were called before without removing any spaces or adding any dashes. And it worked!
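This is also where documented labels pay off directly: the full set of port groups to recreate can be derived from the network labels recorded for each VM. As a sketch (the VM inventory and labels below are hypothetical examples, not the actual site configuration):

```python
# Sketch: derive the exact port-group names to recreate on a rebuilt ESXi host
# from the network labels recorded in documentation or VM configuration files.
# The VM inventory below is a hypothetical example.

vm_networks = {
    "router":   ["WAN 1", "WAN 2", "LAN - Servers", "LAN - Voice", "DMZ"],
    "firewall": ["WAN 1", "DMZ", "LAN - Servers"],
    "dc01":     ["LAN - Servers"],
    "ippbx":    ["LAN - Voice"],
}

# Recreate each label exactly as recorded: same spaces, dashes, and case.
required_port_groups = sorted({label for labels in vm_networks.values() for label in labels})
print(required_port_groups)
```

Once each of those port groups exists with its name reproduced character for character, every VM reconnects to its networks automatically, with no per-VM reassignment.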
Since this was a single-host VMware ESXi site, the network labels had mainly served as information for the administrator. Had multiple VMware ESXi hosts been implemented with high availability and vMotion in mind, consistent virtual network labeling would have been essential for vMotion to work.
I strongly advise any administrator to develop the habit of proper virtual network labeling, and to include the labeling in the documentation. Any good developer knows that uncommented code is useless to others and hard to follow when troubleshooting, even if it’s your own code.
The same principle applies to IT infrastructure and basic virtual networking. Proper labeling helps you, and anyone else who may need to assist you, understand the environment, troubleshoot issues, and modify and expand it. But most importantly, proper network labeling is necessary for your virtual environment to run at all.