If you are using VMware ESX or ESXi in a SAN or NAS environment, you might have heard your storage vendor talk about the importance of disk alignment. I do not want to go into detail about what disk alignment is, as this has been covered over and over again. A great resource on the importance of disk alignment can be found on NetApp’s website and blog.
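For readers who just want a quick feel for what "aligned" means, here is a minimal sketch in Python. It assumes 512-byte logical sectors and a 4 KB block size on the storage side; both values, and the function name `is_aligned`, are assumptions for illustration and not something taken from our filer's configuration. A partition counts as aligned when its starting byte offset is a multiple of the storage block size.

```python
# Minimal sketch: check whether a partition's starting offset falls on a
# storage block boundary. The sizes below are assumptions for illustration.
SECTOR_BYTES = 512          # assumed logical sector size
STORAGE_BLOCK_BYTES = 4096  # assumed block size on the storage array


def is_aligned(start_sector, sector_bytes=SECTOR_BYTES,
               block_bytes=STORAGE_BLOCK_BYTES):
    """Return True if the partition's byte offset is a multiple of the block size."""
    return (start_sector * sector_bytes) % block_bytes == 0


if __name__ == "__main__":
    # 63 is a common starting sector on older guest operating systems;
    # 64 and 2048 are typical aligned starting sectors.
    for start in (63, 64, 2048):
        offset = start * SECTOR_BYTES
        state = "aligned" if is_aligned(start) else "misaligned"
        print(f"start sector {start}: offset {offset} bytes is {state}")
```

With these assumed sizes, a partition starting at sector 63 begins 32,256 bytes into the disk, which is not a multiple of 4,096, so a single guest block can straddle two storage blocks and cause extra I/O on the array.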
In one of my VMware environments we deployed a NetApp V-Series 3140 filer. When we deployed the NetApp, life was good (performance, speed, reliability, etc.). We knew that certain VMs needed to be re-aligned, and we planned to do so later in the year to avoid interrupting business processes too much. Things never turn out the way you expect them to, so the year went by and the alignment project never materialized, for several reasons. By the way, this is not written to blame anyone; it’s just how things turned out.
In November we added a large file and print server that uses the NetApp storage, and that’s when things started going wrong. We had not seen performance issues on the NetApp until then, but we saw them after we populated the disk storage with roughly 10 TB of data. Unfortunately, this particular NetApp filer was configured in a way that “helped” to make things worse and, again, I am not saying this to blame anyone. The business had a certain budget for this filer back then, and the purchase reflected that. Anyway, the second controller of this V3140 was hammered, and it became clear that even though there was plenty of disk available, the NVRAM was undersized. The fact that fast SAS disk shelves and SATA disk shelves were sitting on the same controller did not help either, as this thing “dumbs” itself down to the slower speed of the SATA disks. So we saw disk queuing, and it was not pretty.
A discussion with the reseller and NetApp led to a plan of action that included some short-term fixes as well as the purchase of an additional NetApp filer. The additional filer was the reason we talked to NetApp in the first place, and it is in our budget anyway, but we did not expect to talk shop so soon.
Since we own licenses for Quest vOptimizer, we decided to use this tool to re-align the VMs. The beauty of this tool is that it is supposed to be fail-safe, because it generates additional backups during the alignment process, and that you can schedule the actual execution of the alignment job. This would help us address these issues faster. The first 8 re-alignments went well: slower than expected, but they worked. Re-alignment #9 failed several times (I am still looking for a fix), but it recovered gracefully. Re-alignment #10 turned out to be a disaster. vOptimizer literally destroyed the VM and left it in an unrecoverable state. I had to restore it from a vRanger backup, and fortunately there were no data changes between the backup and the destruction of the VM, as this particular VM only held historical data. I currently have a ticket open with Quest Software and hope to have this resolved fairly quickly.
It will take some time to get enough downtime scheduled for all the VMs that need to be aligned, so there is no super-short-term fix for the performance issues we are seeing. Eventually we will purchase one additional shelf of SATA disk to add more spindles to the SATA aggregate and improve performance that way. In our case the growth was faster than expected: not in the amount of storage being used, but in how quickly we were filling up the disk on this NetApp filer. That is definitely something we need to look at more proactively in the future. We are working with NetApp on collecting more data so that we can size the new NetApp filer appropriately.