T-SQL crash course for sysadmins

Virtual computing

I’ve previously talked about my upgrade experience from MCTS to MCSA: SQL Server 2012, and about T-SQL querying becoming part of the requirements. Personally, while preparing for my exams I had my doubts that a virtualization consultant like me, or any sysadmin, would find T-SQL a useful skill. Still, I enjoyed being able to understand and manipulate data inside a database while watching TrainSginal’s SQL Server 2012 Querying (70-461) course.

Yet as with many other skills in IT, practice makes perfect and if you do not practice often, you may soon forget it.

Little did I know that I had to use what I was learning much sooner than the exam dates. I got a support call about a vCenter server’s service failing to start with an error message in the event viewer indicating its SQL Express database reached the maximum licensed size limit!

VMware vCenter Server comes bundled with Microsoft SQL Server express which is targeted for smaller-scale vCenter installations of no more than 5 hosts and 50 virtual machines (VMs). This specific environment was not bigger than that, and there was no excessive logging enabled to explain hitting the maximum size (4 GB supported by SQL Express 2005) in such a short period.

How do I usually deal with such a situation?

The easiest thing to do is to truncate the VPX_HIST_STAT and corresponding VPX_SAMPLE_TIME tables to free some space using the following commands:

truncate table VPX_HIST_STAT1;  truncate table VPX_HIST_STAT2;  truncate table VPX_HIST_STAT3;  truncate table VPX_HIST_STAT4;    truncate table VPX_SAMPLE_TIME1;  truncate table VPX_SAMPLE_TIME2;  truncate table VPX_SAMPLE_TIME3;    truncate table VPX_TEMPTABLE0;  truncate table VPX_TEMPTABLE1;  truncate table VPX_TEMPTABLE2;  

After that, and assuming you saved enough space on your database to get it operational again to allow your vCenter server to use it, you may want to take some measures to make sure that your vCenter database does not reach its maxim size again. One way to do this is to change the retention policy from your vCenter Server Settings to keep only the last 180 days or so of performance and events logs.

But if it so happens that your database is full enough that you have to purge old data manually to allow it to run, VMware KB 1025914 explains how to do that using SQL management studio.

I thought that this would be less prone to error and much faster if performed using T-SQL. Since I needed to practice what I learned, I wrote the following simple queries to display the current retention policy values:

USE VIM_VCDB;    SELECT [VALUE] AS 'event.maxAge'  FROM dbo.VPX_PARAMETER  WHERE NAME= 'event.maxAge';    SELECT [VALUE] AS 'event.maxAgeEnabled'  FROM dbo.VPX_PARAMETER  WHERE NAME= 'event.maxAgeEnabled';    SELECT [VALUE] AS 'task.maxAge'  FROM dbo.VPX_PARAMETER  WHERE NAME= 'task.maxAge';    SELECT [VALUE] AS 'task.maxAgeEnabled'  FROM dbo.VPX_PARAMETER  WHERE NAME= 'task.maxAgeEnabled';  

And the following queries to set the needed retention policy instead of editing the table manually:

USE VIM_VCDB;    UPDATE dbo.VPX_PARAMETER  SET [VALUE] = '180'   WHERE NAME= 'event.maxAge';    UPDATE dbo.VPX_PARAMETER  SET [VALUE] = 'true'   WHERE NAME= 'event.maxAgeEnabled';    UPDATE dbo.VPX_PARAMETER  SET [VALUE] = '180'   WHERE NAME= 'task.maxAge';    UPDATE dbo.VPX_PARAMETER  SET [VALUE] = 'true'   WHERE NAME= 'task.maxAgeEnabled';  

After setting the retention policy, you need to purge data from the vpx_event, vpx_event_arg, and vpx_task tables based on the amount dates specified for maxAge. By running a stored procedure called cleanup_events_tasks_proc.

What if I purged historical data older than 30 days and I still have a space problem?

This is exactly the new situation I had on hand that forced me to understand what was making those tables grow so big, so fast. The first thing I needed to know was what tables were taking up so much space. That would be very easy if I had the new “Object Explorer details” feature in SQL Server 2012.

Object Explorer details

Object Explorer details

Since I was on SQL Express 2005, I had to use the stored procedure sp_spaceused against the three suspect tables:

exec sp_spaceused vpx_event;  exec sp_spaceused vpx_event_arg;  exec sp_spaceused vpx_task;  

It turned out that vpx_event_arg was the biggest table; however data in that table did not make much sense to me because multiple rows were related to events in vpx_event table. So I used the following query to figure out what type of events where filling my database:

SELECT TOP 10 COUNT(*) AS [Count], EVENT_TYPE AS [Type]   FROM vpx_event  GROUP BY EVENT_TYPE  ORDER By COUNT DESC;  

And soon enough I learned that I had a huge number of esx.problem.vmfs.heartbeat.timedout. So the next question I asked naturally was, “did I have a datastore that was timing-out much more than the rest?”

To figure that out I typed the following:

SELECT TOP 5 COUNT(*) AS [Count], DATASTORE_NAME AS [DataStore]   FROM vpx_event  WHERE EVENT_TYPE = 'esx.problem.vmfs.heartbeat.timedout'  GROUP BY DATASTORE_NAME  ORDER By COUNT DESC;  

To my surprise there were around a million timeout events caused by an old storage called ‘ExtArray 1-1′. Although the ExtArray have not being used for a long time, nobody bothered to remove it from vCenter and it was sitting there filling the database with useless events.

At this stage I had successfully troubleshot the problem using my newly acquired T-SQL skills, and could have deleted all event data from the database to get things going, but I wanted to practice a bit more advanced T-SQL to try to save useful data from being purged.

Up until a couple of weeks ago I used to turn my face the other way when I saw any kind of joins in a query, as I had no wish to learn anything related to it. But after watching TrainSignal’s SQL Server 2012 Querying (70-461) , I knew that INNER JOIN between the vpx_event table and its arguments in the vpx_event_arg table is exactly what I needed to do to understand what data can surgically be removed out of the tables without affecting other useful data.

SELECT EVENT_ID  FROM vpx_event  WHERE DATASTORE_NAME = 'ExtArray 1-1'  ORDER BY EVENT_ID DESC;    SELECT DISTINCT vpx_event_arg.EVENT_ID  FROM vpx_event_arg  INNER JOIN vpx_event  ON vpx_event_arg.EVENT_ID = vpx_event.EVENT_ID AND DATASTORE_NAME = 'ExtArray 1-1'  ORDER BY EVENT_ID DESC;  

So I restored the database from a backup I made before my earlier attempt to sacrifice everything up to the last 30 days and ran the following quires to delete anything related to DataStore ‘ExtArray 1-1′:

DELETE vpx_event_arg  FROM vpx_event_arg  INNER JOIN vpx_event  ON vpx_event_arg.EVENT_ID = vpx_event.EVENT_ID AND DATASTORE_NAME = 'ExtArray 1-1';    DELETE  FROM vpx_event  WHERE DATASTORE_NAME = 'ExtArray 1-1';  

Yet that did not save as much space as I had hoped, and my tables where still very big although I had removed millions of entries from them and shrank the database. So what was I missing? After some thought I remembered that tables usually have indices attached to them and those indices may still contain a lot of data about data (metadata) even though the data may no longer be there. So I deiced to rebuild indices on those tables using the following queries:

ALTER INDEX PK_VPX_EVENT_ARG ON vpx_event_arg REBUILD;  ALTER INDEX PK_VPX_EVENT ON vpx_event REBUILD;  

If all you have is a hammer, everything looks like a nail

In the end, I got the database online with a minimum amount of data loss, and troubleshot a problem in the virtual infrastructure. The problem otherwise would be very time-consuming to identify with traditional methods, like exploring every object in the vCenter GUI looking for an abnormal number of like events. Actually, that wouldn’t have even been possible without getting the vCenter database up, and for that reason most specialized third-party tools would not have helped me in this case. Also, setting up such a third-party tool at that point would have been a huge waste of downtime and would have sounded illogical.

I hope that you agree that T-SQL is a powerful tool that admins need to master when supporting and administrating a system that relies on a database backend. You may think writing queries is a DBA or the developer task, but those guys do not understand your infrastructure better than you do. The database is just a representation of your infrastructure, so if you can handle it you will be in a better position as an admin. In this case I applied my skills to the VMware vCenter database, but I’m sure you’ll find it beneficial to get to know your databases whether you’re responsible for System Center, SharePoint, AD RM, Kaspersky, or any other type of administration.

TrainSignal trial

Comments