Jan 12, 2016

XenServer Root Disk Maintenance – Out Of Space

Speech at FATEC – Mauro Cesar Fileto – May 2015.

For all that it does, XenServer has a tiny installation footprint: 1.2 GB (roughly). That is the modern day equivalent of a 1.44 MB floppy disk, really. While the installation footprint is tiny, well, so is the “root/boot” partition that the XenServer installer creates: 4 GB in size – no more, no less, and don’t alter it!

The same is also true – during the install process – for the secondary partition that XenServer uses for upgrades and backups:

XEN-Partition - Mauro C. Fileto

The point is that this amount of space does not leave much room for log retention, patch files, and other content. As such, it is highly important to tune, monitor, and perform clean-up operations on a periodic basis. Without attention, hotfix files, syslog files, temporary log files, and other forms of data can accumulate over time until the root disk becomes full.

UPDATE: If you are wondering where the swap partition is, wonder no more. For XenServer, swap is file-based and is instantiated during the boot process of XenServer. As for the 4 GB partitions, never alter their size, as upgrades, etc. will re-align the partitions to match upstream XenServer release specifications.

One does not want a XenServer (or any server for that matter) to have a full root disk, as this will bring processes, as well as virtualization, to a full stop because the full disk will go “read only”. Common symptoms are:
• VMs appear to be running, but one cannot manage a XenServer host with XenCenter
• One can ping the XenServer host, but cannot SSH into it
• If one can SSH into the box, one cannot write or create files: “read only file system” is reported
• xsconsole can be used, but it returns errors when “actions” are selected

So, while there is a basis for a problem, the following article offers the basis for a solution (with emphasis on regular administration).

Monitoring the Root Disk

Shifting into the first person, I am often asked how I monitor my XenServer root disks.  In short, I utilize tools that are built into XenServer along with my own “Administrative Scripts”.  The most basic way to see how much space is available on a XenServer’s root disk is to execute the following:

 

df -h

This command reports the free space on each file system (“df” is short for “disk free”) and the “-h” means “human readable”, i.e. gigabytes, megabytes, etc. The output should resemble the following, and the line we care about is the one mounted on “/”:

Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             4.0G  1.9G  1.9G  51% /
none                  299M   28K  299M   1% /dev/shm
/opt/xensource/packages/iso/XenCenter.iso
56M   56M     0 100% /var/xen/xc-install

 

A more “get to the point” way is to run:

df -h | grep "/$" | head -n 1

Which produces the line we are concerned with:

/dev/sda1             4.0G  1.9G  1.9G  51% /
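The grep-based pipeline relies on df keeping each file system on a single line, which it does not always do (note how the XenCenter.iso entry above wraps onto a second line). A minimal sketch of a more defensive variant, assuming a df that supports the POSIX -P flag. The name parse_root_usage is a hypothetical helper for illustration, not something shipped with XenServer:

```shell
# parse_root_usage: read df output on stdin and print the Use% figure
# (digits only) for the file system mounted on "/".
# Hypothetical helper for illustration; not part of XenServer.
parse_root_usage() {
    awk '$NF == "/" { sub(/%$/, "", $(NF-1)); print $(NF-1); exit }'
}

# -P keeps every file system on a single line, so the fields never shift.
df -P / | parse_root_usage
```

Fed the /dev/sda1 line shown above, this prints 51.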

The end result is that we know 51% of the root partition is used. Not bad, really. Still, I am a huge fan of automation and will now discuss a simple way that this task can be run – automatically – for each of your XenServers.

What I am providing is essentially a simple BASH script that checks a XenServer’s local disk. If the local disk use exceeds a threshold (which you can change), it will send an alert to XenCenter so that the tactics described further in this document can be employed to free up as much space as possible.

Using nano or VI, create a file in the /root/ (root’s home) directory called “diskmonitor” and paste in the following content:

#!/bin/bash
# Quick And Dirty Disk Monitoring Utility
# Get this host's UUID
thisUUID=$(xe host-list name-label="$HOSTNAME" params=uuid --minimal)
# Threshold of disk usage to report on
threshold=75    # an example of how much disk can be used before alerting
# Get disk usage: take the Use% column of the line mounted on "/" and strip the "%"
diskUsage=$(df -h | grep "/$" | head -n 1 | awk '{ print $5 }' | sed -n -e "s/%//p")
# Check: alert XenCenter when usage is above the threshold
if [ "$diskUsage" -gt "$threshold" ]; then
     xe message-create host-uuid="$thisUUID" name="ROOT DISK USAGE" body="Disk space use has reached ${diskUsage}% on $HOSTNAME!" priority="1"
fi

After saving this file be sure to make it executable:

chmod +x /root/diskmonitor

The “#!/bin/bash” at the start of this script now becomes imperative, as it tells the system (when the script is called upon) to run it with the BASH interpreter.
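One fragility worth knowing about: the script compares $diskUsage with -gt, and that test prints an error rather than alerting if the df pipeline ever returns an empty or non-numeric value. A minimal sketch of a guard, using a hypothetical helper named exceeds_threshold (an illustration, not part of the script above):

```shell
# exceeds_threshold USAGE LIMIT
# Returns 0 when USAGE is strictly greater than LIMIT, 1 when it is not,
# and 2 when either argument is empty or not a whole number.
# Hypothetical helper for illustration.
exceeds_threshold() {
    case "$1" in ''|*[!0-9]*) return 2 ;; esac
    case "$2" in ''|*[!0-9]*) return 2 ;; esac
    [ "$1" -gt "$2" ]
}

# Example: 80% used against a 75% threshold triggers the branch.
if exceeds_threshold 80 75; then
    echo "alert"
fi
```

Returning a distinct status for bad input lets a caller tell "disk is fine" apart from "could not read disk usage", which may itself deserve an alert.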

UPDATE: To execute this script manually, one can execute the following command if in the same directory as this script:

./diskmonitor

This convention is used so that scripts can be executed just as if they were a binary/compiled piece of code. If the “./” prefix is an annoyance, move /root/diskmonitor to /sbin/. This will ensure that one can execute diskmonitor without the “dot forward-slash” prefix while in other directories:

mv /root/diskmonitor /sbin/

# Now you should be able to execute diskmonitor from anywhere

diskmonitor

 

If you move the diskmonitor script, make note of where you placed it, as this path will be needed for the cron entry.

To automate the diskmonitor script one can now leverage cron: add an entry to root’s “crontab” and specify a recurring time at which diskmonitor should be executed (behind the scenes).

The following is a basic outline of how to leverage cron so that diskmonitor will be executed four times per day. If you are looking for more information regarding cron, what it does, and how to configure it for other automation-based tasks, then visit http://www.thegeekstuff.com/2009/06/15-practical-crontab-examples/ for more detailed examples and explanations.

1.  From the XenServer host command-line execute the following to add an entry to crontab for root:

crontab -e

2.  This will open root’s crontab in VI or nano (text editors) where one will want to add one of the following lines based on where diskmonitor has been moved to or if it is still located in the /root/ directory:

# If diskmonitor is still located in /root/
00 00,06,12,18 * * * /root/diskmonitor
# OR if it has been moved to the /sbin/ directory
# (use the full path: cron runs with a minimal PATH that may not include /sbin)
00 00,06,12,18 * * * /sbin/diskmonitor

3.  After saving this, we now have a cron entry that runs diskmonitor at midnight, six in the morning, noon, and 6 in the evening (military time) for every day of every week of every month.  If the script detects that the root drive on a XenServer is > 75% “used” (you can adjust this), it will send an alert to XenCenter where one can leverage – further – built in tools for email notifications, etc.
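When the same entry has to be added on several hosts, opening an editor on each one gets tedious. A minimal sketch of a non-interactive alternative, assuming the script is still at /root/diskmonitor. The name merge_cron_entry is a hypothetical helper for illustration:

```shell
# merge_cron_entry ENTRY
# Copy a crontab from stdin to stdout, appending ENTRY only when an
# identical line is not already present. Hypothetical helper for illustration.
merge_cron_entry() {
    awk -v entry="$1" '
        { print }
        $0 == entry { found = 1 }
        END { if (!found) print entry }
    '
}

# Preview the merged crontab without installing anything.
printf '%s\n' '# existing entries' | merge_cron_entry '00 00,06,12,18 * * * /root/diskmonitor'
```

To install the result rather than preview it, the usual idiom is to pipe root’s current crontab through the helper and back into cron: crontab -l 2>/dev/null | merge_cron_entry '00 00,06,12,18 * * * /root/diskmonitor' | crontab -. Because the helper skips an entry that is already present, running it twice does not create duplicates.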

The following is an example of the output of diskmonitor. Note that this test was done using a threshold of 50%: yes, in Creedence there is a bit more free space! Kudos to Dev!

XEN-Partition - Mauro C. Fileto

One can expand upon the script (and XenCenter), but let’s focus on a few areas where root disk space can be slowly consumed.

Removing Old Hotfixes

After applying one or more hotfixes to XenServer, copies of each decompressed hotfix are stored in /var/patch.  The main reason for this – in short – is that in pooled environments, hotfixes are distributed from a host master to each host slave to eliminate the need to repetitively download one hotfix multiplied by the number of hosts in a pool.

The more complex reason is for consistency, for if a host becomes the master of the pool, it must reflect the same content and configuration as its predecessor did and this includes hotfixes.

The following is an example of what the /var/patch/ directory can look like after the application of one or more hotfixes:

XEN-Partition - Mauro C. Fileto

Notice the /applied sub-directory?  We never want to remove that. 

APPROPRIATE REMOVAL:

To appropriately remove these patch files, one should utilize the “xe patch-destroy” command. While I do not have a “clever” command-line example to take care of all files at once, the following should be run against each file that has a UUID-based naming convention:

cd /var/patch/

xe patch-destroy uuid=<FILENAME, SUCH AS 4d2caa35-4771-ea0e-0876-080772a3c4a7>
(repeat "xe patch-destroy uuid=" command for each file with the UUID convention)

While this is not optimum, especially to run per-host in a pool, it is the prescribed method, and once I have a more automated/controlled solution, I will naturally document it.
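A minimal sketch of how the per-file step could be looped, assuming every file in /var/patch with a UUID-style name corresponds to a patch that xe patch-destroy will accept. The name list_patch_uuids is a hypothetical helper for illustration, and the sketch has not been verified against a live pool:

```shell
# list_patch_uuids [DIR]
# Print the names of regular files directly inside DIR (default /var/patch)
# that look like UUIDs. Sub-directories such as "applied" are skipped.
# Hypothetical helper for illustration; not part of XenServer.
list_patch_uuids() {
    find "${1:-/var/patch}" -maxdepth 1 -type f 2>/dev/null |
        sed 's|.*/||' |
        grep -E '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
}

# Review the list first, then feed it to the prescribed command, e.g.:
#   list_patch_uuids /var/patch | while read -r uuid; do
#       xe patch-destroy uuid="$uuid"
#   done
```

Printing the list before destroying anything keeps the operation reviewable, which matters in a pool where a missing patch file can confuse XenCenter.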

EMERGENCY SITUATIONS:

In the event that removal of other contents discussed in this article does not resolve a full root disk issue, the following can be used to remove these patch files.  However, it must be emphasized that a situation could arise wherein the lack of these files will require a re-download and install of said patches:

find /var/patch -maxdepth 1 | grep "[0-9]" | xargs rm -f

Finally, if you are in the middle of applying hotfixes do not perform the removal procedure (above) until all hosts are rebooted, fully patched, and verified as in working order.  This applies for pools – especially – where a missing patch file could throw off XenCenter’s perspective of what hotfixes have yet to be installed and for which host.

The /tmp Directory

Plain and simple, the /tmp directory is truly meant for just that: holding temporary data.  Pre-Creedence, one can access a XenServer’s command-line and execute the following to see a quantity of “.log” files:

cd /tmp

ls

XEN-Partition - Mauro C. Fileto

As visualized, one can see that many, many log files accumulate over time. These are small individually, but collectively… they take up space.

Thanks to the Citrix Community for the information: these logs were always intended to be “removed” automatically once a Guest VM was started. So, as of 6.5 and beyond, this section is irrelevant! On earlier releases, the files can be removed by hand:

cd /tmp/
rm -rf *.log

This will remove only “.log” files so any driver ISO images stored in /tmp (or elsewhere) should be manually addressed.
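Before deleting anything, it can help to see how many files a pattern would actually match. A minimal sketch, with count_files as a hypothetical helper name:

```shell
# count_files DIR PATTERN
# Print the number of regular files directly inside DIR whose names
# match the shell PATTERN. Hypothetical helper for illustration.
count_files() {
    find "$1" -maxdepth 1 -type f -name "$2" | wc -l | tr -d ' '
}

# How many ".log" files would the removal above touch?
count_files /tmp '*.log'
```

The pattern is quoted so that find, rather than the shell, expands it. The same helper works for other patterns, such as '*.iso' for any driver images left behind.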

Compressed Syslog Files

The last item is to remove all compressed Syslog files stored under /var/log.  These usually consume the most disk space and as such, I will be authoring an article shortly to explain how one can tune logrotate and even forward these messages to a Syslog aggregator.

UPDATE: As a word of advice, we are only looking to clear “*.gz” (compressed/archived) log files. Once these are deleted, they are gone. Naturally this means a server status report gathered afterwards will lack historical information, so one may consider copying these off to another host (using scp or WinSCP) before following the next steps to remove them under a full root disk scenario.

In the meantime, just as before, one can execute the following commands to keep current syslog files intact, but remove old, compressed log files:

cd /var/log/
rm -f *.gz
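When the disk is filling but not yet full, a gentler option is to remove only the older archives and keep recent history for a status report. A minimal sketch, assuming a seven-day cut-off is acceptable. The name old_compressed_logs is a hypothetical helper for illustration:

```shell
# old_compressed_logs DIR DAYS
# List "*.gz" files directly inside DIR that were last modified more
# than DAYS days ago. Hypothetical helper for illustration.
old_compressed_logs() {
    find "$1" -maxdepth 1 -type f -name '*.gz' -mtime +"$2"
}

# Review first, then remove, e.g.:
#   old_compressed_logs /var/log 7
#   old_compressed_logs /var/log 7 | xargs rm -f
```

The xargs form assumes file names without spaces, which is normally the case for rotated syslog files.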

So For Now…

At this point one has a tool to know when a disk is approaching capacity and methods with which to clean up specific items. The admin can run these in an automated or a manual fashion. It is truly up to the admin’s style of work.

About the Author: Mauro C. Fileto
