One or two of our servers have been a little bit overloaded recently.

They’re going to be replaced with beefier machines, but due to a number of issues I haven’t been able to replace them yet.

Issue #1 – Pre-Expo, we weren’t allowed to replace anything.
Issue #2 – Post-Expo, I’m no longer allowed in the data center!

We’re working on sorting issue #2 out, but in the interim I need to keep the older machines running.

I was previously using Monit to monitor system load.

Monit is mostly a good solution: it has a web UI, it can stop services if system load goes too high, and it generally keeps working when everything else has failed. This is great when things go poopy, but it has one fatal issue.

It doesn’t know how to restart stuff once load is back to normal.
Typically something pushes the server load to unusable levels for a sustained period (due to lots of visitors), monit goes "ooh, apache has gone AWOL", and stops it.
Unfortunately, once load is back to normal, monit doesn’t have a way to start it up again, so I need to go to the monit page manually and start the service. I do get emailed about things like this, but it leads to complaints from the 2 clients that appear to monitor their particular websites more than monit does.

So, I’ve been looking at other solutions.

One such solution is sysfence.

While sysfence is severely underdocumented (the examples provided don’t even work!) and appears to be abandoned, it does do the job.
Sysfence is a no bells and no whistles precursor to monit, but it has that killer feature that monit is missing.

So, how do we use sysfence?

apt-get install sysfence

Will install it, but unfortunately no config is installed.

So, start off by creating a /etc/sysfence folder

mkdir /etc/sysfence
cd /etc/sysfence

We’ll need to create a config file for it, so

pico sysfence.conf

My sample sysfence script is below (explanation underneath script)


rule "ApacheStop" {
la1 >= 10.00 or la5 >= 6.0
}
run '/etc/init.d/apache2 stop;'

rule "ApacheStart" {
la1 <= 2
}
run once '/etc/init.d/apache2 start;'

rule "warning" {
la1 >= 8.00
}
run once 'echo "Load High: BACKUP" | mail lawrence@computersolutions.cn'

I'm having issues with apache causing load to rocket, so I've set up some rules as follows:

If the 1-minute load average is >= 10 (i.e. the server is going bonkers), or the 5-minute load average is >= 6, then stop apache.
If the 1-minute load average is >= 8, send me an email.
If the 1-minute load average is <= 2, then start apache. Because of the once keyword, this only runs one time each time load drops to 2 or below.

The documentation at http://sysfence.sourceforge.net/ goes over how to write a rule. Note that the examples there are broken, e.g.:

if {
la1 >= 8.00
} run once 'echo "SHOW FULL PROCESSLIST" | mysql | mail my@email.com'

Issue? All rules need to have a "rule name" specified.

So a corrected working version would be:

if "some rule" {
la1 >= 8.00
} run once 'echo "SHOW FULL PROCESSLIST" | mysql | mail my@email.com'
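
Since an unnamed rule is easy to miss in a longer config, a crude check can flag them before sysfence is started. This is only a sketch: check_rule_names is a made-up helper, not a sysfence command, and it assumes every rule starts at the beginning of a line with if or rule (the keyword synonyms mentioned in the man page are not covered).

```shell
#!/bin/sh
# check_rule_names FILE
# Prints "line:text" for every rule opener that has no quoted name
# between the keyword and the opening brace, e.g. "if {" or "rule {".
# Returns 1 when at least one unnamed rule is found, 0 otherwise.
check_rule_names() {
    if grep -nE '^[[:space:]]*(if|rule)[[:space:]]*\{' "$1"; then
        echo "unnamed rule(s) found in $1" >&2
        return 1
    fi
    return 0
}
```

Running check_rule_names /etc/sysfence/sysfence.conf prints nothing and returns 0 when every if or rule line carries a name.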

Back to our setup..
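
Before letting rules like these loose on a live apache, it can be handy to see which of them would match for a given load. The sketch below is not part of sysfence: it just re-implements the three thresholds from the sample config in plain shell, with awk doing the floating point comparisons. The function names (ge, le, matching_rules) are made up for this example.

```shell
#!/bin/sh
# Dry-run helper: report which of the sample rules would match for a
# given pair of load averages. It never runs any of the actions.

# ge A B -> succeeds when A >= B (numeric comparison via awk)
ge() { awk -v a="$1" -v b="$2" 'BEGIN { exit !(a + 0 >= b + 0) }'; }

# le A B -> succeeds when A <= B
le() { awk -v a="$1" -v b="$2" 'BEGIN { exit !(a + 0 <= b + 0) }'; }

# matching_rules LA1 LA5 -> prints the name of every rule whose condition holds
matching_rules() {
    la1=$1
    la5=$2
    if ge "$la1" 10.00 || ge "$la5" 6.0; then echo "ApacheStop"; fi
    if le "$la1" 2; then echo "ApacheStart"; fi
    if ge "$la1" 8.00; then echo "warning"; fi
}

# To check the current load on Linux (reads /proc/loadavg):
#   read la1 la5 rest < /proc/loadavg && matching_rules "$la1" "$la5"
```

For example, matching_rules 12.0 3.0 prints ApacheStop and warning. It also shows that a 1-minute load of 1.5 with a 5-minute load of 6.5 matches both ApacheStop and ApacheStart, so the two thresholds can overlap while load is falling.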

Now that we've set up a ruleset, we need to run it. Calling

sysfence /etc/sysfence/sysfence.conf

will run it as a daemon.

ps -ef shows our rulesets running:

root 7260 1 0 05:51 ? 00:00:01 sffetch
root 7261 7260 0 05:51 ? 00:00:00 sfwatch 'warning'
root 7262 7260 0 05:51 ? 00:00:00 sfwatch 'ApacheStop'
root 7263 7260 0 05:51 ? 00:00:00 sfwatch 'ApacheStart'

sffetch is the daemon, and sfwatch are the rules it runs.

As sysfence is quite rudimentary, you'll need to kill it if you change rules.

You'll also need to add it to your startup scripts or create one. I'll be lazy and not go over that right now. If people are interested, add a comment, and I'll put something up.
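
In the meantime, here is a rough sketch of what a start/stop/restart wrapper could look like. Treat it as a starting point under stated assumptions rather than a tested init script: the function names and the DRY_RUN switch are made up, the config path is the one created earlier in this post, and stopping is done by matching the sffetch and sfwatch process titles shown in the ps -ef output above. Whether killing sffetch alone also takes the sfwatch processes with it is not something the docs say, so the sketch signals both.

```shell
#!/bin/sh
# Hypothetical wrapper for sysfence; could be saved as e.g. /etc/init.d/sysfence.
SYSFENCE_BIN=${SYSFENCE_BIN:-sysfence}
SYSFENCE_CONF=${SYSFENCE_CONF:-/etc/sysfence/sysfence.conf}
# Set DRY_RUN=1 to print the commands instead of running them.
DRY_RUN=${DRY_RUN:-0}

# run CMD [ARGS...] -> execute the command, or just print it in dry-run mode
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# sysfence daemonizes itself, so starting it is a single call
sysfence_start() { run "$SYSFENCE_BIN" "$SYSFENCE_CONF"; }

sysfence_stop() {
    # pkill -f matches against the full command line.
    # pkill exits non-zero when nothing matched; that is not an error here.
    run pkill -f '^sffetch' || true
    run pkill -f '^sfwatch' || true
}

# Needed after every rule change, since the config is only read at startup
sysfence_restart() { sysfence_stop; sysfence_start; }

# Dispatch only when an action is given, so the file can also be sourced.
case "${1:-}" in
    start)   sysfence_start ;;
    stop)    sysfence_stop ;;
    restart) sysfence_restart ;;
    "")      : ;;
    *)       echo "Usage: $0 {start|stop|restart}" >&2; exit 1 ;;
esac
```

Running it with DRY_RUN=1 and the restart argument prints the two pkill commands followed by the sysfence call, which is a cheap way to check the paths before trusting it at boot.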

Sysfence can be downloaded here - http://sysfence.sourceforge.net/ (or via apt-get if on a Debian based OS)


Man page for sysfence below (note that the examples need a quoted rule name added between if or rule and the opening {):

NAME

sysfence - system resources guard for Linux

SYNOPSIS

sysfence
<configuration file> [<configuration file> ...]

DESCRIPTION

Sysfence is a resource monitoring tool designed for Linux machines.
While running as daemon it checks resource levels and makes desired
action if some values exceed safety limits.

Sysfence can be used for notifying system administrators when something
goes wrong, stopping services when system performance is dropping too
low and starting them when it's going up again, periodically restarting
memory-leaking processes, dumping system statistics in critical situations.

Sysfence can monitor following resource levels: load average, used and
free memory amount, used and free swap space.

USAGE

Sysfence reads it's configuration from file(s) specified in argument
list. Config files may contain one or more rules describing conditions
and actions to be performed.

Rule has syntax like this:

if {
resource1 > limit1
or
{ resource2 < limit2 and resource3 < limit3 }
}
run once 'command-to-be-run'

The block enclosed within {} brackets describes condition. When it's
result is TRUE, following command is invoked.

The once keyword is optional. If present, the command is executed only
once after condition becomes TRUE. Next execution will take place only
if condition becomes FALSE and then TRUE again. Without once keyword,
command is invoked periodically, after every resource check that gives
TRUE, no matter what was the condition result before.

Command specified right after run keyword is passed to /bin/sh, so it
may contain more than one instruction or even whole script. But be
careful - rule checking is suspended unless command execution has been
completed! (Other rules are unaffected.)

As resources, following ones can be given:

la1
- load average during last minute.
la5
- load average during last 5 minutes.
la15
- load average during last 15 minutes.
memfree
- lower limit for free memory amount.
memused
- upper limit for memory used by processes.
swapfree
- lower limit for free swap space.
swapused
- upper limit for swap space in use.

EXAMPLES

Do you have problems with MySQL server choking and freezing whole
system? I do. To find queries that cause problems, you may use:

if {
la1 >= 8.00
} run once 'echo "SHOW FULL PROCESSLIST" | mysql | mail my@email.com'

Of course, that wouldn't prevent your system from being blocked, but
following rule could. MySQL will be restarted if LA for last minute
is over 10.0 or LA for last five minutes is over 6.0.

if { la1 >= 10.00 or la5 >= 6.0 }
run '/etc/rc.d/init.d/mysql stop; sleep 120; /etc/rc.d/init.d/mysql start'

We may also restart some services that probably have memory leaks and
use lots of swap space if not restarted periodically. Let's assume
that 256MB of used swap is enough to give our Zope server a break.

if {
swapused >= 256M
} run '/etc/rc.d/init.d/zope restart'

We may also alert admins... Notice that you don't have to be r00t:

if {
la15 > 4.0
and
{
swapfree < 64M
or
memfree < 128M
}
} run 'echo "i wish you were here..." | sendsms +48ADMINCELLPHONE'

Using sysfence version 0.7 or later you may give rule a name that will
be used in logs:

rule "high load" { la1 > 3.0 and la15 > 2.0 } log

rule keyword has the same meaning as if. There are also synonymes for
other keywords. Detailed list is included within sysfence package.

You can find an example config file in /usr/share/doc/sysfence/example.conf.

AUTHOR

Sysfence was written by Michal Saban (emes at pld-linux org) and
Mirek Kopertowski (m.kopertowski at post pl)

This manual page was created by Lukasz Jachowicz <honey@debian.org>,
for the Debian project (but may be used by others). It is based on
the http://sysfence.sf.net/ page.
