One or two of our servers have been a little bit overloaded recently.
They’re going to be replaced with beefier machines, but due to a number of issues I haven’t been able to replace them yet.
Issue #1 – Pre Expo, we weren’t allowed to replace anything.
Issue #2 – Post Expo, I’m no longer allowed in the data center!
We’re working on sorting issue #2 out, but in the interim I need to keep the older machines running.
I was previously using Monit to monitor system load.
Monit would be a good solution – it has a web UI, it can stop services if system load goes too high, and it generally works when everything else has failed. This is great when things go poopy, but it has one fatal issue.
It doesn’t know how to restart stuff if load is back to normal.
This typically means that something will push the server load into unusable territory for a sustained period of time (due to lots of visitors), monit will go "ooh, apache has gone awol", and stop it.
Unfortunately, once load is back to normal, monit doesn’t have a way to start it up again, so I need to go to the monit page manually and start the service. I do get emailed about things like this, but it leads to complaints from the 2 clients that appear to monitor their particular websites more than monit does.
So, I’ve been looking at other solutions.
One such solution is sysfence.
While sysfence is severely underdocumented (the examples provided don’t even work!) and appears to be abandoned, it does do the job.
Sysfence is a no-bells-and-no-whistles precursor to monit, but it has that killer feature that monit is missing: it can start a service again once load has dropped.
So, how do we use sysfence?
apt-get install sysfence
Will install it, but unfortunately no config is installed.
So, start off by creating a /etc/sysfence folder
mkdir /etc/sysfence
cd /etc/sysfence
We’ll need to create a config file for it, so
pico sysfence.conf
My sample sysfence script is below (explanation underneath script)
rule "ApacheStop" {
la1 >= 10.00 or la5 >= 6.0
}
run '/etc/init.d/apache2 stop;'
rule "ApacheStart" {
la1 <=2
}
run once '/etc/init.d/apache2 start;'
rule "warning" { la1 >= 8.00 } run once 'echo "Load High: BACKUP" | mail lawrence@computersolutions.cn'
I'm having issues with apache causing load to rocket, so I've set up some rules as follows:
If the load average for the last minute is >= 10 (i.e. the server is going bonkers), or the load average for the last 5 minutes is >= 6, then stop apache.
If the load average for the last minute is >= 8, send me an email.
If the load average for the last minute is <= 2, then start apache. Because of the once keyword, this runs a single time when load drops to 2 or below, and not again until load has gone back above 2 and then dropped again.
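To sanity check the thresholds before handing them to sysfence, the three conditions can be mirrored in a few lines of shell. This is only a rough sketch: matching_rules is a made-up helper name, it is not part of sysfence, and it only tells you which conditions would be true for a given pair of load averages, not what sysfence will actually do.

```shell
#!/bin/sh
# Hypothetical helper (not part of sysfence): print the names of the rules
# from the config above whose conditions match the given load averages.
#   $1 = 1-minute load average (la1)
#   $2 = 5-minute load average (la5)
matching_rules() {
    awk -v la1="$1" -v la5="$2" 'BEGIN {
        out = ""
        if (la1 >= 10.0 || la5 >= 6.0) out = out " ApacheStop"
        if (la1 <= 2.0)                out = out " ApacheStart"
        if (la1 >= 8.0)                out = out " warning"
        sub(/^ /, "", out)   # drop the leading separator
        print out
    }'
}

# On a live Linux box the first two fields of /proc/loadavg are la1 and la5:
#   read la1 la5 rest < /proc/loadavg && matching_rules "$la1" "$la5"
```

Trying a few values by hand is worthwhile. For example, matching_rules 1.5 7.0 reports both ApacheStop and ApacheStart, which can happen right after a spike when the 1-minute average has already dropped but the 5-minute average is still high, so the thresholds may need tuning for your own traffic.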
The documentation at http://sysfence.sourceforge.net/ goes over how to write a rule. Note that the examples are broken, e.g.:
if {
la1 >= 8.00
} run once 'echo "SHOW FULL PROCESSLIST" | mysql | mail my@email.com'
Issue? All rules need to have a "rule name" specified.
So a corrected working version would be:
if "some rule" {
la1 >= 8.00
} run once 'echo "SHOW FULL PROCESSLIST" | mysql | mail my@email.com'
Back to our setup.
Now that we've set up a ruleset, we need to run it. Calling
sysfence /etc/sysfence/sysfence.conf
will run it as a daemon.
ps -ef
shows our rulesets running:
root 7260 1 0 05:51 ? 00:00:01 sffetch
root 7261 7260 0 05:51 ? 00:00:00 sfwatch 'warning'
root 7262 7260 0 05:51 ? 00:00:00 sfwatch 'ApacheStop'
root 7263 7260 0 05:51 ? 00:00:00 sfwatch 'ApacheStart'
sffetch is the daemon, and sfwatch are the rules it runs.
As sysfence is quite rudimentary, you'll need to kill it if you change rules.
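Since sysfence has to be killed and started again after every rule change, the two steps can be wrapped in a small helper. This is a sketch under assumptions rather than anything sysfence ships: reload_sysfence and the RUN variable are made-up names, the default path is the config file created earlier in this post, and matching on the sffetch/sfwatch process titles is a guess based on the ps output above, so check with ps that the pattern really matches on your machine.

```shell
#!/bin/sh
# Hypothetical helper: stop the running sysfence processes and start
# sysfence again so that edited rules take effect.
#   $1 = config file (defaults to the one created earlier in this post)
reload_sysfence() {
    conf="${1:-/etc/sysfence/sysfence.conf}"
    # Assumption: the processes show up as "sffetch" and "sfwatch '<rule>'",
    # as in the ps -ef listing; -f matches against the full command line.
    ${RUN:-} pkill -f '^sf(fetch|watch)'
    ${RUN:-} sysfence "$conf"
}
```

Call reload_sysfence as root after editing the config. Setting RUN=echo first prints the two commands instead of running them, which is a safe way to see what would happen.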
You'll also need to add it to your startup scripts or create one. I'll be lazy and not go over that right now. If people are interested, add a comment, and I'll put something up.
Sysfence can be downloaded here - http://sysfence.sourceforge.net/ (or via apt-get if on a Debian based OS)
Man page for sysfence below (note that the examples need a "rulename" added between if or rule and the opening {):
NAME
sysfence - system resources guard for Linux
SYNOPSIS
sysfence
<configuration file> [<configuration file> ...]
DESCRIPTION
Sysfence is a resource monitoring tool designed for Linux machines.
While running as daemon it checks resource levels and makes desired
action if some values exceed safety limits.
Sysfence can be used for notifying system administrators when something
goes wrong, stopping services when system performance is dropping too
low and starting them when it's going up again, periodically restarting
memory-leaking processes, dumping system statistics in critical situations.
Sysfence can monitor following resource levels: load average, used and
free memory amount, used and free swap space.
USAGE
Sysfence reads it's configuration from file(s) specified in argument
list. Config files may contain one or more rules describing conditions
and actions to be performed.
Rule has syntax like this:
if {
resource1 > limit1
or
{ resource2 < limit2 and resource3 < limit3 }
}
run once 'command-to-be-run'
The block enclosed within {} brackets describes condition. When it's
result is TRUE, following command is invoked.
The once keyword is optional. If present, the command is executed only
once after condition becomes TRUE. Next execution will take place only
if condition becomes FALSE and then TRUE again. Without once keyword,
command is invoked periodically, after every resource check that gives
TRUE, no matter what was the condition result before.
Command specified right after run keyword is passed to /bin/sh, so it
may contain more than one instruction or even whole script. But be
careful - rule checking is suspended unless command execution has been
completed! (Other rules are unaffected.)
As resources, following ones can be given:
la1 - load average during last minute.
la5 - load average during last 5 minutes.
la15 - load average during last 15 minutes.
memfree - lower limit for free memory amount.
memused - upper limit for memory used by processes.
swapfree - lower limit for free swap space.
swapused - upper limit for swap space in use.
EXAMPLES
Do you have problems with MySQL server choking and freezing whole
system? I do. To find queries that cause problems, you may use:
if {
la1 >= 8.00
} run once 'echo "SHOW FULL PROCESSLIST" | mysql | mail my@email.com'
Of course, that wouldn't prevent your system from being blocked, but
following rule could. MySQL will be restarted if LA for last minute
is over 10.0 or LA for last five minutes is over 6.0.
if { la1 >= 10.00 or la5 >= 6.0 }
run '/etc/rc.d/init.d/mysql stop; sleep 120; /etc/rc.d/init.d/mysql start'
We may also restart some services that probably have memory leaks and
use lots of swap space if not restarted periodically. Let's assume
that 256MB of used swap is enough to give our Zope server a break.
if {
swapused >= 256M
} run '/etc/rc.d/init.d/zope restart'
We may also alert admins... Notice that you don't have to be r00t:
if {
la15 > 4.0
and
{
swapfree < 64M
or
memfree < 128M
}
} run 'echo "i wish you were here..." | sendsms +48ADMINCELLPHONE'
Using sysfence version 0.7 or later you may give rule a name that will
be used in logs:
rule "high load" { la1 > 3.0 and la15 > 2.0 } log
rule keyword has the same meaning as if. There are also synonymes for
other keywords. Detailed list is included within sysfence package.
You can find an example config file in /usr/share/doc/sysfence/example.conf.
AUTHOR
Sysfence was written by Michal Saban (emes at pld-linux org) and
Mirek Kopertowski (m.kopertowski at post pl)
This manual page was created by Lukasz Jachowicz <honey@debian.org>,
for the Debian project (but may be used by others). It is based on
the http://sysfence.sf.net/ page.