Occasionally, even on a well-maintained system, qmail has issues.
One semi-common issue I see is a server we send mail to that never times out. This ties up an outgoing mail slot. Over time, this can leave the whole outgoing or incoming queue sitting idle, because every slot is held by a ‘tarpitted’ connection.
Ideally qmail should be able to cope with these. There are settings in qmail that control how long it waits to establish a connection, and how long it waits on one that is already open. They live in the following control files (usually in /var/qmail/control):
timeoutconnect – how long qmail waits for an initial outgoing connection before trying another mail server.
timeoutremote – how long to wait before timing out a connected outgoing server.
timeoutsmtpd – how long qmail waits before dropping an incoming connection.
In our system, we set these values to:
30 seconds for timeoutconnect
600 seconds for timeoutremote
360 seconds for timeoutsmtpd
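As a rough sketch, the values above could be written to the control files like this. Each control file holds a single number of seconds. The function name write_timeouts is made up for illustration, and the default directory assumes the usual /var/qmail/control layout.

```shell
# write_timeouts [DIR]: write the three timeout values from this article
# into DIR, one number (seconds) per control file.
# Hypothetical helper; DIR defaults to the usual /var/qmail/control.
write_timeouts() {
    dir=${1:-/var/qmail/control}
    echo 30  > "$dir/timeoutconnect" || return 1
    echo 600 > "$dir/timeoutremote"  || return 1
    echo 360 > "$dir/timeoutsmtpd"   || return 1
}
```

Run it as root with write_timeouts /var/qmail/control. As far as I know, qmail-remote and qmail-smtpd read their control files each time they start, so new connections should pick up the new values.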
In theory timeoutremote should see qmail drop a connection after 10 minutes (600 seconds).
In practice, qmail doesn’t.
Why?
timeoutremote only applies if the connection hasn’t received any data for the timeout period.
It doesn’t apply to the connection time as a whole.
If the remote end sends some data, the timeout is reset again, and it will wait again for the timeoutremote period. If the remote server dribbles back an ACK or similar once every few minutes, then it can keep a connection alive for as long as it wants.
This may not happen very often, but it can happen enough to tie up our connection queue over a period of time. I’ve seen connections go on for as long as days or weeks in practice.
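To make the arithmetic concrete, here is a small sketch (not qmail code) that models an idle timeout. The function names and the 590-second gap are made up for illustration: a server that answers every 590 seconds never trips a 600-second idle timeout, however long the connection lasts in total.

```shell
# survives_idle_timeout TIMEOUT GAP...
# Succeeds if every gap between responses (in seconds) is shorter than
# TIMEOUT, i.e. an idle timer that resets on each response never fires.
survives_idle_timeout() {
    timeout=$1
    shift
    for gap in "$@"; do
        [ "$gap" -lt "$timeout" ] || return 1
    done
    return 0
}

# total_seconds GAP...: total connection time across all gaps.
total_seconds() {
    total=0
    for gap in "$@"; do
        total=$((total + gap))
    done
    echo "$total"
}

# Twelve responses, each arriving 590 seconds after the previous one.
gaps="590 590 590 590 590 590 590 590 590 590 590 590"
if survives_idle_timeout 600 $gaps; then
    echo "still connected after $(total_seconds $gaps) seconds"
fi
```

That prints "still connected after 7080 seconds": nearly two hours on a connection with a 10-minute timeout.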
Ideally one should be able to set a hard timeout in qmail that it actually adheres to, so that any connection open longer than a certain period gets killed, or at least set something up in ucspi-tcp. That’s something for another time, however.
Here is a real-world example.
I’ve run my kill zombie script in test mode (see bottom of page for the script):
/var/qmail/bin/kill-qmail-smtpd-zombies --test
**Running in TEST mode**
Running: ps ax -o etime,pid,comm --no-heading | grep qmail-remote | grep ':[0-9][0-9]:' | awk '{print }'
-=-=-=-=-=-=-=-=-=-=-
Found zombies, setting up shotgun.
Killing qmail-remote zombies
kill -9 26707
-=-=-=-=-=-=-=-=-=-=-
It’s come up with a connection that’s been running longer than an hour: PID 26707.
I’ll double-check that it’s correct.
ps ax -o etime,pid,comm | grep 26707
01:39:07 26707 qmail-remote
Yup, qmail-remote has been running for 1 hour 39 minutes on that connection.
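ps prints elapsed time as [[DD-]HH:]MM:SS, which is awkward to compare. Here is a minimal sketch of a converter to seconds, in case you want to check against a threshold other than one hour. The name etime_to_seconds is hypothetical and not part of my script.

```shell
# etime_to_seconds ETIME: convert ps elapsed time ([[DD-]HH:]MM:SS)
# to seconds. Hypothetical helper, not part of the zombie killer script.
etime_to_seconds() {
    printf '%s\n' "$1" | awk '{
        days = 0; t = $1
        # Split off an optional "DD-" day prefix.
        if (split(t, d, "-") == 2) { days = d[1]; t = d[2] }
        n = split(t, f, ":")
        if (n == 3)      secs = f[1] * 3600 + f[2] * 60 + f[3]
        else if (n == 2) secs = f[1] * 60 + f[2]
        else             exit 1
        print days * 86400 + secs
    }'
}

etime_to_seconds 01:39:07
```

For the process above this prints 5947, comfortably past the 3600 seconds in an hour.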
Let’s check what the connection is.
ps -ef | grep 26707
root 2964 17112 0 13:01 pts/2 00:00:00 grep 26707
qmailr 26707 21959 0 11:23 ? 00:00:00 qmail-remote bamboo.sz.js.cn zhangbin@bamboo.sz.js.cn
Hmm, it’s a known troublesome server, bamboo.sz.js.cn.
In fact, it’s the one that caused me to write this article!
Let’s watch what’s actually happening in real time.
strace -p 26707
Process 26707 attached - interrupt to quit
read(3,
[wait for a minute or two…]
Still nothing.
Hmm, sitting there waiting for a response to a read. Guess what happens before the timeout period?
Yup, we receive some more characters just in time to keep the connection up and running…
We could set timeoutremote to a lower number, but we do actually have cases where servers are genuinely slow to respond for various spam-testing reasons (although they usually pick up speed once they pass those tests), so I prefer another method.
What’s my current solution for this (a lazy one, in lieu of patching qmail or ucspi-tcp)?
A zombie-culling script!
To install in your qmail/bin folder, do the following:
cd /var/qmail/bin
wget http://www.computersolutions.cn/blog/wp-content/uploads/2010/02/kill-qmail-zombies.txt
mv kill-qmail-zombies.txt kill-qmail-zombies.sh
chmod 0700 kill-qmail-zombies.sh
The script has built-in help. The parameters are:
./kill-qmail-zombies.sh
--test - Run in test mode (zombie friendly)
--help - Show the help
--force - Kill some zombies!
e.g.
./kill-qmail-zombies.sh --test
You could set this to run every few hours in a cron script, but I strongly suggest you test first to see if it works correctly. See the help file for more info on that.
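For the cron route, a crontab entry along these lines should do. This sketch only prints the line so you can review it before adding it with crontab -e. The function name zombie_cron_line, the three-hour interval and the default path are assumptions based on the install steps above.

```shell
# zombie_cron_line [HOURS] [SCRIPT]: print a crontab line that runs the
# zombie killer with --force every HOURS hours, on the hour.
# Hypothetical helper; the default path assumes the install steps above.
zombie_cron_line() {
    hours=${1:-3}
    script=${2:-/var/qmail/bin/kill-qmail-zombies.sh}
    echo "0 */$hours * * * $script --force"
}

zombie_cron_line 3
```

The entry belongs in root’s crontab, since the qmail processes run as other users. The script sets its own PATH, so it does not depend on cron’s environment.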
Script below for those who want to take a look. It’s one of my first shell scripts, so feel free to laugh, and comment accordingly!
#!/bin/bash
# Needs bash rather than plain sh: uses the 'function' keyword and arrays.
# ===========================
# qmail zombie killer script
# Version: 1.0
# Author: L. Sheed
# Company: Computer Solutions
# URL: http://www.computersolutions.cn
# ===========================
PATH=/usr/bin:/bin
function short_usage
{
cat <<- _EOF_
$0: missing parameter
Try '$0 --help' for more information.
_EOF_
}
function usage
{
cat <<- _EOF_
Parameters:
--force kill qmail-remote and qmail-smtpd processes (aka zombies) running for 1 hour or longer
--test do a test run (no zombie processes will be harmed)
--help show this help page
Notes:
Strongly suggest testing first to see if the ps line works correctly on your system before killing any processes!
e.g. run the ps below on your system, and see if the output looks similar
ps ax -o etime,pid,comm --no-heading | grep qmail-smtp
04:40 6468 qmail-smtpd
01:47 7473 qmail-smtpd
01:00 8142 qmail-smtpd
01:00 8143 qmail-smtpd
00:46 8235 qmail-smtpd
00:36 8283 qmail-smtpd
00:19 8391 qmail-smtpd
00:11 8445 qmail-smtpd
00:07 8494 qmail-smtpd
_EOF_
}
function zap_the_bastards
{
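# Collect PIDs of $WHAT processes whose elapsed time has an hours field (1 hour or more)
PLIST=$(ps ax -o etime,pid,comm --no-heading | grep "$WHAT" | grep ':[0-9][0-9]:' | awk '{print $2}')
#In test mode, show what would be called also
if [ "$test" = "1" ]; then
# \$2 is escaped so the shell prints it literally instead of expanding it to nothing
echo "Running: ps ax -o etime,pid,comm --no-heading | grep $WHAT | grep ':[0-9][0-9]:' | awk '{print \$2}'"
fi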
if [ -n "${PLIST:-}" ]
then
echo "-=-=-=-=-=-=-=-=-=-=-"
echo "Found zombies, setting up shotgun."
echo "Killing $WHAT zombies"
for p in $PLIST
do
if [ "$force" = "1" ]; then
echo "Kabooom:"
kill -9 $p
fi
echo "kill -9 $p"
done
echo "-=-=-=-=-=-=-=-=-=-=-"
else
echo "Good news everybody. No $WHAT zombies found."
fi
}
## Main
#parse our parameters
if [ $# -ne 1 ]; then
short_usage
exit 1
fi
while [ "$1" != "" ]; do
case $1 in
--force )
echo "**Running in FORCE mode**"
force=1
;;
--help )
usage
exit
;;
--test )
echo "**Running in TEST mode**"
test=1
;;
esac
shift
done
#do the deed
targets=( "qmail-remote" "qmail-smtpd" )
for target in "${targets[@]}"
do
WHAT=$target
zap_the_bastards
done
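The script goes straight to kill -9. A gentler variant, sketched below, sends SIGTERM first and only falls back to SIGKILL if the process is still around after a grace period. The function name term_then_kill and the 5-second default are assumptions, and this is untested against qmail-remote mid-delivery.

```shell
# term_then_kill PID [GRACE]: ask PID to exit with SIGTERM, wait up to
# GRACE seconds (default 5), then send SIGKILL if it is still there.
# Hypothetical helper, not part of the script above.
term_then_kill() {
    pid=$1
    grace=${2:-5}
    # If the signal cannot be sent, the process is already gone
    # (or we lack permission); nothing more to do.
    kill -TERM "$pid" 2>/dev/null || return 0
    while [ "$grace" -gt 0 ]; do
        # kill -0 sends no signal; it only tests whether PID exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        grace=$((grace - 1))
    done
    kill -KILL "$pid" 2>/dev/null
    return 0
}
```

To try it, the kill -9 $p line in zap_the_bastards would become term_then_kill "$p".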