Spam Graphing and Logging for SpamAssassin Rule Optimization
James Mikusi
During my tenure as a systems administrator, I've noticed that admins fall into
two distinct groups based on how they approach a problem. The first group works
aggressively toward a solution and closure, trying any change that might fix
the problem. The other group works more methodically, making calculated
adjustments and reversible changes. I've come to appreciate both groups, especially
the former when it's important to just "get the job done", but getting a grip
on spam requires the more methodical approach. Counting and graphing your
spam, for example, can show you just how big your problem is and how
best to attack it.
This article details how to gather statistics on mail that is filtered through
SpamAssassin and how to plot those numbers with MRTG. This project began when
I decided to learn exactly how much spam I received in a given period; it grew
when I found some oddities in the SpamAssassin rules that matched most frequently.
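As a rough illustration of the counting half of this project, the Python sketch below tallies spam and non-spam messages and prints them in the four-line format MRTG expects from an external script (first value, second value, uptime string, target name). It assumes SpamAssassin's default behaviour of adding an "X-Spam-Flag: YES" header to messages it classifies as spam; the function names count_spam and mrtg_lines are hypothetical and are not the scripts described in this article.

```python
from email.message import Message
from typing import Iterable, Tuple


def count_spam(messages: Iterable[Message]) -> Tuple[int, int]:
    """Return (spam, ham) counts for messages already filtered by SpamAssassin.

    Assumption: spam carries SpamAssassin's default "X-Spam-Flag: YES"
    header; every other message is counted as ham.
    """
    spam = ham = 0
    for msg in messages:
        flag = str(msg.get("X-Spam-Flag", "")).strip().upper()
        if flag == "YES":
            spam += 1
        else:
            ham += 1
    return spam, ham


def mrtg_lines(spam: int, ham: int, uptime: str = "", target: str = "spam") -> str:
    """Format two values as MRTG external-script output:
    line 1 = first value, line 2 = second value,
    line 3 = uptime string, line 4 = target name."""
    return f"{spam}\n{ham}\n{uptime}\n{target}\n"
```

Because mailbox.mbox yields Message subclasses, a call such as `print(mrtg_lines(*count_spam(mailbox.mbox("/path/to/mbox"))), end="")` (the path is hypothetical) could serve as the command behind an MRTG Target. Whether the target should be configured as a gauge or left as a counter depends on whether the script reports per-interval counts or running totals.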
I should add that when I began this project I had already invested considerable
time tuning SpamAssassin's Bayesian database. In my opinion, this remains one
of the strongest defenses against spam on a per-user basis, because what is
spam to you is not necessarily spam to your neighbor. Thus, teaching SpamAssassin
to recognize what's spam to you is important.
On that note, you also should be aware that the implementation described here is
designed for a single user. The scripts could easily be edited for use at the
domain level. However, the objective here is to tune SpamAssassin, and it is
difficult to make global assumptions about what hundreds of users might agree
is spam. The methods described here increase the effectiveness of Bayesian
filtering by finding out which rules are triggered most often.
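To make "which rules are triggered most often" concrete, here is a minimal Python sketch that counts rule hits from SpamAssassin's X-Spam-Status header. It assumes the default header layout, for example "Yes, score=12.3 required=5.0 tests=BAYES_99,HTML_MESSAGE autolearn=no" (older 2.x releases print "hits=" instead of "score="); a site that customizes its headers would need a different parser. The names rules_hit and top_rules are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List


def rules_hit(status: str) -> List[str]:
    """Extract rule names from one X-Spam-Status header value.

    Assumes the default layout, where the rules follow "tests=" as a
    comma-separated list that may be folded across several lines.
    """
    # Unfold continuation lines, then drop whitespace after commas so the
    # whole rule list becomes a single "tests=..." token.
    flat = re.sub(r",\s+", ",", " ".join(status.split()))
    for token in flat.split(" "):
        if token.startswith("tests="):
            rules = token[len("tests="):].split(",")
            # "tests=none" means no rule fired.
            return [r for r in rules if r and r != "none"]
    return []


def top_rules(statuses: Iterable[str], spam_only: bool = True) -> Counter:
    """Count how often each rule fired across many X-Spam-Status values."""
    counts: Counter = Counter()
    for status in statuses:
        if not status:
            continue  # message was never scanned
        if spam_only and not status.strip().lower().startswith("yes"):
            continue  # skip messages SpamAssassin did not flag as spam
        counts.update(rules_hit(status))
    return counts
```

Fed with something like `str(m.get("X-Spam-Status", ""))` for each message in a mailbox, `top_rules(...).most_common(10)` lists the ten most frequent rules. Restricting the count to flagged spam by default keeps rules that mostly fire on ham (such as low Bayes scores) from crowding the list.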