mapSoN is a spam filter that takes an unusual approach to keeping unsolicited commercial e-mail out of your mailbox. Rather than using a set of configured "bad words", a list of "known spammers", or complicated scoring mechanisms to determine what is spam and what is not, it relies on "known senders" -- or rather "unknown senders".
Every time you receive an e-mail, mapSoN looks up the sender's e-mail address in a small database file. If the address is in there, the mail is delivered to your mailbox. If it is not, the e-mail is stored in a spool directory in your home directory, using a cryptographic cookie as the filename. mapSoN then sends a so-called request for confirmation to the sender's address, asking the sender to confirm the address's validity by replying and sending the cryptographic cookie back. When mapSoN receives a mail with such a cookie in it, it moves the corresponding mail from the spool directory to your mailbox and adds the sender's address to the database.
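The flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not mapSoN's actual implementation (which is written in C++): the class name `ConfirmationFilter`, the in-memory set and dictionary standing in for the address database and the spool directory, and the use of `secrets.token_hex` to generate the cookie are all invented for this example.

```python
import secrets


class ConfirmationFilter:
    """Toy model of the confirm-before-deliver flow (hypothetical, not mapSoN's code)."""

    def __init__(self, known_senders):
        # Stands in for the address database file.
        self.known = {addr.lower() for addr in known_senders}
        # cookie -> (sender, message); stands in for the spool directory.
        self.spool = {}
        # Messages that reached the mailbox.
        self.mailbox = []
        # Requests for confirmation that would be mailed out: (recipient, cookie).
        self.outgoing = []

    def receive(self, sender, message):
        """Deliver mail from known senders; spool everything else and ask for confirmation."""
        sender = sender.lower()
        if sender in self.known:
            self.mailbox.append(message)
            return "passed"
        # Assumption: any unguessable token is good enough as a cookie here.
        cookie = secrets.token_hex(16)
        self.spool[cookie] = (sender, message)
        self.outgoing.append((sender, cookie))
        return "deferred"

    def confirm(self, cookie):
        """Handle a reply carrying a cookie: release the spooled mail, remember the sender."""
        entry = self.spool.pop(cookie, None)
        if entry is None:
            return False  # unknown cookie, or one that was already used
        sender, message = entry
        self.mailbox.append(message)
        self.known.add(sender)
        return True
```

In this sketch, a first mail from an unknown address is deferred; once the cookie comes back through `confirm`, that mail is delivered and later mail from the same address passes directly.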
This approach is based on the fact that spammers usually fake the sender address of their spam mail. (In fact, they have to, because sending unsolicited advertisements via e-mail is illegal in many countries.) But because their sender address is invalid, they will never see the request for confirmation, they will never reply, and their spam will sit in the spool directory until hell freezes over or an appropriate cron job deletes it. Using this heuristic, mapSoN catches well over 95% of all spam mail I receive.
To avoid annoying "real" people who are trying to contact you more than necessary, you can import the addresses from your mail archive into the mapSoN database. Furthermore, you can set mapSoN up so that any mail that is a reply to a mail or news posting of yours passes automatically: if you sent someone an e-mail and he replies, mapSoN won't bother him. It would be pretty impolite if it did.
This illustration will probably not help at all and only add to the confusion, but I made it, and now I have to include it here -- helpful or not!
You may wonder how effective mapSoN is -- and rightly so. Of course, being the principal author of this tool, I am biased. But I tried my best to conduct an objective study, determining how many spam mails were caught, how many mapSoN did not catch, and how many mails were delayed that weren't spam. The numbers presented below are somewhat skewed, because the logfile I analyzed includes the various test mails I piped into mapSoN in order to test and debug it. If anything, though, they make the result look worse, because I did not test the case in which a mail comes in and is approved; I tested the case in which a mail is deferred.
Anyway, I activated mapSoN for my private mail account on January 11th, 2002. This analysis was made on April 10th the same year, so we have a test period of 90 days -- almost three months. Here are the numbers:
Address Database entries, imported: | 4407 |
Address Database entries, today: | 4606 |
Total number of mails received: | 7632 |
Mails coming from mailing lists: | 2104 |
Number of processed mails: | 1561 |
Number of passed mails: | 395 |
Number of mails deferred: | 998 |
Number of mails confirmed: | 67 |
Number of unacknowledged "real" mails: | 14 |
Number of definite junk mails: | 862 |
How did I get those numbers? First of all, I counted all logfile entries coming from sendmail that contained the strings "mailer=local" and "to=<simons". I took this to be the total number of mails received. Of those entries, I counted how many contained the string "to=<simons+" -- which should largely be the mails I receive via mailing lists, because I use procmail's argument feature when I subscribe to mailing lists.
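The counting step can be sketched as a plain substring match in Python. The function name `count_received` and the sample log lines are made up for illustration; real sendmail log entries carry more fields, and a substring test like this one will also match any other local user whose name merely starts with "simons".

```python
def count_received(log_lines, user="simons"):
    """Return (total, via_plus_address) for local deliveries to `user`."""
    total = 0
    plus = 0
    for line in log_lines:
        # Same two substrings as described in the text.
        if "mailer=local" in line and f"to=<{user}" in line:
            total += 1
            # "user+something" addresses are the mailing-list subscriptions.
            if f"to=<{user}+" in line:
                plus += 1
    return total, plus


# Hypothetical log lines, only loosely shaped like sendmail entries.
sample = [
    "sendmail[101]: AAA: to=<simons@example.org>, mailer=local, stat=Sent",
    "sendmail[102]: AAB: to=<simons+lists@example.org>, mailer=local, stat=Sent",
    "sendmail[103]: AAC: to=<other@example.org>, mailer=esmtp, stat=Sent",
]
print(count_received(sample))  # prints (2, 1)
```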
All those mails addressed to "simons+something" bypassed mapSoN automatically. Furthermore, all mails that contained certain headers like In-Reply-To or References bypassed mapSoN entirely, because I assume those to be replies to messages of mine. This explains why the number of mails that mapSoN actually processed is much smaller than the total number of mails received: apparently only 20% of all incoming e-mail was processed by mapSoN at all!
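The two bypass rules can be expressed as a small predicate. The following Python sketch is an assumption-laden illustration, not the procmail configuration actually used: the function name `bypasses_filter` is invented, and it inspects the To header, whereas a real setup would more likely route on the envelope recipient.

```python
from email import message_from_string
from email.utils import parseaddr


def bypasses_filter(raw_message, user="simons"):
    """Return True if the mail would skip the sender check under the two rules above."""
    msg = message_from_string(raw_message)

    # Rule 1: the mail is addressed to a "user+something" address.
    _, addr = parseaddr(msg.get("To", ""))
    local_part = addr.split("@", 1)[0]
    if local_part.startswith(user + "+"):
        return True

    # Rule 2: the mail carries a header that marks it as a reply.
    return any(header in msg for header in ("In-Reply-To", "References"))
```

Note that rule 2 trusts headers the sender controls, so anyone who adds a References header gets through; the sketch only mirrors the assumption stated in the text.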
Then I counted the entries in mapSoN's logfile that said "passed", which turned out to be 25% of all mails mapSoN saw. The mails that mapSoN did not let through amount to about 64% of all mails processed by mapSoN, and I determined those by counting the logfile entries that said "Spooling e-mail".
Of those deferred e-mails, fewer than 7% were confirmed later! So then I waded through the spool directory, moved all "regular" e-mails to a separate folder, and counted them: it turned out that 14 regular mails were not confirmed by the sender. That's 1.4% of all deferred mails and 0.18% of the total number of mails received.
When I looked at those mails in detail, it turned out that 3 of the 14 mails lying in the spool "unwarrantedly" actually had been acknowledged, but that the other person was too dumb to reply correctly: Two of the three sent a new e-mail, which did of course not contain the cookie, and the third person replied to the request for confirmation, but erased the cookie from the mail manually. It's no wonder that the mails they sent me turned out to be of the kind that I don't want anyway.
The remaining 11 mails that were not delivered to me but were not spam either were replies of some kind from Internet sites like amazon.com, other customer service stuff, and automated reminders. No personal e-mail. They were delayed because they were sent with incorrect sender addresses that bounced when mapSoN sent the request for confirmation back, or that were apparently not read by the other end at all.
Once I noticed the problem, I got it fixed quickly by using "user+something" addresses at sites like amazon.com, too, so that their mails bypass mapSoN to begin with.
Of course there was a certain amount of spam that got past mapSoN one way or the other. Some used addresses that actually were in my database because I had imported them from my mail archives when I set mapSoN up. Some others were routed past mapSoN by my procmail configuration because they looked like bounces or postings coming from mailing lists. Unfortunately, I cannot determine any exact number without wading through my mail archive manually, and I honestly don't want to do that. After all, the whole point of writing mapSoN was that I do not have to see spam!
To summarize: in the 90 days of testing, mapSoN caught 862 definite spam mails. That's about 10 per day. It reduced the amount of spam in my mailbox to one or two mails a week. At the same time, only about 0.9 percent of all e-mail I received (67 of 7632 mails) had to be acknowledged, which I think is an acceptable level of inconvenience for my communication partners, especially given the fact that they were obviously contacting me, not the other way round.
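For readers who want to check the arithmetic, the ratios can be recomputed from the table of counts above. The short Python sketch below uses only those counts; the dictionary keys and the helper name `pct` are made up for this example.

```python
# Counts copied from the table of results.
stats = {
    "received": 7632,
    "processed": 1561,
    "passed": 395,
    "deferred": 998,
    "confirmed": 67,
    "unconfirmed_real": 14,
    "junk": 862,
}
DAYS = 90  # length of the test period


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


ratios = {
    "processed, of all received": pct(stats["processed"], stats["received"]),
    "passed, of processed": pct(stats["passed"], stats["processed"]),
    "deferred, of processed": pct(stats["deferred"], stats["processed"]),
    "confirmed, of deferred": pct(stats["confirmed"], stats["deferred"]),
    "unconfirmed real, of deferred": pct(stats["unconfirmed_real"], stats["deferred"]),
    "unconfirmed real, of all received": pct(stats["unconfirmed_real"], stats["received"]),
    "confirmed, of all received": pct(stats["confirmed"], stats["received"]),
}

for name, value in ratios.items():
    print(f"{name}: {value}%")

junk_per_day = round(stats["junk"] / DAYS, 2)
print(f"junk mails per day: {junk_per_day}")
```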
The most current version of the software can be downloaded from its project page at SourceForge.net -- whom I wish to thank at this point for kindly hosting this project and for providing a generally excellent service to the software-development community.
Apart from downloading the software, you can of course read the user manual on-line, either in HTML format or in PDF format.
mapSoN has been written for the UNIX operating system and will (hopefully) compile on any POSIX-compliant system. It has been tested on Linux, FreeBSD, Solaris, and MacOS X 10, where it built out of the box just fine. Since it is written in ISO C++, it requires a fairly recent compiler, though, because many older C++ compilers are not ISO C++ compliant. If you're using the excellent GNU Compiler Collection (GCC), you should not have any problem. If you do, please file a support request at mapSoN's project page at SourceForge.net.
mapSoN is copyrighted by Peter Simons <simons@computer.org>. Permission is granted to use it under the terms of the GNU General Public License.