A few weeks back, I was dealing with a fairly modest quantity of spam. A daily barrage of 15 to 20 unwanted messages would grace my inbox. I managed it manually for a few months but grew tired of reading spam and wasting my time. I decided I needed a filter but I was not willing to lose any legitimate e-mails.
After reading about Bayesian filtering, I decided that it was time to try a spam filter. My criteria for filtering software were as follows:
- Bayesian: I wanted something that used Bayesian filtering rather than complicated detection algorithms and lists of spam addresses. I already deal with updating virus definitions regularly. No more continuous subscription software, please!
- Inexpensive: Money was indeed a factor. If I have to spend lots of money to deal with spam, then the spammers have already won!
- Transparent: I wanted to use my current e-mail program (Eudora) with minimal changes to its configuration. I did not want separate programs to check and read my e-mail. It also must not make any noticeable impact on my system resources.
I spent some time searching for a filter that met my criteria. Eudora offers junk filtering, but it would require upgrading to the full version for a significant fee: $50.00. Plus, it did not specify what kind of filtering it was using. I checked out a few other options (commercial, shareware and freeware) and finally found a candidate which fit the bill: POPFile.
POPFile is a simple proxy which sits between your mail client and your mail server. The interface is completely browser-driven, which reduces the program size. Configuring Eudora to use it is amazingly simple (minor changes to the POP3 configuration plus creating a filter definition that relocates spam to the Junk box). It does not bother POPFile that I use two separate Eudora databases which check 5 different e-mail addresses on 3 different mail servers. I only use two mail sorting categories or "buckets" ('spam' and 'not_spam') but POPFile places no limit on the number of buckets which you can create. Best of all, it is free!
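To make the "filter definition" part concrete, here is a rough Python sketch of the rule my mail client applies. It assumes the proxy tags each message with an X-Text-Classification header naming the bucket. That header name, the choose_mailbox helper and the "In" mailbox name are my own illustration, so check what your own setup actually adds before relying on it.

```python
from email import message_from_string

# Assumption: the proxy stamps each message with this header, holding the
# bucket name. Verify the header your own installation adds.
CLASSIFICATION_HEADER = "X-Text-Classification"


def choose_mailbox(raw_message, junk_bucket="spam"):
    """Hypothetical helper: pick the mailbox a message should be filed in."""
    msg = message_from_string(raw_message)
    bucket = (msg.get(CLASSIFICATION_HEADER) or "").strip().lower()
    # Anything not explicitly tagged as junk stays in the normal inbox.
    return "Junk" if bucket == junk_bucket else "In"


tagged = (
    "From: someone@example.com\n"
    "Subject: Cheap pills\n"
    "X-Text-Classification: spam\n"
    "\n"
    "Buy now!\n"
)
untagged = "From: friend@example.com\nSubject: Lunch?\n\nSee you Friday.\n"

print(choose_mailbox(tagged))    # Junk
print(choose_mailbox(untagged))  # In
```

The mail client only has to look at one header, which is why the change on the Eudora side stays so small.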
If you go this route, there is something that you need to know about this type of filter. When you first install it and set it up, it starts out dumber than a toad. You create buckets, but they are just containers with a name. You don't define what they hold or criteria to filter by.
When my first e-mail came in, it was marked as Unclassified. It happened to be spam, so I went into POPFile and reclassified the message as spam. The next message was a legitimate e-mail but POPFile classified it as spam, because it only knew about spam. So I went into POPFile once again and reclassified it as not_spam.
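If you are curious why the filter behaves this way at the start, here is a toy word-counting Bayesian classifier in Python. It is only a sketch of the general technique (naive Bayes with add-one smoothing), not POPFile's actual code; the TinyBayes class and the sample messages are made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


class TinyBayes:
    """Toy naive Bayes text classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # bucket -> word -> count
        self.message_counts = Counter()          # bucket -> messages trained

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, bucket, text):
        """Record a message the user has (re)classified into a bucket."""
        self.message_counts[bucket] += 1
        self.word_counts[bucket].update(self.tokenize(text))

    def classify(self, text):
        if not self.message_counts:
            return "unclassified"  # nothing learned yet
        words = self.tokenize(text)
        vocab = {w for counts in self.word_counts.values() for w in counts}
        total_messages = sum(self.message_counts.values())
        best_bucket, best_score = None, -math.inf
        for bucket, n_messages in self.message_counts.items():
            counts = self.word_counts[bucket]
            total_words = sum(counts.values())
            # Log prior plus the log likelihood of each word in the message.
            score = math.log(n_messages / total_messages)
            for w in words:
                score += math.log((counts[w] + 1) / (total_words + len(vocab) + 1))
            if score > best_score:
                best_bucket, best_score = bucket, score
        return best_bucket


f = TinyBayes()
first = f.classify("Win a free prize now")
f.train("spam", "Win a free prize now")
second = f.classify("Lunch on Friday?")
f.train("not_spam", "Lunch on Friday?")
third = f.classify("Claim your free prize")
fourth = f.classify("Friday lunch plans")

print(first)   # unclassified
print(second)  # spam
print(third)   # spam
print(fourth)  # not_spam
```

The second message comes back as spam for the same reason mine did: at that point spam is the only bucket the filter has ever seen.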
Two e-mails processed and a whopping 0.0% accuracy. This is completely expected with Bayesian filtering because it learns from the e-mails it processes. By the end of the day, the success rate was over 80%. After one week, it has processed 383 e-mails (140 of which were spam) and I have had to reclassify a mere 19 messages. That includes the first two I received. My Classification Accuracy is 95.03%. I expect that to reach 97% after 1,000 messages.
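For anyone who wants to check the arithmetic, here is a quick Python sketch using only the numbers above. It assumes accuracy is simply the share of messages I did not have to reclassify, and it works out how well the filter has to do from here on to hit 97% at 1,000 messages.

```python
processed = 383      # messages handled in the first week
reclassified = 19    # messages I had to correct by hand

accuracy = (processed - reclassified) / processed
print(f"Accuracy so far: {accuracy:.1%}")  # Accuracy so far: 95.0%

target_percent = 97
horizon = 1000

# 97% of 1000 messages leaves room for at most 30 mistakes in total.
allowed_errors = horizon * (100 - target_percent) // 100
remaining = horizon - processed
remaining_errors = allowed_errors - reclassified
needed = (remaining - remaining_errors) / remaining

print(f"Errors allowed in the next {remaining} messages: {remaining_errors}")
# Errors allowed in the next 617 messages: 11
print(f"Accuracy needed on those messages: {needed:.1%}")
# Accuracy needed on those messages: 98.2%
```

In other words, the filter has to get roughly 98 out of every 100 remaining messages right, which seems plausible given that most of the 19 corrections came early on.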
If you aren't filtering, why not? The research has already been done for you. All you need to do is download and configure POPFile. (If you do decide to go this route, please let me know via comments how it goes for you!)



