Free Spam Filtering Tactics Using Eudora

By: Adam Lyon -- spamfilter at mydomainname (Disclaimer)

Overview
Spam Filter Details
Notes
Where Improvements Could Be Made
Links
Revision History

Overview

A lot of spam ends up in your inbox because of the nearly infinite permutations of words that spammers use to slip past filters. Thankfully, regexes (regular expressions) handle this problem easily and elegantly. Eudora's built-in filter functionality allows complex regexes to be applied to combat spam. This page will be updated often with new information as my filters continue to evolve. Note that these tactics for getting rid of spam are implemented using only the filter functionality available in the free version of Eudora, not using SpamAssassin, SpamNix, SpamCop, Spamihilator, SpamFilter, Antispam/MAX, SpamButcher or any other server-side or client-side filtering software you have to buy.

My approach to keeping my Eudora inbox spam-free (or at least spam-minimal) begins with a simple whitelist of valid senders. The next step is using the base filter set found at Cecil Williams' excellent Effective Spam Filtering with Eudora site. I've cleaned these up to my liking by adding, removing and combining rules as I saw fit. This is then augmented by tips and ideas I found at the following URL: http://scanmail-software.com/support/afo.html (local copy). Using this as a reference, I've tweaked and created additional filters that catch a LOT of spam.

My filtering process:

Filter senders against a whitelist of family, friends and valid mailing lists
Check for 10 common spam words and obfuscations thereof
Check for some other commonly misspelled words and word combinations only found in spam
Check for sneaky HTML spam tricks (float: right tricks, font-size: 1px, font-size: 0px, near-white text color, etc.)
A bunch of other filters catching various tell-tale signs of spam
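
To make the ordering above concrete, here is a minimal Python sketch of the same idea: the whitelist is checked first and short-circuits everything else, then each spam rule is tried in turn. This is only an illustration, not how Eudora works internally; the addresses, the rule names and the three patterns are placeholders I made up for the example, and `[\W_]*` is a Python stand-in for the separator class described further down.

```python
import re

# Hypothetical whitelist; in Eudora this is the nickname list.
WHITELIST = {"mom@example.com", "list@example.org"}

# Placeholder rules, in the order they are tried. The real regexes are
# much longer; these only illustrate the flow.
SPAM_RULES = [
    ("obfuscated word",
     re.compile(r"v[\W_]*[i1][\W_]*a[\W_]*g[\W_]*r[\W_]*a", re.IGNORECASE)),
    ("misspelling", re.compile(r"\bperscription\b", re.IGNORECASE)),
    ("hidden html",
     re.compile(r"font-size:\s*[01]px|float:\s*right", re.IGNORECASE)),
]


def classify(sender, body):
    """Return 'inbox' or the name of the first spam rule that matched."""
    if sender.lower() in WHITELIST:
        return "inbox"  # whitelisted: skip every later rule
    for name, pattern in SPAM_RULES:
        if pattern.search(body):
            return name
    return "inbox"
```

Returning the name of the rule that fired mirrors the labeling trick mentioned in the notes: when a valid email is misfiled, you can see which filter was responsible.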

Spam Filter Details

Whitelist
This is very easy and very powerful. Simply create nicknames in Eudora for your family, friends, mailing lists and other valid email sources, and then accept all email from those addresses, skipping all rules after this one. In a perfect world, this by itself should be enough (after all, if you don't know them it's probably spam), but it obviously isn't the ultimate solution for numerous reasons. The only unwanted mail that can then get through is viruses and such, but your antivirus software and/or filters placed before the whitelist check should catch those. To implement a whitelist filter for all of the nicknames you have made, create the following rule: "Header: From" "intersects address book" "Eudora Nicknames". You can also allow all email from specific domains like the following: "Header: From" "matches regexp (case insensitive)" safedomain.com|myworkdomain.com|oldemployer.com
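
One thing to keep in mind with the domain rule: in a regex an unescaped dot matches any character, and an unanchored pattern matches anywhere in the header, so an address such as x@safedomain.com.evil.biz would also pass. The Python sketch below compares the rule as written with a stricter variant that escapes the dots and requires the domain to sit at the end of the address. The stricter pattern is my own suggestion rather than something taken from the rule above, and I have not verified it inside Eudora, so treat it as a starting point to test.

```python
import re

# The rule as written above: unescaped dots, no anchoring.
LOOSE = re.compile(
    r"safedomain.com|myworkdomain.com|oldemployer.com", re.IGNORECASE)

# Stricter variant (an assumption, not verified in Eudora): the domain must
# follow "@" directly and end the address, with an optional closing ">".
STRICT = re.compile(
    r"@(safedomain\.com|myworkdomain\.com|oldemployer\.com)>?\s*$",
    re.IGNORECASE)


def is_whitelisted(from_header):
    """True if the From header ends in one of the trusted domains."""
    return STRICT.search(from_header) is not None
```

For example, `is_whitelisted('"Bob" <bob@safedomain.com>')` is true, while a lookalike address that merely contains the trusted domain is rejected.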

Obfuscated words
I've created a file on which I base my obfuscated-word regexes. It includes permutations for each letter that could be replaced (accented letters, 1 for i, etc). In the regex, each letter is separated with [[:punct:][:space:]]* (which catches characters used to space the letters out to fool filters), so words like "v.1a g_r@" are caught. I also check for doubling and tripling of most letters so things like "viaggrra" are caught. Using this technique, I've built filters for 10 commonly used spam words (viagra, vicodin, cialis, xanax, mortgage, refinance, rolex, pharm, diploma and penis... you can easily create your own based on the types of spam you receive). To create these filters, select "Header: Body" "matches regexp (case insensitive)" and input the regex. View the latest file here, which also has my current regexes for this type of filter plus many other regexes I use.
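
As a rough illustration of how such a regex can be assembled, here is a small Python sketch. It is not the actual permutation file: the `LOOKALIKES` table is a hypothetical, much shorter stand-in, and because Python's `re` module has no [[:punct:]] or [[:space:]] classes, the separator is spelled out with an equivalent character class for testing purposes.

```python
import re
import string

# Python equivalent of [[:punct:][:space:]]* (ASCII punctuation plus
# whitespace); in Eudora you would keep the POSIX form.
SEP = "[" + re.escape(string.punctuation) + r"\s]*"

# Hypothetical substitution table, far smaller than a real one.
LOOKALIKES = {
    "a": "a@4àáâãäå",
    "e": "e3èéêë",
    "i": "i1!|lìíîï",
    "o": "o0òóôõö",
}


def obfuscated(word):
    """Build a pattern that tolerates lookalikes, spacers and repeats."""
    parts = []
    for ch in word:
        char_class = "[" + re.escape(LOOKALIKES.get(ch, ch)) + "]"
        parts.append(char_class + "{1,3}")  # allow doubled/tripled letters
    return SEP.join(parts)


VIAGRA = re.compile(obfuscated("viagra"), re.IGNORECASE)
```

With this, `VIAGRA.search("v.1a g_r@")` and `VIAGRA.search("viaggrra")` both match. Note that a pattern this permissive will also match innocent phrases such as "via gravel", so it is worth testing against your own legitimate mail.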

Commonly misspelled words and word combinations
I based this on the list found at http://scanmail-software.com/support/afo.html, and added in many others that I've seen appear numerous times in my spam. View many of my custom spam catching regexes on this page.

Sneaky HTML spam tricks:
Also initially based on a snippet from http://scanmail-software.com/support/afo.html, this checks for HTML that exists purely to hide text, links to .biz domains, and redirects through sites like MSN or Yahoo (which have redirect scripts that spammers abuse, as in this fake address: http://www.msn.com/redir.asp?url=http://SPAM.COM/ ). Also included is a check for the devious float: right, which is used to break words up into gibberish in the message source while still displaying them correctly in HTML-enabled email clients.
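
The checks described here can be sketched in Python as a handful of small patterns. These are simplified guesses at the idea, not the actual regex used in my filters; in particular the near-white color test (hex colors whose three components all start with f) and the redirect test are assumptions made for the example.

```python
import re

# Simplified, hypothetical versions of the HTML checks described above.
HTML_TRICKS = [
    ("tiny font", re.compile(r"font-size\s*:\s*[01]px", re.IGNORECASE)),
    ("float right", re.compile(r"float\s*:\s*right", re.IGNORECASE)),
    ("near-white text",
     re.compile(r"color\s*:\s*#f[0-9a-f]f[0-9a-f]f[0-9a-f]\b",
                re.IGNORECASE)),
    ("biz link", re.compile(r"https?://[^\s\"'>]*\.biz\b", re.IGNORECASE)),
    ("open redirect",
     re.compile(r"https?://[^\s\"'>]*(msn|yahoo)\.com/[^\s\"'>]*=https?://",
                re.IGNORECASE)),
]


def html_tricks(body):
    """Return the names of every trick found in the message body."""
    return [name for name, pattern in HTML_TRICKS if pattern.search(body)]
```

Running `html_tricks` over the fake MSN address above reports the redirect, while ordinary styling such as font-size: 12px is left alone.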

Other filters
The remaining filters are all from the filter set available at Cecil Williams' Effective Spam Filtering with Eudora site, tweaked as I saw fit. I put the "attachment converted" filter first in my set, with the rest of them underneath the filters described above.

Notes

I've successfully used these filters in Eudora version 5.x and 6.x.

Eudora uses the POSIX implementation of regex, which means you need to use [[:space:]] instead of \s to indicate any whitespace character (space, tab, newline).
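
If you want to try a Eudora-style regex outside of Eudora before pasting it into a filter, most scripting languages will not accept the POSIX classes directly. Here is a small Python sketch that rewrites the common ones into Python equivalents so a rule can be tested. The helper name `posix_to_python` and the table are my own invention, only a few classes are covered, and Python's \s also matches some Unicode whitespace, so results may differ slightly from Eudora's.

```python
import re
import string

# POSIX classes only appear inside [...], so each one is swapped for text
# that is valid inside a Python character class.
POSIX_CLASSES = {
    "[:space:]": r"\s",
    "[:digit:]": r"\d",
    "[:alpha:]": "a-zA-Z",
    "[:alnum:]": "a-zA-Z0-9",
    "[:punct:]": re.escape(string.punctuation),
}


def posix_to_python(pattern):
    """Rewrite POSIX bracket classes so Python's re module can compile it."""
    for name, replacement in POSIX_CLASSES.items():
        pattern = pattern.replace(name, replacement)
    return pattern


eudora_rule = "v[[:punct:][:space:]]*i[[:punct:][:space:]]*a"
tester = re.compile(posix_to_python(eudora_rule), re.IGNORECASE)
```

Here `tester.search("V . I _ a")` matches, which is what the same rule should do in Eudora.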

Eudora filters have a character limit per rule, right around 250. Using a previous incarnation of my obfuscated word search, I split each word search into two regexes if it went beyond that many characters, with at least half of the word in each rule per filter using an AND. An example of this would be "viag" (with any obfuscation and separators) AND "agra" (with any obfuscation and separators), thereby skirting the character limitation. This is less important for my updated regex, but good to know anyway.
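
The splitting trick can be sketched like this in Python. The 250 figure is the approximate limit mentioned above, `[\W_]*` is a short Python stand-in for the [[:punct:][:space:]]* separator, and the function names are made up for the example.

```python
import re

SEP = r"[\W_]*"  # Python stand-in for [[:punct:][:space:]]*
LIMIT = 250      # approximate per-rule character limit


def spaced(fragment):
    """Put the separator between every letter of the fragment."""
    return SEP.join(fragment)


def split_rule(word, limit=LIMIT):
    """Return one pattern if it fits, else two overlapping halves."""
    full = spaced(word)
    if len(full) <= limit:
        return [full]
    cut = len(word) // 2
    # Overlap by one letter each way: "viagra" -> "viag" and "agra".
    return [spaced(word[:cut + 1]), spaced(word[cut - 1:])]


def matches(body, patterns):
    """AND the patterns together, as the two-rule filter does."""
    return all(re.search(p, body, re.IGNORECASE) for p in patterns)
```

With an artificially small limit, `split_rule("viagra", limit=30)` yields the "viag" and "agra" halves from the example above. The ANDed pair is slightly looser than the single regex, since the two halves may match in different places in the message.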

The best way to catch spam is to keep track of what's evading your spam filters. Patterns emerge quite quickly and you can immediately tweak your filters to catch the new spam. I've added quite a few rules based on what I've seen get through. Oftentimes there's no way that the things included in the spam would ever be present in a valid email (misspellings, odd word combinations), so it's easy to base a new rule on them (or augment an existing rule).

If a valid email is ending up as spam and you don't know why, you can assign or change the labels on individual filters and narrow it down that way, either by re-fetching the message or by manually running filters on the selected message (control shift L). Make sure the manual checkbox is selected on all your filters; you may have to do this to all of them initially.

I read about an interesting tactic a while ago which basically makes the spammers do the work for you. Simply create a "spam" or "nospam" subdomain (nospam.yourdomain.com) and use that in your email address (billybob@nospam.billybob.com)... the email harvesters and senders see the "nospam" and remove it, thereby invalidating the email address by trying to validate it. (= Amusing, in any case.

Where Improvements Could Be Made

Additional regexes could be created to catch things like V</blah>I</blah>A</blah>G</blah>R</blah>A. In my experience, however, the vast majority of those types of emails also use font-size: 1px or 0px, so they get caught anyway.
Blacklisting could be added in.
I still get a few spams that come through with no body at all, though someone could technically send an email with a subject and no body (with no sig) and it would hit that same rule, so I haven't created one for that yet.
There are some other places for improvement at the second link above that you may like to implement, such as "8 or more consecutive consonants in the from address".
I currently only search for "refi" instead of the whole "refinance," which could lead to words like "reflect" being caught. This can be easily avoided in a number of ways; I'll leave it up to you to figure out.
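
Two of the ideas in this list are simple enough to sketch in Python. Both patterns are my own guesses for illustration, not rules I currently run: the consonant check treats y as a vowel, and the "refi" pattern ignores obfuscation and relies on \b word boundaries, which Eudora's POSIX regex may not support, so the Eudora version would need a different way to mark the word edges.

```python
import re

# Eight or more consecutive consonants anywhere in the From address.
CONSONANT_RUN = re.compile(r"[bcdfghjklmnpqrstvwxz]{8,}", re.IGNORECASE)

# "refi" on its own, or as the start of refinance/refinanced/refinancing,
# but not as the start of unrelated words like "refine" or "refill".
REFI = re.compile(r"\brefi(nanc(e[ds]?|ing))?\b", re.IGNORECASE)


def suspicious_sender(from_address):
    """True if the address contains a long run of consonants."""
    return CONSONANT_RUN.search(from_address) is not None


def mentions_refi(body):
    """True if the body mentions refi or a form of refinance."""
    return REFI.search(body) is not None
```

For instance, `suspicious_sender("xkqzwrtplm@example.com")` is true, and `mentions_refi("please refine and reflect")` is false.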

Revision History

2006-02-06: Numerous updates, including many recent additions to my spam filter regexes, such as the float: right trick.
2004-05-21: Update to commonly misspelled words regex, and simplified the sneaky HTML regex.
2004-05-18: While troubleshooting why a particular spam was getting through, I greatly simplified the regex for the commonly obfuscated words to use [[:punct:][:space:]]* instead of explicitly calling out all the different possible spacers. Duh. I also added in checks for repeated letters in the aforementioned filters. Many other small edits (editlets?) throughout the page.
2004-05-17: Update to sneaky HTML regex, looking for CSS declarations including 0.001pt and variations thereof, which I've never seen in legitimate messages.
2004-05-13: Update to commonly misspelled words regex, adding an entry and simplifying and broadening a few others.
2004-05-12: Update to obfuscated words filter (added in { and } as separators), tweaks to commonly misspelled words, removal of trailing slash for .biz, many text edits.
2004-05-11: Update in filter tactics for the 6 obfuscated words filters, addition of a few commonly misspelled words, removed one entry and added one entry under notes.
2004-05-10: Initial publication.

Disclaimer: I am by no means an expert in anything I talk about here (aside from using email for over 10 years), but I am a technical user who grasps the concepts and deals with them daily. This is not meant to be the ultimate reference for spam filtering using Eudora, just a place to get ideas for your own filtering.