The following was submitted by Sid Mclaurin of American Families Online. They are an ISP in Florida and a very successful user of Scanmail. On average they block over 85% of incoming mail a day. You can reach Sid at smclaurin@afo.net

----------------------------------------------------------------------------

As promised, here are some of the regex rules I've created, along with some practical tips on using Scanmail.

1. The most successful way to block spam is by the URL, what I call the "money maker". Spammers make a living by getting fractions of a penny from unsuspecting mail recipients clicking on their links. Most of the so-called "Unsubscribe" links go to the same URL as the "Subscribe" link. The most effective rule I've implemented against spam is blocking ".biz" in spam-list.conf and address-list.conf. Occasionally email from yahoo or hotmail has a ".biz" spam footer that gets rejected, but it has been totally worth it. ".biz" now accounts for one quarter of the "Rejected" mail.

2. The next most common "spam sign" is obfuscated words. Normal mail never has such blatant misspellings, and regex does a great job of catching this type of spam. The most common spam word for us is "viagra" or some deviation. Here are the regex examples I've implemented to catch this type of spam:

    v[\,\"\'\:\;\+\=\?]*[i1\|íl][\,\"\'\:\;\+\=\?]*[A\@][\,\"\'\:\;\+\=\?]*[gq][\,\"\'\:;\+\=\?]*r[\,\"\'\:;\+\=\?]*[A\@]

    v[\-\(\)\#\`\~\%\$]*[i1\|íl][\-\(\)\#\`\~\%\$]*[A\@][\-\(\)\#\`\~\%\$]*[gq][\-\(\)\#\`\~\%\$]*r[\-\(\)\#\`\~\%\$]*[A\@]

    v[\s\.\|\/\^\!\*\_]*[i1\|íl][\s\.\|\/\^\!\*\_]*[A\@][\s\.\|\/\^\!\*\_]*[gq][\s\.\|\//\^\!\*\_]*r[\s\.\|\//\^\!\*\_]*[A\@]

These rules catch the word "viagra" along with common deviations like:

    v-i@)gr@
    v.iagr.@

Having the "*" allows a rule to match whether or not the nonsense separating character is present. Rarely do I see a uniformly obfuscated word like v-i-a-g-r-a.
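As a rough way to try the three rules outside Scanmail, here is a minimal Python sketch. It assumes the rules are matched case-insensitively (the rules only list an uppercase "A") and that each rule is written on a single line. The names `SEPARATOR_CLASSES`, `build_rule` and `looks_like_viagra` are made up for this illustration and are not part of Scanmail.

```python
import re

# Separator character classes, copied from the three rules in the text.
SEPARATOR_CLASSES = [
    r"""[\,\"\'\:\;\+\=\?]""",
    r"""[\-\(\)\#\`\~\%\$]""",
    r"""[\s\.\|\/\^\!\*\_]""",
]

# Letter classes for v-i-a-g-r-a, as written in the rules.
LETTERS = ["v", r"[i1\|íl]", r"[A\@]", "[gq]", "r", r"[A\@]"]


def build_rule(separator_class: str) -> re.Pattern:
    """Join the letter classes with an optional run of separators."""
    # Assumption: Scanmail applies these rules case-insensitively.
    return re.compile((separator_class + "*").join(LETTERS), re.IGNORECASE)


RULES = [build_rule(sep) for sep in SEPARATOR_CLASSES]


def looks_like_viagra(text: str) -> bool:
    """Return True if any of the three rules matches somewhere in text."""
    return any(rule.search(text) for rule in RULES)


if __name__ == "__main__":
    for sample in ["v-i@)gr@", "v.iagr.@", "v-i-a-g-r-a", "v-i.agra", "village"]:
        print(sample, looks_like_viagra(sample))
```

With these assumptions the first three samples match and the last two do not. "v-i.agra" slips through because its two separators belong to different rules, which is one consequence of splitting the separators across three rules.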
I had to break this rule into three separate rules because I encountered a character limitation. Recently, we've encountered a deviation in the viagra spelling that adds a character, usually between the "g" and "r", like "viagGra". I've used the following regex to counter it:

    vi[i|]*ag[a-z\|]ra

3. Another "spam sign" and DoS attack is to have a large number of consonants in the To & From address. The following rule catches this type of spam:

    [bcghjklmnpqrstvwxz]{8,20}\@

This rule should be tweaked per ISP. We have a couple of users that have 7 consecutive consonants, so I had to set the rule to "8". The "20" upper boundary is there for legitimate mailing list servers. Notice that several consonants are left out, like "d" & "f", because they are common initials. Most randomly generated spam addresses have these particular consonants. I'm trying to develop a rule that incorporates "y"s because they are very common.

4. Other obfuscated words are a guaranteed "spam sign". I found that looking for the obfuscated word "girl" (ex. g|rl or g1rl) isn't practical because real users tend to get "cute" with their emails. The following words have a 100% spam indication:

    \bFr0m
    \bG\@ng
    \bH\@rd
    \bP0rn
    \bPorn0
    \bpuss[1\|]es
    \bdiscreet[1\|]y
    \bhardc0re
    \br3move
    \bremov3
    \brem0ve
    \br3mov3
    \br3m0v3
    \bappr0ved
    \bd0se
    \bt\@ke
    \bd[1l\|]scount
    \bb00st
    \bbo0st
    \bb0ost
    \ben[1\|]ar[gq]ement
    \b0pp0rtunity
    \bdisc0unt
    \bvirgiens
    \bj[1l\|]zz
    \bd0ng
    \bh0le
    \bpuss[|1][e3]s
    \bvaigra
    \bmicrrowave
    \bgeeneric
    \bgenneric

The "\b" (word boundary) keeps these words from being matched when the scanning attachments option is used. This list should vary from ISP to ISP, but these are the most common. I recommend that each ISP study their spam for common obfuscated words. I've noticed that "our spam" has certain characteristics.

On a similar note, I've noticed a difference in spam when it is forwarded from another mail server (ex. when a customer switches ISPs and has the mail from their old address forwarded).
First, because the blacklisted IPs are stripped from the header, the only thing stopping it is your Scanmail config files. After that, you will see different URLs and different types of obfuscation. I highly recommend not allowing this type of forwarding to continue for long because of what I call "bot-it".

4. "Bot-it" is when a spam robot spams you going down a list of addresses, generally alphabetical. This is extremely prevalent with email addresses of three or fewer characters. As a general rule for an ISP, make your customers have at least six letters or numbers in the email address, preferably both. I found it effective to block certain email addresses in your own domain for spam and viruses. james@afo.net & admin@afo.net are blocked because of mass-mailing viruses. Other old addresses are blocked because they are "spam catchers". Look for addresses that start with "a" or "b".

5. These next several rules target the "deceptive" characteristics of html spam. These rules need to be watched and tweaked to keep from catching "ham". A common spam sign is to include a bunch of consonants in between html brackets, "< >". I'm not quite sure why, maybe it is to confuse SpamAssassin, but Scanmail can see it. I catch it with:

    <[bcghjklmnpqrstvwxz]{5}>

If you lower it to "4" you catch the "<html>" tag. Regular mail never has this nonsense. A similar rule is this:

    <\!\w{10,60}>

It catches extremely long commented gibberish.

By blocking the "money maker", spammers must resort to other ways of getting their links through. Below are rules to catch encoded URLs in html mail:

    \&\#[0-9]+\;\&\#[0-9]+\;
    \%[0-9]+\%[0-9]+\%

I've had to put a couple of mailing list servers in exceptions-list.conf to keep these rules in place, but they have caught a tremendous amount of spam. I've tried to incorporate the other type of html obfuscation with this rule:

    \%[0-9]+\%[0-9]+\%

but apparently earthlink's webmail interface attaches a footer using this encoding, so it should be avoided.
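To see what the html rules in item 5 do and do not catch, here is a small Python sketch that runs them against a message body. The rule names and the `matching_rules` helper are invented for this illustration, and case-insensitive matching is an assumption about how Scanmail applies rules.

```python
import re
from typing import List

# Rules copied from item 5; the dictionary keys are illustrative labels only.
HTML_RULES = {
    "consonant_tag": r"<[bcghjklmnpqrstvwxz]{5}>",
    "long_comment": r"<\!\w{10,60}>",
    "numeric_entities": r"\&\#[0-9]+\;\&\#[0-9]+\;",
    "percent_encoding": r"\%[0-9]+\%[0-9]+\%",
}

# Assumption: rules are applied case-insensitively.
COMPILED = {
    name: re.compile(rule, re.IGNORECASE) for name, rule in HTML_RULES.items()
}


def matching_rules(body: str) -> List[str]:
    """Return the labels of every rule that matches somewhere in body."""
    return [name for name, rule in COMPILED.items() if rule.search(body)]


if __name__ == "__main__":
    samples = [
        "<html><body>hello</body></html>",   # ordinary markup
        "vi<kqzxw>agra",                     # bogus consonant tag
        "<!jsdhfkjshdfkjh>",                 # commented gibberish
        "&#104;&#116;&#116;&#112;",          # numeric character references
        "%68%74%74%70",                      # percent-encoded text
    ]
    for body in samples:
        print(body, matching_rules(body))
```

The ordinary markup sample matches nothing, while each of the other samples trips exactly one rule. Changing `{5}` to `{4}` in the first rule makes it match `<html>`, since h, t, m and l are all in the consonant class.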
Another common spam sign is the use of bogus frame tags. This rule catches them:

    \w*<\/noframes><\/frame>

Even though it is not very common for us, it is always spam.

A very common spam sign is to have "white" or "invisible" text in html emails. I'm slowly developing rules to catch this type of spam without catching ham. This rule is great at catching "near" invisible text by looking for off-white variations:

    color=[\"\']*\#FFFFF[0-9A-E]

This rule has only caught spam.

6. An extremely effective way of blocking spam is by blocking their image URL. Blocking "geocites.com" and "terra.es" has been extremely effective. The URL "serverimages.com" supplies porn images for hundreds of sites. I've used regex to catch spam with changing URLs by catching the path to the images.

    \/biz\/gb[a-z]*\.gif
    com\/z[0-9]+\.gif

The above rules have caught all of the spam from a porn site with seemingly endless domains and IP addresses. This type of filtering is extremely effective on those "image only" spams.

    \.com\/c2\.gif
    banners\/run[0-9]+\.gif

The above rules have caught the viagra image-only spam. These types of rules need to be created from the spam you are receiving. Even though spam is coming from hundreds of IP addresses and domains, there is usually something common and unique to them.

7. These next examples target spammers with multiple URLs or addresses that have common characteristics. The rule below catches spam from a porn site that loves to have "climax" in its domain names.

    climax[a-z]*\.com

I used to see all types of URLs like climaxerections.com, climaxhardcore.com, etc. from the same spammer. This rule catches spam from numbered addresses or URLs:

    redv[0-9]*\.com

8. The next couple of concepts are really just guidelines. Scanmail's stat file is invaluable in fighting spam. I found it extremely effective to watch the stats file by using "tail -f" to catch spam.
Here's an example stat log entry:

    Rejected because of spam (\&\#[0-9]+\; \&\#[0-9]+\;)~<samrb0bvcjo2vl4rjiq@yahoo.com>~<agape@afo.net>~01/07/04 16:57~2505~12-222-136-53.client.insightBB.com [12.222.136.53]~12.156.195.246~Agape, --> 80%!....gunfight8351ok

Even though it was caught by my regex rule, you can tell it is spam by the bogus address. Nobody can remember an email address like that. More telling is the domain. See that it is from "yahoo.com" but the mail server resolves to "insightBB.com". Also notice the gibberish in the subject. Then notice that it has been relayed, even though I've found this not to be a definitive sign. By having the ability to see this info, I then go to that mailbox and investigate further, looking for URLs and other blockable signs.

Once you have identified and blocked spam, investigate further. Blocking spam with address-list.conf & ip-list.conf is more efficient than with spam-list.conf. We have a spam-bot that I call "sendmails.com" because that is the URL in its spam. The messages get rejected because of the URL, but I noticed that it used certain email addresses like send-mails.com, myrainmailserver.com, etc. Adding these to address-list.conf has blocked it when it has changed its URL.

Well, that is about it in a nutshell. I'll send you more as we grow.
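The manual check described above (sender domain versus the host that actually delivered the mail) can be roughly automated. The Python sketch below splits the example stat entry on "~" and flags a mismatch. The field names are guesses based on the example and the commentary around it, not a documented Scanmail format; in particular `size` and `relay` are assumed meanings. As noted above, a mismatch is only a hint, not proof of spam.

```python
from typing import NamedTuple

# The example stat entry from the text, split across lines for readability.
SAMPLE = (
    r"Rejected because of spam (\&\#[0-9]+\; \&\#[0-9]+\;)"
    r"~<samrb0bvcjo2vl4rjiq@yahoo.com>~<agape@afo.net>~01/07/04 16:57~2505"
    r"~12-222-136-53.client.insightBB.com [12.222.136.53]~12.156.195.246"
    r"~Agape, --> 80%!....gunfight8351ok"
)


class StatEntry(NamedTuple):
    reason: str
    sender: str
    recipient: str
    timestamp: str
    size: str      # assumed meaning of the fifth field
    client: str    # connecting host name and IP
    relay: str     # assumed meaning of the seventh field
    subject: str


def parse_stat_line(line: str) -> StatEntry:
    """Split one stat line into its eight '~'-separated fields."""
    # maxsplit=7 keeps any '~' inside the subject in the last field.
    parts = line.rstrip("\n").split("~", 7)
    if len(parts) != 8:
        raise ValueError("expected 8 fields, got %d" % len(parts))
    return StatEntry(*parts)


def sender_domain_mismatch(entry: StatEntry) -> bool:
    """True if the connecting host is not under the sender's domain."""
    domain = entry.sender.strip("<>").rpartition("@")[2].lower()
    words = entry.client.split()
    host = words[0].lower() if words else ""
    return not (host == domain or host.endswith("." + domain))


if __name__ == "__main__":
    entry = parse_stat_line(SAMPLE)
    print(entry.sender, entry.client, sender_domain_mismatch(entry))
```

For the example entry the sender claims yahoo.com while the connecting host is under insightBB.com, so the check reports a mismatch. Plenty of legitimate mail would trip this check too, so it is better suited to picking entries for a closer look than to blocking.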