The following was submitted by Sid Mclaurin of American Families Online.  They
are an ISP in Florida and a very successful user of Scanmail.  On average
they block over 85% of their incoming mail each day.  You can reach Sid at
smclaurin@afo.net

----------------------------------------------------------------------------
As promised, here are some of the regex rules I've created, along with some
practical tips on using Scanmail.
 
1. The most successful way to block spam is by the URL, what I call the
"money maker". Spammers make a living by getting fractions of a penny from
unsuspecting mail recipients clicking on their links. Most of the so-called
"Unsubscribe" links go to the same URL as the "Subscribe" link. The most
effective rule that I've implemented against spam is blocking ".biz" in
the spam-list.conf and address-list.conf. Occasionally email from yahoo or
hotmail has a ".biz" spam footer that gets rejected, but it has been totally
worth it. ".biz" now accounts for one quarter of the "Rejected" mail.
 
2. The next most common "spam sign" is obfuscated words. Normal mail never
has such blatant misspellings, and regex does a great job of catching this
type of spam. The most common spam word for us is "viagra" or some variation.
Here are the regex examples I've implemented to catch this type of spam:
 
v[\,\"\'\:\;\+\=\?]*[i1\|íl][\,\"\'\:\;\+\=\?]*[A\@][\,\"\'\:\;\+\=\?]*[gq][\,\"\'\:;\+\=\?]*r[\,\"\'\:;\+\=\?]*[A\@]

v[\-\(\)\#\`\~\%\$]*[i1\|íl][\-\(\)\#\`\~\%\$]*[A\@][\-\(\)\#\`\~\%\$]*[gq][\-\(\)\#\`\~\%\$]*r[\-\(\)\#\`\~\%\$]*[A\@]

v[\s\.\|\/\^\!\*\_]*[i1\|íl][\s\.\|\/\^\!\*\_]*[A\@][\s\.\|\/\^\!\*\_]*[gq][\s\.\|\//\^\!\*\_]*r[\s\.\|\//\^\!\*\_]*[A\@]
 
These rules catch the word "viagra" along with common variations like:
 
v-i@)gr@ 
v.iagr.@ 
 
Having the "*" lets each rule catch the nonsense separating character
whether or not it is present. Rarely do I see a uniformly obfuscated word
like: v-i-a-g-r-a.
 
I had to break this rule into three separate rules because I ran into a
character limit.
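
The three rules share one structure: the letters of "viagra" (using
look-alike classes such as [i1\|íl] and [A\@]) separated by an optional run
of junk characters. The minimal Python sketch below rebuilds them from that
structure so they can be tested against sample text. It assumes Scanmail
matches case-insensitively and anywhere in the message, which the examples
imply (the [A\@] classes only match the lowercase "a" in "v.iagr.@" if case
is ignored). Python's re engine may treat backslashes inside brackets
differently from Scanmail's engine, so this is a testing aid, not a
reproduction of Scanmail.

```python
import re

# Separator classes copied from the three rules; each rule allows any run
# of one class between the letters of "viagra".
SEPARATORS = [
    r"[\,\"\'\:\;\+\=\?]*",
    r"[\-\(\)\#\`\~\%\$]*",
    r"[\s\.\|\/\^\!\*\_]*",
]
LETTERS = ["v", r"[i1\|íl]", r"[A\@]", "[gq]", "r", r"[A\@]"]

# ASSUMPTION: case-insensitive matching anywhere in the text, modelled with
# re.IGNORECASE and re.search.
VIAGRA_RULES = [re.compile(sep.join(LETTERS), re.IGNORECASE) for sep in SEPARATORS]


def looks_like_viagra(text):
    """Return True if any of the three rules matches somewhere in text."""
    return any(rule.search(text) for rule in VIAGRA_RULES)


for sample in ["v-i@)gr@", "v.iagr.@", "Have a nice day"]:
    print(sample, looks_like_viagra(sample))
```

Both obfuscated samples are flagged and the plain sentence is not. The
sketch also exposes a risk: because \s is an allowed separator in the third
rule, an ordinary phrase such as "via grand" matches it too.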
 
Recently, we've encountered a variation in the viagra spelling that adds a
character, usually between the "g" and "r", like "viagGra". I use the
following regex to counter it:
 
vi[i|]*ag[a-z\|]ra 
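
A quick Python check of this rule, assuming case-insensitive matching (the
capital "G" in "viagGra" is only matched by [a-z\|] if case is ignored):

```python
import re

# The "extra letter" rule. ASSUMPTION: case-insensitive, unanchored search.
EXTRA_LETTER = re.compile(r"vi[i|]*ag[a-z\|]ra", re.IGNORECASE)

for word in ["viagGra", "viiagxra", "viagra"]:
    print(word, bool(EXTRA_LETTER.search(word)))
```

The plain spelling "viagra" does not match, because exactly one character
must sit between the "g" and the "r"; that spelling is left to the three
earlier rules.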
 
3. Another "spam sign", and a sign of a DoS attack, is a large number of
consonants in the to & from addresses. The following rule catches this type
of spam:
 
[bcghjklmnpqrstvwxz]{8,20}\@ 
 
This rule should be tweaked per ISP. We have a couple of users with 7
consecutive consonants, so I had to set the rule to "8". The "20" upper
boundary is there for legitimate mailing list servers. Notice that several
consonants, like "d" & "f", are left out because they are common initials.
Most randomly generated spam addresses use the consonants that are in the
list. I'm trying to develop a rule that incorporates "y"s because they are
very common.
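
A Python sketch for testing the consonant rule, assuming case-insensitive,
unanchored matching. With an unanchored search (as in Python's re.search,
and assuming Scanmail behaves the same), the "20" cap does not exempt
longer runs: a 25-consonant local part still matches on its last 20
characters. BOUNDED is an illustrative variant, not one of the rules above;
it uses a plain group instead of a lookbehind so it stays within POSIX
extended regex syntax, but whether Scanmail accepts it is untested.

```python
import re

CONSONANTS = "bcghjklmnpqrstvwxz"  # "d", "f" and "y" are deliberately absent

# The rule as written. ASSUMPTION: case-insensitive, unanchored search.
RULE = re.compile(r"[%s]{8,20}\@" % CONSONANTS, re.IGNORECASE)

# Hypothetical variant: the run must start at the beginning of the string
# or right after a non-consonant, so runs over 20 cannot match their tail.
BOUNDED = re.compile(r"(^|[^%s])[%s]{8,20}@" % (CONSONANTS, CONSONANTS), re.IGNORECASE)

for addr in ["xkqzvbnm@spam.example", "smclaurin@afo.net", "b" * 25 + "@list.example"]:
    print(addr, bool(RULE.search(addr)), bool(BOUNDED.search(addr)))
```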
 
4. Other obfuscated words are a guaranteed "spam sign". I found that looking
for the obfuscated word "girl" (e.g. g|rl or g1rl) isn't practical because
real users tend to get "cute" in their emails. The following words have been
a 100% spam indicator for us:
 
\bFr0m 
\bG\@ng 
\bH\@rd 
\bP0rn 
\bPorn0 
\bpuss[1\|]es 
\bdiscreet[1\|]y 
\bhardc0re 
\br3move 
\bremov3 
\brem0ve 
\br3mov3 
\br3m0v3 
\bappr0ved 
\bd0se 
\bt\@ke 
\bd[1l\|]scount 
\bb00st 
\bbo0st 
\bb0ost 
\ben[1\|]ar[gq]ement 
\b0pp0rtunity 
\bdisc0unt 
\bvirgiens 
\bj[1l\|]zz 
\bd0ng 
\bh0le 
\bpuss[|1][e3]s 
\bvaigra 
\bmicrrowave 
\bgeeneric 
\bgenneric 
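
A few of these patterns can be tested in Python, assuming case-insensitive,
unanchored matching; WORD_RULES and obfuscated_hits are illustrative names,
and only a sample of the list is included:

```python
import re

# A few entries from the list above, copied verbatim.
# ASSUMPTION: case-insensitive, unanchored matching.
WORD_RULES = [
    re.compile(p, re.IGNORECASE)
    for p in (r"\bFr0m", r"\bP0rn", r"\br3move", r"\bappr0ved",
              r"\bd[1l\|]scount", r"\bG\@ng")
]


def obfuscated_hits(text):
    """Return the patterns that match anywhere in text."""
    return [rule.pattern for rule in WORD_RULES if rule.search(text)]


print(obfuscated_hits("Click here to r3move your address"))
print(obfuscated_hits("Big discount, from us"))
print(obfuscated_hits("QkFaXFr0mZ"))
```

The last sample shows what the "\b" does: "Fr0m" is present, but it follows
the letter "X", so there is no word boundary and nothing fires. Base64 text
also contains "+" and "/", which do create boundaries, so the protection is
good but not absolute.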
 
The "\b" (word boundary) keeps these words from matching in the middle of a
longer string, such as encoded attachment data, when the scanning
attachments option is used. This list will vary from ISP to ISP, but these
are the most common for us. I recommend that each ISP study its own spam for
common obfuscated words; I've noticed that "our spam" has certain
characteristics. On a similar note, I've noticed a difference in spam when
it is forwarded from another mail server (e.g. when a customer switches ISPs
and has mail to their old address forwarded). First, the blacklisted IPs are
stripped from the headers, so the only thing stopping it is your Scanmail
config files. After that, you will see different URLs and different types of
obfuscation. I highly recommend not allowing this type of forwarding to
continue for long, because it exposes the customer to what I call "bot-it".
 
"Bot-it" is when a spam robot spams you by going down a list of addresses,
generally alphabetically. This is extremely prevalent with addresses of three
or fewer characters. As a general rule for an ISP, make your customers use
at least six letters or numbers in their email address, preferably both. I
have also found it effective to block certain addresses in your own domain
that attract spam and viruses. james@afo.net & admin@afo.net are blocked
because of mass-mailing viruses. Other old addresses are blocked because
they have become "spam catchers". Look for addresses that start with "a" or
"b".
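
The six-character advice can be enforced when customers choose a mailbox
name. A minimal sketch of a hypothetical sign-up check (check_local_part and
MIN_LENGTH are illustrative names, not part of Scanmail); it examines only
the part of the address before the "@":

```python
MIN_LENGTH = 6  # the six-character minimum suggested above


def check_local_part(local):
    """Return (meets_minimum, preferred) for a proposed mailbox name.

    "preferred" means the name also mixes letters and digits.
    """
    meets_minimum = len(local) >= MIN_LENGTH
    has_letter = any(c.isalpha() for c in local)
    has_digit = any(c.isdigit() for c in local)
    return meets_minimum, meets_minimum and has_letter and has_digit


for name in ["bob", "sidney", "sidney42"]:
    print(name, check_local_part(name))
```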
 
5. These next several rules target the "deceptive" characteristics of html
spam. These rules need to be watched and tweaked to keep them from catching
"ham". A common spam sign is a bunch of consonants between html brackets,
"< >". I'm not quite sure why; maybe it is meant to confuse SpamAssassin,
but Scanmail can still see it. I catch it with:
 
<[bcghjklmnpqrstvwxz]{5}> 
 
If you lower it to "4" you catch the "<html>" tag. Regular mail never has 
this nonsense. A similar rule is this: 
 
<\!\w{10,60}> 
 
that catches extremely long commented gibberish. 
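
Both tag rules can be checked in Python, assuming case-insensitive,
unanchored matching. CONSONANT_TAG_4 is the too-aggressive "4" version
described above, included only to show the "<html>" problem:

```python
import re

# ASSUMPTION: case-insensitive, unanchored matching.
CONSONANT_TAG = re.compile(r"<[bcghjklmnpqrstvwxz]{5}>", re.IGNORECASE)
CONSONANT_TAG_4 = re.compile(r"<[bcghjklmnpqrstvwxz]{4}>", re.IGNORECASE)
LONG_BANG_TAG = re.compile(r"<\!\w{10,60}>")

print(bool(CONSONANT_TAG.search("<html><body>")))   # real tags
print(bool(CONSONANT_TAG_4.search("<html>")))       # the "4" version
print(bool(LONG_BANG_TAG.search("<!qwertyuiopasdf>")))
print(bool(LONG_BANG_TAG.search("<!DOCTYPE html>")))
```

Real comments ("<!-- ... -->") and "<!DOCTYPE html>" escape the second rule
because "-" and the space are not word characters.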
 
By blocking the "money maker", you force spammers to resort to other ways of
getting their links through. The rules below catch encoded URLs in html
mail:
 
\&\#[0-9]+\;\&\#[0-9]+\; 
\%[0-9]+\%[0-9]+\% 
 
I've had to put a couple of mailing list servers on the exceptions-list.conf
to keep these rules in place, but they have caught a tremendous amount of
spam. I've tried to incorporate the other type of html obfuscation with this
rule:
 
\%[0-9]+\%[0-9]+\% 
 
but apparently EarthLink's webmail interface attaches a footer using this
encoding, so it should be avoided.
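
A Python check of the two encoding rules, assuming unanchored matching. The
percent rule is included only for testing, since it is the one advised
against above because of the EarthLink footer:

```python
import re

ENTITY_PAIR = re.compile(r"\&\#[0-9]+\;\&\#[0-9]+\;")  # two decimal entities in a row
PERCENT_RUN = re.compile(r"\%[0-9]+\%[0-9]+\%")        # decimal-only %-escapes

print(bool(ENTITY_PAIR.search("http://&#119;&#119;&#119;.example.com")))
print(bool(ENTITY_PAIR.search("Tom &amp; Jerry")))
print(bool(PERCENT_RUN.search("http://%77%77%77.example.com")))
```

Both rules only see decimal digits, so hexadecimal forms such as "&#x77;" or
"%2F" slip past them.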
 
Another common spam sign is the use of bogus frame tags. This rule catches 
them: 
 
<frame><noframes>\w*<\/noframes><\/frame> 
 
Even though it is not very common for us, it is always spam. 
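
A quick Python test of the frame rule, assuming case-insensitive, unanchored
matching:

```python
import re

# ASSUMPTION: case-insensitive, unanchored matching.
FRAME = re.compile(r"<frame><noframes>\w*<\/noframes><\/frame>", re.IGNORECASE)

print(bool(FRAME.search("<frame><noframes>xyz</noframes></frame>")))
print(bool(FRAME.search("<frame><noframes>see our site</noframes></frame>")))
```

The rule is strict: a space or newline between the tags, or inside the
noframes text, is enough to defeat it.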
 
A very common spam sign is "white" or "invisible" text in html emails. I'm
slowly developing rules to catch this type of spam without catching ham.
This rule is great at catching "near" invisible text by looking for
off-white variations:
 
color=[\"\']*\#FFFFF[0-9A-E] 
 
This rule has only caught spam. 
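
A Python check of the off-white rule, assuming case-insensitive, unanchored
matching:

```python
import re

# ASSUMPTION: case-insensitive, unanchored matching.
OFF_WHITE = re.compile(r"color=[\"\']*\#FFFFF[0-9A-E]", re.IGNORECASE)

for html in ['<font color="#FFFFFE">', "<font color=#fffff0>",
             '<font color="#FFFFFF">', '<p style="color:#FFFFFE">']:
    print(html, bool(OFF_WHITE.search(html)))
```

Pure white ("#FFFFFF") is not caught, since the last digit must be 0-9 or
A-E, and CSS written as "color:" inside a style attribute also falls outside
the "color=" form.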
 
6. An extremely effective way of blocking spam is by blocking their image
URLs. Blocking "geocities.com" and "terra.es" has been extremely effective.
The URL "serverimages.com" supplies porn images for hundreds of sites. I've
used regex to catch spam with changing URLs by matching the path to the
images.
 
\/biz\/gb[a-z]*\.gif 
com\/z[0-9]+\.gif 
 
The above rules have caught all of the spam from a porn site with seemingly
endless domains and IP addresses. This type of filtering is extremely
effective on those "image only" spams.
 
\.com\/c2\.gif 
banners\/run[0-9]+\.gif
 
The above rules have caught the viagra "image only" spam. These types of
rules need to be created from the spam you are receiving. Even though spam
is coming from hundreds of IPs, addresses and domains, there is usually
something common and unique to them.
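
The image-path rules can be tested together in Python. This sketch assumes
case-insensitive, unanchored matching, and IMAGE_RULES and image_rule_hits
are illustrative names:

```python
import re

# Image-path rules from above. The dot before "gif" in the "banners" rule is
# escaped here so it matches only a literal ".".
# ASSUMPTION: case-insensitive search anywhere in the message body.
IMAGE_RULES = [
    re.compile(p, re.IGNORECASE)
    for p in (
        r"\/biz\/gb[a-z]*\.gif",
        r"com\/z[0-9]+\.gif",
        r"\.com\/c2\.gif",
        r"banners\/run[0-9]+\.gif",
    )
]


def image_rule_hits(html):
    """Return the patterns whose image path appears in html."""
    return [rule.pattern for rule in IMAGE_RULES if rule.search(html)]


print(image_rule_hits('<img src="http://a1b2.com/z123.gif">'))
print(image_rule_hits('<img src="http://example.com/logo.gif">'))
```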
 
7. These next examples target spammers with multiple URLs or addresses that
have common characteristics. The rule below catches spam from a porn site
that loves to have "climax" in its domain names.
 
climax[a-z]*\.com 
 
I used to see all kinds of URLs, like climaxerections.com,
climaxhardcore.com, etc., from the same spammer.
 
This rule catches spam from numbered addresses or URLs:
 
redv[0-9]*\.com 
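
A Python check of the two domain-family rules, assuming case-insensitive,
unanchored matching (domain_rule_hit is an illustrative name):

```python
import re

# ASSUMPTION: case-insensitive, unanchored matching.
DOMAIN_RULES = [
    re.compile(r"climax[a-z]*\.com", re.IGNORECASE),
    re.compile(r"redv[0-9]*\.com", re.IGNORECASE),
]


def domain_rule_hit(text):
    """Return True if either domain-family rule matches somewhere in text."""
    return any(rule.search(text) for rule in DOMAIN_RULES)


for url in ["http://climaxhardcore.com/", "http://redv27.com/",
            "http://redv.com/", "http://example.com/"]:
    print(url, domain_rule_hit(url))
```

Because "*" allows zero repetitions, "redv.com" matches as well, and since
nothing anchors the start of the name, a domain such as "anticlimax.com"
would also be caught.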
 
8. The next couple of points are really just guidelines. Scanmail's stat
file is invaluable in fighting spam. I've found it extremely effective to
watch the stat file with "tail -f" to catch spam. Here's an example stat log
entry:
 
 Rejected because of spam  (\&\#[0-9]+\;\&\#[0-9]+\;)~<samrb0bvcjo2vl4rjiq@yahoo.com>~<agape@afo.net>~01/07/04 16:57~2505~12-222-136-53.client.insightBB.com [12.222.136.53]~12.156.195.246~Agape, --> 80%!....gunfight8351ok
 
Even though it was caught by my regex rule, you can tell it is spam from the
bogus address; nobody can remember an email address like that. More telling
is the domain: it claims to be from "yahoo.com", but the mail server
resolves to "insightBB.com". Also notice the gibberish in the subject. Then
notice that it has been relayed, though I've found this not to be a
definitive sign. Having this info in front of me, I then go to that mailbox
and investigate further, looking for URLs and other blockable signs.
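
The domain check can be scripted against the stat file. This sketch assumes
entries are tilde-separated in the order seen in the sample entry: reason,
sender, recipient, date and time, an unlabelled field, the relaying host with
its IP, another unlabelled field, and the subject. parse_stat_line and
relay_mismatch are hypothetical helpers, and a mismatched relay is only a
hint, not proof of spam.

```python
# The sample entry from above, split across literals for readability.
SAMPLE = (
    r" Rejected because of spam  (\&\#[0-9]+\;\&\#[0-9]+\;)"
    "~<samrb0bvcjo2vl4rjiq@yahoo.com>~<agape@afo.net>~01/07/04 16:57"
    "~2505~12-222-136-53.client.insightBB.com [12.222.136.53]"
    "~12.156.195.246~Agape, --> 80%!....gunfight8351ok"
)


def parse_stat_line(line):
    """Split a stat entry into named fields (field layout is inferred).

    Naive: a rule containing "~" (like the second viagra rule) breaks it.
    """
    parts = line.strip().split("~", 7)  # maxsplit keeps "~" in the subject
    if len(parts) != 8:
        raise ValueError(f"expected 8 fields, got {len(parts)}")
    reason, sender, recipient, when, _, relay, _, subject = parts
    return {
        "reason": reason,
        "sender": sender.strip("<>"),
        "recipient": recipient.strip("<>"),
        "when": when,
        "relay_host": relay.split(" [", 1)[0],
        "subject": subject,
    }


def relay_mismatch(entry):
    """True when the relaying host is not in the sender's domain."""
    domain = entry["sender"].rsplit("@", 1)[-1].lower()
    host = entry["relay_host"].lower()
    return not (host == domain or host.endswith("." + domain))


entry = parse_stat_line(SAMPLE)
print(entry["sender"], "via", entry["relay_host"], "mismatch:", relay_mismatch(entry))
```

Piping "tail -f" output through a filter like this would surface the
yahoo.com-via-insightBB.com pattern without reading every line by eye.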
 
Once you have identified and blocked spam, investigate further. Blocking
spam with address-list.conf & ip-list.conf is more efficient than with
spam-list.conf. We have a spam-bot that I call "sendmails.com" because that
is the URL in its spam. Its messages get rejected because of the URL, but I
noticed that it used certain email addresses like send-mails.com,
myrainmailserver.com, etc. Adding these to address-list.conf has kept it
blocked even when it changed its URL.
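
Domain-level blocking like this can be sketched as a set lookup on the
sender's domain. BLOCKED_DOMAINS and sender_blocked are hypothetical names,
the address-list.conf file format is not shown here, and matching subdomains
is an assumption about how you would want such entries to behave:

```python
# Hypothetical stand-in for address-list.conf entries: domains seen in one
# spam-bot's sender addresses.
BLOCKED_DOMAINS = {"send-mails.com", "myrainmailserver.com"}


def sender_blocked(sender):
    """True if the sender's domain, or a parent domain, is blocked."""
    domain = sender.strip("<>").rsplit("@", 1)[-1].lower()
    return any(domain == d or domain.endswith("." + d) for d in BLOCKED_DOMAINS)


for addr in ["<promo@Send-Mails.com>", "x@notsend-mails.com", "sid@afo.net"]:
    print(addr, sender_blocked(addr))
```

Comparing whole domain labels, rather than substrings, keeps an unrelated
domain such as notsend-mails.com from being caught.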
 
Well, that is about it in a nutshell. I'll send you more as we grow.