The following was submitted by Sid Mclaurin of American Families Online. They
are an ISP in Florida and a very successful user of Scanmail. On average
they block over 85% of incoming mail a day. You can reach Sid at
smclaurin@afo.net
----------------------------------------------------------------------------
As promised, here are some of the regex rules I've created along with some
practical tips on using Scanmail.
1. The most successful way to block spam is by the URL, what I call the
"money maker". Spammers make a living by getting fractions of a penny from
unsuspecting mail recipients clicking on their links. Most of the so-called
"Unsubscribe" links go to the same URL as the "Subscribe" link. The most
effective rule that I've implemented against spam is blocking ".biz" in
spam-list.conf and address-list.conf. Occasionally email from yahoo or
hotmail has a ".biz" spam footer that gets rejected, but it has been totally
worth it. ".biz" now accounts for one quarter of the "Rejected" mail.
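
To make the trade-off above concrete, here is a minimal Python sketch. It
assumes a ".biz" entry behaves like a plain case-insensitive substring test,
which is only a guess at how Scanmail treats the list files. The sample
messages and the example.biz hosts are made up.

```python
import re

# Hypothetical stand-in for a ".biz" list entry, treated as a plain
# case-insensitive substring test. Scanmail's real matching may differ.
BIZ_RULE = re.compile(r"\.biz", re.IGNORECASE)


def mentions_biz(text: str) -> bool:
    """Return True if the text contains ".biz" anywhere."""
    return BIZ_RULE.search(text) is not None


# Made-up sample messages for illustration only.
spam_body = "Click here: http://cheap-offer.example.biz/buy"
ham_with_footer = "See you Friday!\n--\nSponsored by http://deals.example.biz"
plain_ham = "See you Friday!"

for sample in (spam_body, ham_with_footer, plain_ham):
    print(mentions_biz(sample), repr(sample))
```

Under this substring assumption the footer case is rejected along with the
real spam, which is the cost described above. A host such as
www.bizexample.com would also trip the test, since it contains ".biz" too.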
2. The next most common "spam sign" is obfuscated words. Normal mail never
has such blatant misspellings, and regex does a great job of catching this
type of spam. The most common spam word for us is "viagra" or some variation
of it. Here are the regex rules I've implemented to catch this type of spam:
v[\,\"\'\:\;\+\=\?]*[i1\|íl][\,\"\'\:\;\+\=\?]*[A\@][\,\"\'\:\;\+\=\?]*[gq]
[\,\"\'\:;\+\=\?]*r[\,\"\'\:;\+\=\?]*[A\@]
v[\-\(\)\#\`\~\%\$]*[i1\|íl][\-\(\)\#\`\~\%\$]*[A\@][\-\(\)\#\`\~\%\$]*[gq]
[\-\(\)\#\`\~\%\$]*r[\-\(\)\#\`\~\%\$]*[A\@]
v[\s\.\|\/\^\!\*\_]*[i1\|íl][\s\.\|\/\^\!\*\_]*[A\@][\s\.\|\/\^\!\*\_]*[gq]
[\s\.\|\//\^\!\*\_]*r[\s\.\|\//\^\!\*\_]*[A\@]
These rules catch the word "viagra" along with common deviations like:
v-i@)gr@
v.iagr.@
Using "*" lets the rule match whether or not the nonsense separating
character is present. Rarely do I see a uniformly obfuscated word like:
v-i-a-g-r-a.
I had to break this rule into three separate rules because I ran into a
character limit.
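
The three rules above can be tried out with a short Python sketch. It
rebuilds each rule from its separator set rather than pasting the wrapped
lines, and it assumes case-insensitive matching (otherwise "[A\@]" would not
match a lowercase "a"). Whether Scanmail matches this way is an assumption,
and the helper names are made up.

```python
import re

# The three separator sets, one per original rule.
SEPARATOR_SETS = [
    r"""[\,\"\'\:\;\+\=\?]""",
    r"""[\-\(\)\#\`\~\%\$]""",
    r"""[\s\.\|\/\^\!\*\_]""",
]


def build_rule(sep: str) -> str:
    """Allow any run of one separator set between the letters."""
    return rf"v{sep}*[i1\|íl]{sep}*[A\@]{sep}*[gq]{sep}*r{sep}*[A\@]"


# Assumption: matching is case-insensitive, so "[A\@]" also matches "a".
RULES = [re.compile(build_rule(sep), re.IGNORECASE) for sep in SEPARATOR_SETS]


def is_obfuscated_viagra(text: str) -> bool:
    """Return True if any of the three rules matches somewhere in text."""
    return any(rule.search(text) for rule in RULES)


for sample in ["v-i@)gr@", "v.iagr.@", "v-i-a-g-r-a", "v-i.agra", "village"]:
    print(sample, is_obfuscated_viagra(sample))
```

One side effect of splitting the rule by separator set: a word that mixes
separators from two different sets, such as "v-i.agra", slips past all three
rules in this sketch.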
Recently, we've encountered a variation in the viagra spelling that adds a
character, usually between the "g" and "r", like "viagGra". I use the
following regex to counter it:
vi[i|]*ag[a-z\|]ra
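
A quick Python check of the rule above, again assuming case-insensitive
matching (an assumption about Scanmail, not something stated here):

```python
import re

# The extra-character rule, copied as written above.
EXTRA_CHAR_RULE = re.compile(r"vi[i|]*ag[a-z\|]ra", re.IGNORECASE)

for sample in ["viagGra", "viiagxra", "viag|ra", "viagra"]:
    print(sample, bool(EXTRA_CHAR_RULE.search(sample)))
```

The rule requires exactly one extra character between "g" and "ra", so the
plain spelling "viagra" does not match it and still has to be caught by the
earlier rules.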
3. Another "spam sign" and DoS attack is to have a large number of consonants
in the to & from addresses. The following rule catches this type of spam:
[bcghjklmnpqrstvwxz]{8,20}\@
This rule should be tweaked per ISP. We have a couple of users that have 7
consecutive consonants so I had to set the rule to "8". The "20" upper
boundary is there for legitimate mailing list servers. Notice that there are
several consonants left out, like "d" & "f", because they are common
initials. Most randomly generated spam addresses have these particular
consonants. I'm trying to develop a rule that incorporates "y"s because they
are very common.
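
Here is a small Python sketch of the consonant rule, useful for testing a new
threshold against your own user list before deploying it. The sample
addresses are made up, and case-insensitive matching is an assumption.

```python
import re

# The consonant-run rule as written above ("d", "f" and "y" are left out).
CONSONANT_RUN = re.compile(r"[bcghjklmnpqrstvwxz]{8,20}\@", re.IGNORECASE)


def looks_generated(address: str) -> bool:
    """Return True if 8 or more listed consonants sit right before the "@"."""
    return CONSONANT_RUN.search(address) is not None


samples = [
    "xkqzrtwpl@example.com",   # 9 listed consonants in a row
    "bcghjkl@example.com",     # only 7, below the threshold
    "schwartz@example.com",    # the "a" breaks the run
    "dfdfdfdfdf@example.com",  # "d" and "f" are not in the set
    "x" * 25 + "@example.com", # longer than the upper bound of 20
]
for address in samples:
    print(looks_generated(address), address)
```

In an unanchored search like Python's re.search, the 25-consonant address
still matches, because the pattern can start partway into the run. If
Scanmail searches the same way, the "20" may not shield long list-server
addresses by itself, so the exceptions list is worth checking too.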
4. Other obfuscated words are a guaranteed "spam sign". I found that looking
for the obfuscated word "girl" (ex. g|rl or g1rl) isn't practical because
real users tend to get "cute" with their emails. The following words are a
100% indication of spam:
\bFr0m
\bG\@ng
\bH\@rd
\bP0rn
\bPorn0
\bpuss[1\|]es
\bdiscreet[1\|]y
\bhardc0re
\br3move
\bremov3
\brem0ve
\br3mov3
\br3m0v3
\bappr0ved
\bd0se
\bt\@ke
\bd[1l\|]scount
\bb00st
\bbo0st
\bb0ost
\ben[1\|]ar[gq]ement
\b0pp0rtunity
\bdisc0unt
\bvirgiens
\bj[1l\|]zz
\bd0ng
\bh0le
\bpuss[|1][e3]s
\bvaigra
\bmicrrowave
\bgeeneric
\bgenneric
The "\b" keeps the rule from matching inside attachment data when the
scanning attachments option is used. This list should vary from ISP to ISP
but these are the most common. I recommend that each ISP study their spam for
common obfuscated words. I've noticed that "our spam" has certain
characteristics. On a similar note, I've noticed a difference in spam when it
is forwarded from another mail server (ex. when a customer switches ISPs and
has their mail from their old address forwarded). First, because the
blacklisted IPs are stripped from the header, the only thing stopping it is
your Scanmail config files. After that, you will see different URLs and
different types of obfuscation. I highly recommend not allowing this type of
forwarding to continue for long because of what I call "bot-it".
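
Going back to the obfuscated-word rules listed above, here is a minimal
Python sketch that runs a handful of them against sample text. It assumes
case-insensitive matching, and the sample strings (including the fake
attachment data) are made up.

```python
import re

# A few of the word rules from the list above, copied as written.
WORD_RULES = [
    r"\bFr0m",
    r"\bP0rn",
    r"\bd[1l\|]scount",
    r"\br3move",
    r"\ben[1\|]ar[gq]ement",
    r"\bb00st",
]
COMPILED = [re.compile(rule, re.IGNORECASE) for rule in WORD_RULES]


def matching_rules(text: str) -> list:
    """Return the rules (as written) that match somewhere in text."""
    return [rule for rule, rx in zip(WORD_RULES, COMPILED) if rx.search(text)]


print(matching_rules("Get a d1scount today"))     # obfuscated word
print(matching_rules("Thanks for the discount"))  # normal spelling
print(matching_rules("QWxb00stZZ"))               # fake encoded data
```

The last sample shows what the "\b" buys you: "b00st" buried in the middle of
a run of letters and digits is not at a word boundary, so the rule does not
fire.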
4. "Bot-it" is when a spam robot spams you by going down a list of addresses,
generally alphabetical. This is extremely prevalent with email addresses of
three or fewer characters. As a general rule for an ISP, make your customers
have at least six characters or numbers in the email address, preferably
both. I found it effective to block certain email addresses of your own
domain for spam and viruses. james@afo.net & admin@afo.net are blocked
because of mass mailing viruses. Other old addresses are blocked because they
are "spam catchers". Look for addresses that start with "a" or "b".
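
The address guideline above could be checked at signup with something like
the following Python sketch. The function name and the tuple it returns are
made up for illustration.

```python
def check_local_part(local_part: str, min_len: int = 6) -> tuple:
    """Return (acceptable, preferred) for the part before the "@".

    acceptable: at least min_len characters.
    preferred:  acceptable and contains both letters and digits.
    """
    acceptable = len(local_part) >= min_len
    has_letter = any(ch.isalpha() for ch in local_part)
    has_digit = any(ch.isdigit() for ch in local_part)
    return acceptable, acceptable and has_letter and has_digit


for name in ["bob", "ab1", "jsmith", "jsmith42"]:
    print(name, check_local_part(name))
```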
5. These next several rules target the "deceptive" characteristics of html
spam. These rules need to be watched and tweaked to keep from catching "ham".
A common spam sign is to include a bunch of consonants in between html
brackets, "< >". I'm not quite sure why, maybe it is to confuse SpamAssassin,
but Scanmail can see it. I catch it with:
<[bcghjklmnpqrstvwxz]{5}>
If you lower it to "4" you catch the "" tag. Regular mail never has
this nonsense. A similar rule is this:
<\!\w{10,60}>
that catches extremely long commented gibberish.
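
Both html rules above can be exercised with a short Python sketch. The sample
tags are made up, and case-insensitive matching for the tag rule is an
assumption.

```python
import re

# The two rules as written above.
GIBBERISH_TAG = re.compile(r"<[bcghjklmnpqrstvwxz]{5}>", re.IGNORECASE)
LONG_COMMENT = re.compile(r"<\!\w{10,60}>")

# Same tag rule lowered to 4 characters, to show the false positive risk.
GIBBERISH_TAG_4 = re.compile(r"<[bcghjklmnpqrstvwxz]{4}>", re.IGNORECASE)

print(bool(GIBBERISH_TAG.search("<xkqzr>")))               # bogus tag
print(bool(GIBBERISH_TAG.search("<table>")))               # real tag, has vowels
print(bool(GIBBERISH_TAG_4.search("<html>")))              # real tag, 4 consonants
print(bool(LONG_COMMENT.search("<!kjhsdfkjhsdfkjh>")))     # gibberish comment
print(bool(LONG_COMMENT.search("<!-- real comment -->")))  # spaces and hyphens
```

With the count lowered to 4, a legitimate tag such as "<html>" is made up
entirely of listed consonants and would be caught, which is why the rule
stays at 5.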
Once you block the "money maker", spammers must resort to other ways of
getting their links through. Below are rules to catch encoded URLs in html
mail:
\&\#[0-9]+\;\&\#[0-9]+\;
\%[0-9]+\%[0-9]+\%
I've had to put a couple of mailing list servers in exceptions-list.conf to
keep these rules in place but they have caught a tremendous amount of spam.
I've tried to incorporate the other type of html obfuscation with this rule:
\%[0-9]+\%[0-9]+\%
but apparently earthlink's webmail interface attaches a footer using this
encoding so it should be avoided.
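
To see what the encoded-URL rules are looking at, here is a minimal Python
sketch. It uses the standard html and urllib.parse modules to decode two
made-up samples and runs the two rules against them.

```python
import html
import re
from urllib.parse import unquote

# The two rules as written above.
ENTITY_RUN = re.compile(r"\&\#[0-9]+\;\&\#[0-9]+\;")
PERCENT_RUN = re.compile(r"\%[0-9]+\%[0-9]+\%")

entity_sample = "&#104;&#116;&#116;&#112;"          # made-up sample
percent_sample = "http://%77%77%77.example.com/"    # made-up sample

print(html.unescape(entity_sample), bool(ENTITY_RUN.search(entity_sample)))
print(unquote(percent_sample), bool(PERCENT_RUN.search(percent_sample)))

# Percent escapes that contain hex letters are not matched by this rule.
print(bool(PERCENT_RUN.search("%2F%2F%2F")))
```

The percent rule only looks at digits, so escapes such as "%2F" that contain
a hex letter pass through it untouched.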
Another common spam sign is the use of bogus frame tags. This rule catches
them:
\w*<\/noframes><\/frame>
Even though it is not very common for us, it is always spam.
A very common spam sign is to have "white" or "invisible" text in html
emails. I'm slowly developing rules to catch this type of spam without
catching ham. This rule is great at catching "near" invisible text by looking
for off-white variations:
color=[\"\']*\#FFFFF[0-9A-E]
This rule has only caught spam.
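
A short Python sketch of the off-white rule, with made-up html fragments.
Case-insensitive matching is an assumption here.

```python
import re

# The off-white rule as written above.
NEAR_WHITE = re.compile(r"color=[\"\']*\#FFFFF[0-9A-E]", re.IGNORECASE)

samples = [
    '<font color="#FFFFFE">',  # almost white
    "<font color=#fffff0>",    # unquoted, lowercase
    '<td bgcolor="#FFFFFA">',  # "bgcolor=" contains "color=" too
    '<font color="#FFFFFF">',  # pure white is not matched
]
for fragment in samples:
    print(bool(NEAR_WHITE.search(fragment)), fragment)
```

Pure white ("#FFFFFF") is deliberately left alone by the last character
class, which fits the goal of catching only the off-white variations.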
6. An extremely effective way of blocking spam is by blocking their image
URLs. Blocking "geocites.com" and "terra.es" has been extremely effective.
The URL "serverimages.com" supplies porn images for hundreds of sites. I've
used regex to catch spam with changing URLs by catching the path to the
images.
\/biz\/gb[a-z]*\.gif
com\/z[0-9]+\.gif
The above rules have caught all of the spam from a porn site with seemingly
endless domains and IP addresses. This type of filtering is extremely
effective on those "image only" spams.
\.com\/c2\.gif
banners\/run[0-9]+\.gif
The above rules have caught the viagra image-only spam. These types of rules
need to be created from the spam you are receiving. Even though spam is
coming from hundreds of IP addresses and domains, there is usually something
common and unique to them.
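
The image-path rules above can be tested against sample URLs with a Python
sketch like this one. The example.* URLs are made up, and the dot before
"gif" in the last rule is escaped here so that it only matches a literal ".".

```python
import re

# The four image-path rules from this section.
IMAGE_RULES = [
    r"\/biz\/gb[a-z]*\.gif",
    r"com\/z[0-9]+\.gif",
    r"\.com\/c2\.gif",
    r"banners\/run[0-9]+\.gif",
]
COMPILED = [re.compile(rule) for rule in IMAGE_RULES]


def image_rules_hit(url: str) -> list:
    """Return the rules (as written) that match the URL."""
    return [rule for rule, rx in zip(IMAGE_RULES, COMPILED) if rx.search(url)]


urls = [
    "http://a1.example.net/biz/gbabc.gif",
    "http://img7.example.com/z123.gif",
    "http://x.example.com/c2.gif",
    "http://y.example.org/banners/run42.gif",
    "http://z.example.com/logo.gif",
]
for url in urls:
    print(url, image_rules_hit(url))
```

Because the rules key on the path, the host part can change freely without
affecting the match.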
7. These next examples target spammers with multiple URLs or addresses that
have common characteristics. The rule below catches spam from a porn site
that loves to have "climax" in its domain names.
climax[a-z]*\.com
I used to see all types of URLs like climaxerections.com,
climaxhardcore.com, etc. from the same spammer.
This rule catches spam from numbered addresses or URLs:
redv[0-9]*\.com
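
A brief Python check of the two domain-family rules above. The "redv"
sample names are made up.

```python
import re

# The two domain-family rules as written above.
CLIMAX_RULE = re.compile(r"climax[a-z]*\.com")
REDV_RULE = re.compile(r"redv[0-9]*\.com")

for name in ["climaxhardcore.com", "climax.com", "climax-news.org"]:
    print(name, bool(CLIMAX_RULE.search(name)))
for name in ["redv12.com", "redv.com", "redvx.com"]:
    print(name, bool(REDV_RULE.search(name)))
```

Since "*" also allows zero characters, the bare names "climax.com" and
"redv.com" are caught as well.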
8. The next couple of concepts are really just guidelines. Scanmail's stat
file is invaluable in fighting spam. I found it extremely effective to watch
the stats file by using "tail -f" to catch spam. Here's an example stat log
entry:
Rejected because of spam (\&\#[0-9]+\;
\&\#[0-9]+\;)~~~01/07/04
16:57~2505~12-222-136-53.client.insightBB.com
[12.222.136.53]~12.156.195.246~Agape, --> 80%!....gunfight8351ok
Even though it was caught by my regex rule, you can tell it is spam by the
bogus address. Nobody can remember an email address like that. More telling
is the domain. See that it is from "yahoo.com" but the mail server resolves
to "insightBB.com". Also notice the gibberish in the subject. Then notice
that it has been relayed, even though I've found this not to be a definitive
sign. By having the ability to see this info, I can then go to that mailbox
and investigate further, looking for URLs and other blockable signs.
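
The manual check described above can be sketched in Python. This is not a
documented Scanmail format: it assumes the stat entry is one line with fields
separated by "~", it joins the wrapped sample above back into one line, and
the field positions are read off that single sample. The function name is
made up.

```python
import re

# The sample entry above, joined back into one line (assumed layout).
SAMPLE = (
    r"Rejected because of spam (\&\#[0-9]+\;\&\#[0-9]+\;)~~~01/07/04 16:57~2505~"
    r"12-222-136-53.client.insightBB.com [12.222.136.53]~12.156.195.246~"
    r"Agape, --> 80%!....gunfight8351ok"
)

fields = SAMPLE.split("~")
reason = fields[0]      # why the message was rejected
timestamp = fields[3]   # date and time
connecting = fields[5]  # "hostname [ip]" of the sending server

match = re.fullmatch(r"(\S+) \[([0-9.]+)\]", connecting)
host, ip = match.group(1), match.group(2)


def host_matches_domain(hostname: str, domain: str) -> bool:
    """True if hostname is the domain itself or a subdomain of it."""
    hostname, domain = hostname.lower(), domain.lower()
    return hostname == domain or hostname.endswith("." + domain)


print(reason)
print(timestamp, host, ip)
# The sender claimed "yahoo.com", but the server resolves elsewhere.
print(host_matches_domain(host, "yahoo.com"))
```

A rule or subject that itself contains "~" would break the simple split, so
treat this as a starting point for eyeballing the log rather than a parser.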
Once you have identified and blocked spam, investigate further. Blocking spam
with address-list.conf & ip-list.conf is more efficient than with
spam-list.conf. We have a spam-bot that I call "sendmails.com" because that
is the URL in its spam. The messages get rejected because of the URL but I
noticed that it used certain email addresses like send-mails.com,
myrainmailserver.com, etc. Adding these to address-list.conf has blocked it
when it has changed its URL.
Well that is about it in a nutshell. I'll send you more as we grow.