SpamAssassin Rules -- What's working now

What's Working Now in Spam Assassin -- Some Simple SA Rules and Plug-ins

Ever do a Google search for Spam Assassin rules that detect attachments? I did, and didn't have a lot of luck. I believe the reason is that, although multi-part attachment designations appear in the body of the E-mail, Spam Assassin will not allow you to look at those designations using a simple rule. Even a rawbody rule will not allow you to examine content-type designations. So how do we look at attachments?

I found one way to detect attachments. Is it the right way? Probably not. Does it work? It appears to. I offer this up as alpha code for you to play with. PLEASE run spamassassin --lint to check your configuration if you use this.

Obviously you need to use the later versions of SA (3-series) as the older versions did not support Plug-ins like this.

Some significant caveats for this:

1. As it stands, this only detects zip and pdf attachments. You can add others very easily by modifying the REGEX in the Plug-in.

2. Obviously there are TONS of ham messages with pdf and zip attachments. That is why YOU NEED TO BE CAREFUL with the scores on this. I only activate it for my users who very seldom get these types of attachments. Even then I only use it as a rule to bump the total over the threshold when used in combination with other rules.

3. There may be (very easy) ways for spammers to hide attachments from this Plug-in. It works for the majority of them though. I can add some more hooks if necessary.

4. I would recommend you NOT add image-type attachments to the list of file types. There are plenty of default tests for images; just adjust their scores accordingly.

There are two pieces to this: the Plug-in itself, which is just a Perl module, and the ruleset that activates it. I placed the Plug-in in the following sub-directory: .../Mail/SpamAssassin/Plugin/ This is the standard library directory, usually under one of your /usr/lib subdirectories. Doing locate Plugin will usually give you some clues where they are hiding. You can place the Plug-in in the same location as your .cf file (see the next paragraph), but then you must explicitly point to the file, using the same package name as the loadplugin line below, with something like loadplugin Mail::SpamAssassin::Plugin::Attachments Attachments.pm

The second piece is the activation rule. I place this in a standard .cf file in the /etc/mail/spamassassin/ directory. That's the same place I put the other rules below.

loadplugin      Mail::SpamAssassin::Plugin::Attachments

body    ATTACHMENT_PRESENT      eval:check_attach()
score   ATTACHMENT_PRESENT      0.1
describe        ATTACHMENT_PRESENT      Contains a pdf or zip attachment
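Caveat 2 above (using the attachment test only to push an already suspicious message over the threshold) could be sketched as a meta rule. This is only an illustration: the rule name ATTACH_PLUS_BAYES and its score are made up, and BAYES_99 is just one possible companion test.

```
# Hypothetical combination rule: fires only when the attachment test
# above AND the stock BAYES_99 test both hit on the same message.
meta      ATTACH_PLUS_BAYES   (ATTACHMENT_PRESENT && BAYES_99)
score     ATTACH_PLUS_BAYES   1.5
describe  ATTACH_PLUS_BAYES   Attachment plus high Bayes spam probability
```

With this arrangement ATTACHMENT_PRESENT can stay at its token 0.1 score, so an attachment on an otherwise clean message costs almost nothing.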

You can download the Plug-in here: attachment-plugin.tar

Below are some rules which I use with Spam Assassin. I change these about once a week depending on what the spammers do and this page changes somewhat less frequently than that. Use these at your own risk. You need to assess the effects of these on your E-mail. Perhaps set the Score for a rule to 0.01 until you know how it works on your server.
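A rough sketch of that trial approach follows. The rule name LOCAL_TRIAL_PHRASE, its pattern, and the later score are placeholders, not rules from this page.

```
# Hypothetical trial rule: a score of 0.01 keeps the rule active, so its
# name appears in the test list of every message it hits, but it barely
# changes the total.
body      LOCAL_TRIAL_PHRASE   /limited time investor bulletin/i
score     LOCAL_TRIAL_PHRASE   0.01
describe  LOCAL_TRIAL_PHRASE   Trial rule, watching for false positives

# Once the hits look clean, replace the score line above with a real one:
# score   LOCAL_TRIAL_PHRASE   4.5
```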

You will notice rather high scores on many of these rules because I tend to only write rules that target spam slipping through the default rules in SA. That is, if SA was already tagging the "leaker" spam, I wouldn't need the new rule! So there is no point setting the scores on these to a low number (except when I test them initially for false positives). In for a penny, in for a pound. Go big or stay at home.

There is a strong temporal element to rule writing. When I first wrote my DICT_DUMP rule, dictionary dumping and Bayes poisoning were popular and I would get hundreds of hits a day on it. Now I hardly get one hit a day. [Note that the latest version of Spam Assassin now has a dictionary dump rule as part of its default rule set -- I still use mine in conjunction with that.] Even the old-school tactic of subject obfuscation has fallen off significantly lately. Some of the rules only apply to one spammer, and when that spammer goes away or changes tactics, the rule lies idle. Nobody said this was a low maintenance game!

I DID NOT AUTHOR ALL OF THESE RULES! I am not attempting to take credit for someone else's work. You can find many of these rules and some other excellent rules at the Apache Custom Rulesets page. I've hacked some of these for my own uses as shown below.

Am I worried that the spammers will gain some intel from this? Not really. You think they actually spend time looking for this stuff? Even if they did, I'll just write more rules. It becomes a cat-and-mouse game and always will be. I'm so far off the radar screen that I have much bigger issues to worry about.

If you have any doubt about using blacklists with Spam Assassin, be sure to have a look at our page on the effectiveness of Spam Assassin and Blacklists. Ideally you would run the DNS blacklists on the SMTP server, not in Spam Assassin, but sometimes you get mail via forwarding or fetchmail and still need the blacklist portion of Spam Assassin.

After looking at some of this, I hope people will realize how easy it is to make custom rules. You can, at least temporarily, significantly increase the effectiveness of SA by spending 30 seconds authoring a custom rule. Nothing below is rocket science. Also look carefully at the hits the leaker spam is getting on the existing SA rules. Sometimes you can simply bump up the score for an existing rule and push the leaker spam over the threshold. When the spam tactic du jour changes, you can remove the score bump.

Some general pointers for choosing a custom rule to catch the leakers (again, not rocket science, and I do know about rocket science):

1. Look for something distinctive that you wouldn't expect to see in a normal message body or header. This sounds like common sense, and it is, but this is the first step in the thought process. Spammers insist on crafting their messages with peculiar syntax. Sometimes this is done with the intention of dodging spam filters. What this does, in reality, is make them easier to spot!

2. Think about that distinctive marking in terms of legitimate "ham" mail. This is the flip side of the item above. Would you ever see a string of 25 consecutive non-space characters in a normal message? Well, you might in the "raw" message body from things like multi-part boundaries, but probably not in the actual message text. Would you expect to see poor English sentence structure from your English-speaking clients and friends? Probably not. Remember, your goal is to create a rule that will never fire on legitimate E-mail. If it does trip on ham, you have to play games with score weighting. Sometimes that is unavoidable.

3. The "rawbody" descriptor is very powerful. You can use it to dig into distinctive text that the spammer's rat-ware uses to construct the message, usually an HTML message. Look for HTML tags or sequences of tags that uniquely identify the leaker. Sometimes even sequences of spaces are distinctive enough. Use rawbody only when necessary, as it can be expensive in terms of processing time and memory.

4. Take a look at the header for distinctive rat-ware traces. This has become less and less effective over the past few months, especially in subject lines. Sometimes a particular spammer places bogus virus scan notices, joe-jobbed Received lines or X-Comment fields in the header, and these can be used in rules.

5. Set the score low and watch it. This is the hard work of the process. You need to look at both ham and spam tagging to see how your rule is doing. Depending on your traffic flow, a few hits, false positives, or false negatives will tell you how the rule is doing. Sometimes you need to watch things for many days. If things look good, push up the score. Remember, you wrote the rule because the spam was leaking through the existing rules. Go big or stay at home.
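The pointers above map onto the three basic rule types roughly as follows. This is a sketch only: every rule name and pattern here is invented for illustration and would have to be replaced with traits taken from your own leaker spam.

```
# Hypothetical examples; names and patterns are placeholders.

# Distinctive wording in the rendered text of the message
body      LOCAL_ODD_PHRASE      /kindly revert back urgently/i
score     LOCAL_ODD_PHRASE      0.01
describe  LOCAL_ODD_PHRASE      Trial: odd phrase seen in leaker spam

# Distinctive tag sequence left in the raw HTML by the rat-ware
rawbody   LOCAL_RATWARE_HTML    /<center><font size=1><b><br>/i
score     LOCAL_RATWARE_HTML    0.01
describe  LOCAL_RATWARE_HTML    Trial: rat-ware HTML tag sequence

# Bogus header field added by the spammer
header    LOCAL_BOGUS_XCOMMENT  X-Comment =~ /scanned and found clean/i
score     LOCAL_BOGUS_XCOMMENT  0.01
describe  LOCAL_BOGUS_XCOMMENT  Trial: bogus X-Comment header
```

All three start at a token score so they can be watched against both ham and spam before any of them is raised.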

I put these rules in a custom .cf file in the /etc/mail/spamassassin/ directory and restart the spamd daemon each time I change the file.

So all you need to do is cut and paste these into a file, say custom_SA-rules.cf and save that file in /etc/mail/spamassassin/

Then, just restart the Spam Assassin daemon, spamd. (You are running spamd aren't you? If not, why not!?) On many versions of Linux you can do this with service spamassassin restart although you'll want to check your documentation to see how to restart spamd.

Rules....(and a few score bumps)


# Short-Circuit if found in local blacklist or whitelist

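# Requires the Shortcircuit plugin (Mail::SpamAssassin::Plugin::Shortcircuit) to be loaded.
# Note: the two USER_IN_BLACKLIST* tests are part of this "ham" short-circuit too; confirm that is what you want.
meta          SC_HAM (USER_IN_WHITELIST||USER_IN_DEF_WHITELIST||USER_IN_ALL_SPAM_TO||NO_RELAYS||ALL_TRUSTED||USER_IN_BLACKLIST_TO||USER_IN_BLACKLIST)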
priority      SC_HAM -1000
shortcircuit  SC_HAM ham
score         SC_HAM -20


rawbody     NO_HTTP   /and paste in your browser/i
score       NO_HTTP   4.5
describe    NO_HTTP   No HTTP on link

body        STOCKDUMP2   /Investor Alert/i
score       STOCKDUMP2   7.0
describe    STOCKDUMP2   Pump and Dump Investor Alert

rawbody     GEOCITIES1   /\.geocities\.com\//i
score       GEOCITIES1   5.0
describe    GEOCITIES1   Geocities Link

rawbody     GEOCITIES2   /\.geocities\.yahoo\//i
score       GEOCITIES2   5.0
describe    GEOCITIES2   Geocities Link 2

body        SOFTWARESPAM   /attachment message\.html/
score       SOFTWARESPAM   5.0
describe    SOFTWARESPAM   leaker software scam

rawbody     TRIPOD1   /\.tripod\.com/
score       TRIPOD1   5.0
describe    TRIPOD1   Tripod Link

body        STOCKDUMP5   /investment advice/
score       STOCKDUMP5   4.9
describe    STOCKDUMP5   Pump and Dump Five

header      VIRUS_SPAM   Subject =~ /Hidden message/
score       VIRUS_SPAM   99.0
describe    VIRUS_SPAM   Potential virus in attachment

header      VIRUS_SPAM2   Subject =~ /Protected message/
score       VIRUS_SPAM2   99.0
describe    VIRUS_SPAM2   Potential virus in attachment 2

body        STOCKDUMP8   /\W[A-Z]{4}\s*\.\s*PK\s/i
score       STOCKDUMP8   4.5
describe    STOCKDUMP8   Pump and Dump Microcap One

body        STOCKDUMP9   /\W[A-Z]{4}\s*\.\s*OB\s/i
score       STOCKDUMP9   4.5
describe    STOCKDUMP9   Pump and Dump Microcap Two

header      BOGUS_THREAD   ALL =~ /Thread-Index/i
score       BOGUS_THREAD   0.5
describe    BOGUS_THREAD   Contains Thread-Index in header

body        STOCKDUMP13   /Target price/i
score       STOCKDUMP13   10.0
describe    STOCKDUMP13   Pump and Dump target price

rawbody     MALWARE01   /ecard number/i
score       MALWARE01   10.0
describe    MALWARE01   E-Card Malware Attempt

rawbody     NICEG   /I am nice girl/i
score       NICEG   6.5
describe    NICEG   Nice Girl mail order bride

body        DICT_DUMP_CUSTOM01   /(((\b|\s)[a-z]{4,}\b){7,})/
describe    DICT_DUMP_CUSTOM01   Text in non-English syntax-4X7
score       DICT_DUMP_CUSTOM01   0.5

body        DICT_DUMP_CUSTOM02   /(((\b|\s)[a-z]{5,}\b){7,})/
describe    DICT_DUMP_CUSTOM02   Text in non-English syntax-5X7
score       DICT_DUMP_CUSTOM02   0.8

body        DICT_DUMP_CUSTOM03   /(((\b|\s)[a-z]{5,}\b){8,})/
describe    DICT_DUMP_CUSTOM03   Text in non-English syntax-5X8
score       DICT_DUMP_CUSTOM03   1.2

header      RODENTDROPPINGS1   ALL =~ /SquirrelMail authenticated user/i
score       RODENTDROPPINGS1   0.1
describe    RODENTDROPPINGS1   Mail from a SquirrelMail account

body        SHYSTER_ONE   /barrister/i
score       SHYSTER_ONE   2.0
describe    SHYSTER_ONE   Body makes reference to barrister

uri         PAGE_AD   /pagead\/iclk/i
score       PAGE_AD   4.2
describe    PAGE_AD   Google relay to spamvertized site

uri         EXE_FILE   /\w\.exe/i
score       EXE_FILE   10.0
describe    EXE_FILE   Potential link to executable

uri         BLOGSPLAT   /\w\.blogspot\.com/i
score       BLOGSPLAT   2.5
describe    BLOGSPLAT   Contains link to blogspot.com

header      RODENTDROPPINGS2   ALL =~ /Internet Messaging Program \(IMP\)/
score       RODENTDROPPINGS2  0.1
describe    RODENTDROPPINGS2  Mail from an IMP agent

####################

# Bump up some scores that should have low likelihood of FP

score   RCVD_IN_BL_SPAMCOP_NET  5.5
score   RCVD_IN_SBL             5.5
score   RCVD_IN_XBL             5.5
score   RCVD_IN_PBL             5.5
score   RCVD_IN_DSBL            5.0
score   RCVD_IN_SORBS_HTTP      3.5
score   RCVD_IN_SORBS_MISC      3.5
score   RCVD_IN_SORBS_SMTP      4.5
score   RCVD_IN_SORBS_SOCKS     3.5
score   RCVD_IN_SORBS_WEB       3.5
score   RCVD_IN_SORBS_BLOCK     4.5
score   RCVD_IN_SORBS_ZOMBIE    3.5
score   RCVD_IN_SORBS_DUL       4.5
score   HTML_TAG_BALANCE_BODY   2.0
score   HTML_TAG_BALANCE_HEAD   3.0
score   HTML_IMAGE_ONLY_04      4.0
score   HTML_MESSAGE            0.3
score   INVALID_DATE            3.2
score   RCVD_IN_NJABL_SPAM      3.5
score   RCVD_IN_NJABL_PROXY     5.5
score   RCVD_IN_NJABL_RELAY     4.5
score   RCVD_IN_NJABL_MULTI     2.5
score   RCVD_IN_NJABL_CGI       2.5
score   ONLINE_PHARMACY         4.0
score   URIBL_SBL               5.5
score   URIBL_SC_SURBL          5.5
score   URIBL_WS_SURBL          4.9
score   URIBL_PH_SURBL          4.9
score   URIBL_OB_SURBL          4.9
score   URIBL_AB_SURBL          4.9
score   URIBL_JP_SURBL          4.9
score   URIBL_BLACK             5.0
score   SPF_HELO_PASS           -1.0
score   SPF_PASS                -1.0
score   RCVD_ILLEGAL_IP         5.0
score   RATWARE_RCVD_PF         4.8
score   BAYES_99                4.8
score   MICROSOFT_EXECUTABLE    20.0
score   RDNS_NONE               3.8
score   URIBL_RHS_DOB           3.8


# Do a summary to give more weight to blacklists

meta       CUSTOM_RCVD_IN_MANY ( RCVD_IN_BL_SPAMCOP_NET + RCVD_IN_SBL + RCVD_IN_XBL + RCVD_IN_SORBS_DUL + RCVD_IN_SORBS_SMTP + RCVD_IN_NJABL_RELAY + RCVD_IN_DSBL + RCVD_IN_NJABL_SPAM + RCVD_IN_NJABL_PROXY + RCVD_IN_SORBS_HTTP + RCVD_IN_SORBS_BLOCK) > 2
describe   CUSTOM_RCVD_IN_MANY   Message received in more than 2 RBLs
score      CUSTOM_RCVD_IN_MANY 3.0

#
# Do a check for odd letter combinations
#
# The following rules were borrowed from an older version of SA.
rawbody  __PGP_BEGIN            /^-----BEGIN PGP SIGNATURE-----$/
rawbody  __PGP_MIDDLE           /^[0-9A-Za-z+\/]{64}$/
rawbody  __PGP_END              /^-----END PGP SIGNATURE-----$/
meta     __PGP_SIGNATURE        (__PGP_BEGIN && __PGP_MIDDLE && __PGP_END)

# Prevent hits with Double forwards, or messages with attachments not parsed out.
rawbody  __FVGT_rb_ATTACHMENT   /Content-Disposition: attachment/i

# Core obfu rules, these are generated from multiple US dictionary files.
body  __FVGT_b_OBFU_J           /j[bcfgw]/i
body  __FVGT_b_OBFU_OTHER       /(vj|vk|xj|xk|yy|zf|zj)/i
body  __FVGT_b_OBFU_Q0          /[jkpqtvwz]q/i
body  __FVGT_b_OBFU_Q1          /q[afhjkmnsy]/i
body  __FVGT_b_OBFU_V           /[fgqw]v/i
body  __FVGT_b_OBFU_X           /[cgjkqsvz]x/i
body  __FVGT_b_OBFU_Z           /[fjkpqx]z/i

meta  __FVGT_m_MULTI_ODD2 ((__FVGT_b_OBFU_J + __FVGT_b_OBFU_OTHER + __FVGT_b_OBFU_Q0 
+ __FVGT_b_OBFU_Q1 + __FVGT_b_OBFU_V + __FVGT_b_OBFU_X + __FVGT_b_OBFU_Z) > 1)
meta  __FVGT_m_MULTI_ODD3 ((__FVGT_b_OBFU_J + __FVGT_b_OBFU_OTHER + __FVGT_b_OBFU_Q0 
+ __FVGT_b_OBFU_Q1 + __FVGT_b_OBFU_V + __FVGT_b_OBFU_X + __FVGT_b_OBFU_Z) > 2)
meta  __FVGT_m_MULTI_ODD4 ((__FVGT_b_OBFU_J + __FVGT_b_OBFU_OTHER + __FVGT_b_OBFU_Q0 
+ __FVGT_b_OBFU_Q1 + __FVGT_b_OBFU_V + __FVGT_b_OBFU_X + __FVGT_b_OBFU_Z) > 3)
meta  __FVGT_m_MULTI_ODD5 ((__FVGT_b_OBFU_J + __FVGT_b_OBFU_OTHER + __FVGT_b_OBFU_Q0 
+ __FVGT_b_OBFU_Q1 + __FVGT_b_OBFU_V + __FVGT_b_OBFU_X + __FVGT_b_OBFU_Z) > 4)

# Core meta rules, these combine multiple variations of above rules.
meta       FVGT_m_MULTI_ODD2   (__FVGT_m_MULTI_ODD2 && !__FVGT_rb_ATTACHMENT && !__PGP_SIGNATURE)
describe   FVGT_m_MULTI_ODD2   Contains multiple odd letter combinations
meta       FVGT_m_MULTI_ODD3   (__FVGT_m_MULTI_ODD3 && !__FVGT_rb_ATTACHMENT && !__PGP_SIGNATURE)
describe   FVGT_m_MULTI_ODD3   Contains multiple odd letter combinations
meta       FVGT_m_MULTI_ODD4   (__FVGT_m_MULTI_ODD4 && !__FVGT_rb_ATTACHMENT && !__PGP_SIGNATURE)
describe   FVGT_m_MULTI_ODD4   Contains multiple odd letter combinations
meta       FVGT_m_MULTI_ODD5   (__FVGT_m_MULTI_ODD5 && !__FVGT_rb_ATTACHMENT && !__PGP_SIGNATURE)
describe   FVGT_m_MULTI_ODD5   Contains multiple odd letter combinations

score  FVGT_m_MULTI_ODD2 1.1
score  FVGT_m_MULTI_ODD3 1.3
score  FVGT_m_MULTI_ODD4 1.3
score  FVGT_m_MULTI_ODD5 1.4



uri    FVGT_u_HAS_2LETTERFLDR    /\/[a-zA-Z]{2}\//
describe    FVGT_u_HAS_2LETTERFLDR    FVGT - URL has a 2 letter folder like /ab/
score    FVGT_u_HAS_2LETTERFLDR    0.5

header  FVGT_s_SINGLE_LETTER Subject =~ /\s[dfghjlmnpqstvwzDFGHJLMNPQSTVWZ]{1}\s/
describe FVGT_s_SINGLE_LETTER FVGT - Single non-vowel separated by spaces
score  FVGT_s_SINGLE_LETTER 0.3
