
Nov. 21, 2005. The cleaner is down for work. Check back later or contact us via the contacts page.

Using the SPAM Cleaner

The SPAM Cleaner is located here and can be used to scrub SPAM or other text files of identifying information. Customization can be done through this screen and will remain persistent via cookies. Below is kind of a mini-FAQ.

It's not cleaning my spam!

The default code does a great job cleaning my spam! Which means it will do a crappy job cleaning your spam. You need to customize it to obfuscate specific things for your servers/addresses. You can do this through the customization page. Without customization, it will still remove your address from the To: lines and perform some general tasks. However, to reap the full benefit, you need to add your server name (or at least domain name) to the custom cleaning list.

Can I tell all my friends and post it on all the anti-spam forums?

No! Please don't!

If I detect abuse of this, the cleaner will disappear.

I cannot fend off even a minor denial-of-service attack.

I do not trust my code to keep out black-hats. It should be safe but I do not intend to test it.

I have not written the Cleaner for heavy use. There may be race conditions and sharing conflicts when many people use it at the same time.

What the heck is a SPAM Cleaner?

The Cleaner was conceived after the author noticed an annoying tendency of spammers: they like to hide identifying material inside of their spam E-mails. I call this "trace". This can range from obvious things like your E-mail address on the To: or Cc: line to less obvious things like putting your E-mail address inside of URL links in the mail body.

There are other, even less obvious identifying characteristics of modern spam. These include "coded" characters in subject lines, other header fields and message bodies. Many believe these are simply there to confuse spam filters, but the placement and configuration of these "codes" suggest otherwise.

Simply put, SPAM Cleaner is a web-based interface to a simple scrubbing program that goes through the E-mail, character by character, looking for references that you do not want to be in the E-mail when you submit it to abuse desks, SpamCop or another authority. This is often called "munging".

The E-mail is submitted by a web form. There is no connection to your E-mail server or client. There is nothing to download. It doesn't know E-mail from Adam; you can clean any piece of text you like.

If you (a) simply delete spam, (b) submit it to a HIGHLY trusted abuse desk or database, or (c) don't know what spam is, then you probably don't need SPAM Cleaner and would be better off not reading further.

Who cares what's in the spam I submit?

Again, if you don't mind being linked directly to the spam you submit to authorities, stop reading here and use your regular reporting methods. At this point I should point out that most of the on-line spam databases and sites like SpamCop.net already do most of this munging for you. What we are doing here is not new. It is not revolutionary. It is not considered paranoid by the spam-fighting community.

One can only wonder why spammers would want to link the spam they send to the recipient. That question cannot and will not be answered here. It has been documented that people who report spam regularly have become the target of "spam bombs", "joe-jobs" and other retaliatory tactics. Obviously (if the abuse desks and postmasters are doing their job) spammers get in trouble when people report their actions. They don't like that. There are less-sinister reasons for the codes and "trace" as well. Often the spammers get paid by the number of unique "hits" their E-mails generate on the spam-vertised web sites. Or simply knowing that their E-mails are getting through to intended recipients on intended servers (which they can confirm as soon as you report them as spammers) is enough to keep them satisfied.

Regardless of motivation, some of us would rather make an effort to protect our privacy. Our privacy was already violated by having our personal E-mail address fall into the hands of losers and criminals; we don't need to give them anything else voluntarily.

What, exactly, does it do?

As of this writing, SPAM Cleaner performs the following on text submitted:

1. Chops off the SpamAssassin header, if it's there. I haven't tried this with all permutations of SA headers but it appears to work for all of them that I use and my hosting companies use. There is a wealth of trace in the SA header and it is not actually part of the spam anyway. This is mostly just a time-saver so you don't have to start your cut-and-paste below the header.

2. Mungs the To: address and all associated fields in the header. This is the obvious munging that anyone should do. This is where your real E-mail address is likely going to be. It also checks other fields such as Rcpt-to: and Delivered-to:. It replaces the victim addresses with the sender (From:) address. Yes, the From address is almost always fake. It looks a bit better than placing an X in there but I could easily do that too.

3. Deletes the Cc: line. There is nothing in there worth keeping, nothing that would identify the spammer to an abuse authority. It is likely just more victims of the same spam.

4. Deletes other recipients that are wrapped below the To: line or Cc: line. Some mail servers reformat headers this way. Other recipients are on a separate line, preceded by a tab or spaces. If you haven't seen yours do this, don't worry about it.

5. Compares all the text with a list of "exclusion words" and, if found, mungs them. This is where the real detective work starts. The Cleaner has a list of words that it will mung if they occur anywhere in the header or body. This can be things like your last name, your mail server's name, your domain name, your IP address, etc. I would put your username in here too even though Step 2 above should get most of those.

6. Optionally mungs the X-Mailer field. Lately this seems to be a favorite header field for spammers to mess with. Not sure if they are putting trace codes in here or (more likely) just disguising the fact that they are using rat-ware to do their mailing. This option is activated by a check-box on the main Cleaner screen.

7. Optionally mungs "junk text" that is often placed at the end of subject lines and the very end of the body. This is getting harder and harder to parse through automation. This option looks for several things: (a) text in the subject separated by two or more spaces; (b) text in the subject line that occurs after specific punctuation-space combinations; (c) text that occurs after the </html> tag (if it exists) in the body. If it finds (a), (b) or (c), it deletes the trace text that follows those markers. This option is activated by a check-box on the main Cleaner screen. You are usually safe to keep this on.
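To make steps 2 through 4 a bit more concrete, here is a minimal Python sketch of that kind of header munging. It is not the Cleaner's actual code (which is not published). The function name mung_headers, the address pattern, the field list and the fallback address are all assumptions made up for illustration.

```python
import re

# Header fields whose addresses get replaced (step 2). This field list is an
# assumption; the real Cleaner may check more or fewer fields.
RECIPIENT_FIELDS = ("to", "rcpt-to", "delivered-to")

# Deliberately loose address pattern, good enough for a sketch.
ADDRESS = re.compile(r"[\w.+-]+@[\w.-]+")


def mung_headers(message):
    """Apply steps 2-4 to the header block of a raw message (hypothetical helper)."""
    head, sep, body = message.partition("\n\n")
    lines = head.split("\n")

    # Use the (probably fake) From: address as the replacement value.
    sender = "x@example.invalid"  # assumed fallback when no From: address exists
    for line in lines:
        if line.lower().startswith("from:"):
            found = ADDRESS.search(line)
            if found:
                sender = found.group(0)
            break

    cleaned = []
    below_to_or_cc = False  # True while reading lines wrapped under To: or Cc:
    for line in lines:
        if line[:1] in (" ", "\t"):
            # Step 4: drop recipients wrapped below a To: or Cc: line.
            # Wrapped lines of other fields (Received:, etc.) are kept.
            if not below_to_or_cc:
                cleaned.append(line)
            continue
        name = line.partition(":")[0].strip().lower()
        below_to_or_cc = name in ("to", "cc")
        if name == "cc":
            continue  # Step 3: delete the Cc: line entirely.
        if name in RECIPIENT_FIELDS:
            # Step 2: swap the victim address for the sender address.
            line = ADDRESS.sub(lambda _match: sender, line)
        cleaned.append(line)

    return "\n".join(cleaned) + sep + body
```

Only the header block (everything before the first blank line) is touched here; addresses in the body are left for the exclusion-word pass in step 5.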

SPAM Cleaner will print out notices of everything it's doing related to items 1, 3, 4, 5 above. It will then print out the "cleaned" spam to allow you to copy-paste this into a LART message or some site like SpamCop.net.
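The exclusion-word pass in item 5, along with the notices it produces, could look roughly like the Python sketch below. Again, this is an illustration under assumptions, not the Cleaner's own code: the name mung_words, the "x" filler, the case-insensitive matching and the wording of the notices are all invented here.

```python
import re


def mung_words(text, exclusion_words, filler="x"):
    """Blank out every exclusion word and return (cleaned_text, notices).

    Hypothetical helper: matching is case-insensitive and each hit is
    replaced by filler characters of the same length.
    """
    notices = []
    # Longest words first, so "mail.example.com" is handled before
    # "example.com" can chop it into pieces.
    for word in sorted(set(exclusion_words), key=len, reverse=True):
        if not word:
            continue
        pattern = re.compile(re.escape(word), re.IGNORECASE)
        text, count = pattern.subn(lambda m: filler * len(m.group(0)), text)
        if count:
            # The notice deliberately does not repeat the word itself.
            notices.append(
                f"Munged {count} occurrence(s) of a {len(word)}-character custom word"
            )
    return text, notices
```

Replacing with same-length filler keeps the layout of the message intact, which makes it easier to eyeball what was changed before pasting the result somewhere else.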

How do I customize it for me?

Simple. Make sure you allow cookies and go to the customization page which will allow you to add your specific mung items to the list.

Can I get the source code?

Nope. Not because I'm a BOFH, I just don't want to expose myself to security issues. I will admit that my programming abilities are far from exemplary and without careful review by a top-notch programmer, I don't want to expose my code to the obvious security issues that releasing source opens up. Call it security through obscurity, call it anti-open-source, call it what you will. That being said, I would be happy to work with a more knowledgeable programmer/spam-fighter if I can be convinced of bona-fides.

I didn't write Cleaner for open-source fame. I wrote it for my use. If others can make use of it, great. Honestly, it's pretty simple; anyone with any knowledge of CGI programming could make a duplicate in a matter of hours.

Are you spying on me?

Uhhh, no. And no, I don't keep the spam you submit. I have plenty of my own.

People tend to get wigged out by cookies. I might point out, cookies were invented by Netscape, not Microsoft. Anyway, there are not as many security issues with cookies as the rumors would have you believe. Most of the stories about people having telephone numbers and credit card numbers stolen via cookies are urban legends. I have restricted my cookies to the pettingers.net domain so unless your web browser is handing out cookies illegally, nobody should have access to the information you put in there. And you aren't allowed to use the @ symbol there anyway so you shouldn't be storing E-mail addresses in those cookies.

Any tips for cleaning?

Yep.

1. If you use the "Attempt to clean up trash text" checkbox, it will do its best, but you can help it. If you see a subject line like

     Buy Now!!!afg6532

you may want to change it to something like:

     Buy Now!!!     afg6532

This can be done with a couple hits on the space bar after you paste the message and before you hit the "clean" button. If there aren't at least two spaces in there (or one after a period, exclamation point or question mark), it won't find the trash text.

2. Look carefully at the Received: headers in your E-mail. Some servers actually put the server name in there and some will put the IP address in dotted notation (either forward or in-addr.arpa reverse). You may want to add these IP addresses to your mung list. If you don't, you've confirmed for the spammer that there is a mail server at that address and that it is open to receiving spam from their source.

3. Read the restrictions on the customization page carefully. If you put odd characters in there (like @ or |) it will happily save these in your cookie, but when the cookie is returned to the main cleaner page they will be stripped out and odd things might happen. Spaces are o.k.; they will just be removed. You should be able to sufficiently mung with just the allowed items: letters, numbers, underscore, and period. Use a comma to separate the strings.
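Tips 1 and 3 are easier to see with a small example. The Python sketch below is a guess at the behavior described, not the Cleaner's real code. The names trim_subject_junk and parse_mung_list are hypothetical, and the sketch assumes the junk is a single run of non-space characters at the very end of the subject.

```python
import re

# Tip 1: trailing junk is only recognized when it follows two or more spaces,
# or a single space right after a period, exclamation point or question mark.
# Assumption: the junk is one final token with no spaces in it.
SUBJECT_JUNK = re.compile(r"(?: {2,}|(?<=[.!?]) )\S+\s*$")

# Tip 3: only letters, numbers, underscore and period survive.
NOT_ALLOWED = re.compile(r"[^A-Za-z0-9_.]")


def trim_subject_junk(subject):
    """Remove a trailing junk token from a subject line, if one is detectable."""
    return SUBJECT_JUNK.sub("", subject)


def parse_mung_list(raw):
    """Split a comma-separated custom list and strip disallowed characters."""
    items = []
    for item in raw.split(","):
        item = NOT_ALLOWED.sub("", item)  # spaces, @, | and friends vanish
        if item:
            items.append(item)
    return items
```

With this sketch, "Buy Now!!!afg6532" comes back unchanged because there is no space before the junk, while "Buy Now!!!     afg6532" is trimmed to "Buy Now!!!". Likewise, an entry such as "me@home.example" would be stored as "mehome.example", which no longer matches anything in your mail. That is the sort of "odd thing" tip 3 warns about.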
