Data mining in reverse (or who’s spamming you after all)

Everybody hates spam, the scourge of the Internet. One of the common ways to reduce the amount of spam one gets is to use throw-away email accounts or aliases when signing up for compulsory registration services (another scourge of the Internet in my mind, but that’s just my opinion).

Google even makes this easy with Gmail, as you can append a + sign followed by text to create a throw-away alias for your Gmail account (with the disadvantage that spammers could trivially strip + signs and determine your real email address if they were so inclined).
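
To make that weakness concrete, here is a minimal Python sketch of how an address harvester could recover the real address from a plus alias. The function name `strip_plus_tag` and the example address are made up for illustration; this is not taken from any actual spam tool.

```python
def strip_plus_tag(address: str) -> str:
    """Return the address with any +tag removed from the local part."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        return address  # no "@" at all, so leave the string untouched
    base = local.split("+", 1)[0]  # keep only what precedes the first "+"
    return f"{base}@{domain}"


print(strip_plus_tag("someone+vmware@gmail.com"))
```

A one-line transformation like this is all it takes, which is why a plus alias is better for tracking who leaked an address than for hiding the real one.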

One way to take this a bit further and figure out just who is spamming you, however, is to use unique aliases for every compulsory registration service you sign up for, and then take note of which ones actually start getting spammed. Provided you really don’t reuse those aliases, if one starts getting spammed, it’s a pretty good indicator that the person you gave it to is either compromised or selling your email address out.
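
As a rough sketch of the bookkeeping involved (assuming plus-style aliases and an in-memory dictionary; the names `make_alias`, `who_leaked`, and `example.com` are hypothetical), you could generate one alias per service and look up the source when spam shows up:

```python
import secrets


def make_alias(base: str, domain: str, service: str, registry: dict) -> str:
    """Create a unique plus alias for a service and remember who received it."""
    # A random suffix keeps the alias from being guessed from the service name.
    tag = f"{service}-{secrets.token_hex(4)}"
    alias = f"{base}+{tag}@{domain}".lower()
    registry[alias] = service
    return alias


def who_leaked(recipient: str, registry: dict) -> str:
    """Look up which service was given the address that spam arrived on."""
    return registry.get(recipient.lower(), "unknown (alias never issued)")


registry = {}
alias = make_alias("someone", "example.com", "VMware", registry)
print(alias, "->", who_leaked(alias, registry))
```

In practice the registry would live somewhere persistent, such as a password manager entry or a text file, since the whole point is to consult it months later.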

For example, I used a unique alias to sign up for some VMware Server serial numbers some time ago (they’re freely available), and recently (to my surprise) it started getting spam (of the type that I would imagine is emitted by spam bots on compromised home systems). Now, either VMware is selling my email address to spammers of the shadier sort (which I would consider unlikely for a reputable company), or someone with access to VMware’s marketing mailing lists got infected with some sort of malware at some point and the contents of the mailing list inadvertently leaked out (oops!). I’d consider the latter more likely in this case, at least unless VMware has some sort of underhanded interest in herbal medication and the like that they’ve been keeping under the table…

I think this is a good time to point out that even reputable companies make mistakes, and it only takes one person’s compromised Outlook to spill the goods on a mailing list. As a result, even with more trusted companies, I am more inclined to use throw-away aliases rather than my main alias, so that I can cut off the throw-away aliases when they start getting spam some time down the road.

7 Responses to “Data mining in reverse (or who’s spamming you after all)”

  1. just a guest says:

    I warmly advise you to try http://www.mailinator.com. You can give out anything@mailinator.com and be assured that this mailbox exists. Later you can log on to this mailbox from its site, and the mail is saved there for 24 hours. The only disadvantage is that there is no privacy, so if you pick a common name like johndoe@mailinator.com it is likely you will encounter some other people checking out your mail.

  2. Kevin says:

    I’ve been doing this for a few years. I found Yahoo actually has a great feature, Addressguard, where in a few clicks – you can have a new email. You tell it some ‘base’ for the email, and then you add a -. So the base is like ‘cowboy’ and the end is ‘cowboy-blogstuff@yahoo.com’. I have about 250 total currently. I even have my wife setting up new ones when she goes to any new website. It helps that we also use KeePass to keep track of the passwords and logins to all of those sites – since it would be VERY confusing otherwise! But I couldn’t be happier. It was a bit weird at first, but you slowly get trained and now it feels wrong for me to use our primary email. I’ve had my primary yahoo account for 2 years now and I get 0 spam in that box. I’ve gotten spam from Xanboo and a few other small places, but nothing major from all of the 250 emails – surprisingly. I’ve dumped my simple email for one of those funky ones a while back and now receive 0 spam as well. My wife is still attached to her ‘short’ email, which is clogged with spam – but it sure is nice being able to make accurate filters and keep my inbox tidy without having to worry about spam filters.

  3. SpamGourmet is another nice one. I can hand out addresses like nynaeve.2.moyix@spamgourmet.com, and SpamGourmet will forward up to two messages to that address on to me. I can also go and get stats on what addresses have had e-mail discarded based on the rules I set up (I don’t remember to whom I gave out “no.1.moyix@spamgourmet.com”, but clearly I made the right choice, as it’s now blocked over 700 messages to that address).

    Cheers, and thanks for all the awesome RE and debugging posts!

  4. Good Point says:

    I think there is a third option you are not thinking of. Spammers just guess at email addresses. For example, when gmail first came out I signed up and got an address. I didn’t use the account for months, just logging in once in a while, but eventually I started getting spam.

    The user name I picked was the same as the one I have for a yahoo account. I assume some spammer(s) just said hey, there’s this new gmail.com web-mail, let’s spam them also using common user names we already know about.

  5. Skywing says:

    Possible, but the alias I used to sign up for VMware was not exactly short nor only a few letters off from any other aliases I’ve used. I don’t think that it was guessed at random.

  6. Sean says:

    A few points:

    * the Gmail plus is actually a longstanding feature of sendmail. It’s referred to as a plussed address (do some web searches, and you’ll find references to it). I’ve used it for a long time as a simple way to sort mail, even though I manage several servers for my own domains (and thus can – and often do – create new aliases on a whim). Some years ago, when certain email-aware malware started gaining ground, I noted a trend in the mail logs: messages being addressed to the plussed portion of the address. joeuser+function@domain.tld would have been harvested as “function@domain.tld”, and so long as the plussed portion itself didn’t resolve to a legitimate account, it’d be rejected.

    In sendmail, you can configure specific plussed addresses to be rejected, so it is possible to “retire” a plussed address – if you manage your own server. Spammers really aren’t taking the time to examine individual addresses – it’s all automated harvesting.

    There’s a caveat to plussing though: there are some websites which are clueless as to what constitutes a legitimate address, and in their zeal to ensure you’re not entering bogus info, they barf on the plus. Never mind that it’s legal and has been part of sendmail since before some of the web developers were even BORN.

    * limiting where and how you publish your address on the web will have a significant impact on how much crap you receive. I use a technique I call “mailfuscation” – whereby the mailto: link (or really, any appearance of an email address) on a webpage is embedded in the HTML as ordinals and optionally HTML escapes, etc. Legitimate browsers have no problem parsing this, but malware and spamware generally take the cheap route and perform simple string searches, and as a result completely miss the email addresses. Hand-coding mailfuscated addresses is a PITA; using PHP and an include, with a call to a function to do the work when the page is served up, makes it REALLY easy, and the page remains maintainable for its original author.

    * I find that the one address per source method is viable, but it comes at a cost when you operate your own servers: retired addresses are still the target of spammers, and eventually, your server gets TENS OF THOUSANDS of message delivery attempts each week for addresses which are no longer used. The solution? If you manage your own mail and DNS, you create new subdomains and use them as part of your temporary address plan. user@A07.domain.tld and user@B07.domain.tld, etc. Every say, 6 months, you create a new subdomain, set the mailserver to accept mail for that domain, and start using it for these temporary contacts. At the next interval, you create the next host in sequence (say, A08 this coming January), and nix the one from two intervals prior (i.e. not issued for six months, if that’s the interval). When some spammer attempts to send mail to this NON-EXISTENT HOST, their mailhost performs the DNS lookup, and it fails – instant failure AT THEIR END. There’s never a connection to your mailhost, since the lookup itself failed.

    Of course, using a freemail account of some sort for most “casual” email is a lot less work. Use one of your several “keeper” accounts with trusted contacts, and your tosser hotmail or yahoo one for forced registrations.

    * Dictionary attacks are easy for spammers to employ – take a bunch of common names and prefix the domain with them – Ann@domain.tld, Alice@domain.tld, Dave@domain.tld, Joe@domain.tld, etc. So, as nice as it’d be for you to have the simple name at your vanity domain, it’s a good bet the spammers will be all over it even if your address is never published on the net and isn’t harvested by malware.

    * If you have webhosting (or run your own servers), pass on using a wildcard email – wherein ALL mail to the domain is delivered to your mailbox (or, in the case of multiple mailbox setups, where you have a “catchall” mailbox for anything not specifically set up). This invites all manner of spam because the spammer doesn’t have to have a legitimate email address – ANYTHING at your domain will be delivered. Dictionary attacks and even *MESSAGEIDS* will deliver. (Many email clients generate an email address-looking string for a messageid: some unique code @ mailhost.tld – and if the mailhost is your domain, and you accept wildcard mail, you can expect to see messages delivered to 46F99967.7010903@domain.tld and 3c06a2480709251645y76fa3581mfbc9d1486e6688d6@domain.tld and the like, because the harvesters are pretty braindead and tend to harvest anything that resembles an email address.)
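
    For anyone curious what the “mailfuscation” point above might look like in code, here is a minimal sketch. The original approach used PHP and an include; this is a Python stand-in, and the function names `mailfuscate` and `mailto_link` are invented for illustration. It assumes the simplest variant, where every character becomes a decimal HTML character reference.

```python
import html


def mailfuscate(text: str) -> str:
    """Encode every character as a decimal HTML character reference."""
    return "".join(f"&#{ord(ch)};" for ch in text)


def mailto_link(address: str) -> str:
    """Build a mailto: anchor whose markup contains no literal address."""
    encoded = mailfuscate(address)
    scheme = mailfuscate("mailto:")
    return f'<a href="{scheme}{encoded}">{encoded}</a>'


link = mailto_link("joeuser@domain.tld")
print(link)
# A browser decodes the references when rendering; html.unescape mimics that.
print(html.unescape(link))
```

    A plain string search for “@” or “mailto:” finds nothing in the generated markup, while a browser still renders a working link. A harvester that decodes character references first would of course see straight through it.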

    Lastly, being retentive about mail filtering always helps. I have filters to organize all of my incoming mail – different lists, friends, family, etc. Pretty much anything left in my inbox is either some initial contact from someone I don’t know, or it is junkmail. Because I have effective junk filters on my mailserver, I don’t generally see a lot of junk in my inbox, but when it manages to get through, that’s where I’ll generally find it – all alone, and segregated from the mail I actually care about.

  7. mr_azri says:

    Thanks for sharing this post. I would love to read more of your thoughts in future. Anyway I’m into quality management, so visit my site.