Protect your email and phone number from spam bots

By Nelson Pires
Updated on

This is a recurring problem: you display your email address and/or phone number on your website and sooner or later that information is harvested by spam bots. So it's important that it's not visible to them, yet it must remain visible to your users. This is the challenge I'm addressing in this article.

Over the years I have used a variety of techniques to hide email addresses and phone numbers from spam bots. Some of them were as simple as using an image instead of text (sometimes generated on the fly) or some kind of text encoding, but that's no longer sufficient.

Why do we need to do this?

Well, there are people out there who will stop at nothing to get their hands on your email address and phone number.

Spam bots are automated scripts that consume a web page's text/HTML and look for patterns that match an email address and/or phone number. Since this information is mostly available as text, it's very easy to get to. Some people use an image, but this too can be overcome with OCR (Optical Character Recognition): the spam bot downloads the image and analyses its pixels, looking for shapes that resemble letters or numbers. One way to minimise the risk with images is to introduce noise in the image (lines, polygons, backgrounds, etc.) in an attempt to confuse the OCR mechanism. Note that most modern spam bots no longer rely only on the text from view-source; they can now get the full DOM, like a browser, so if your email address or phone number is injected into the DOM with JavaScript, you are no longer protected.
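To make the pattern matching concrete, here is a minimal sketch of what such a harvester might do with plain page text. The regular expressions, the `harvest` function and the sample page are my own simplified assumptions for illustration, not taken from any real bot, which is likely to be more elaborate.

```python
import re

# Deliberately simple, hypothetical patterns; real harvesters are assumed
# to use more elaborate ones.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+\d{1,3}(?:[ -]?\d{2,4}){2,4}")


def harvest(page_text):
    """Return every email-like and phone-like string found in the text."""
    return {
        "emails": EMAIL_RE.findall(page_text),
        "phones": PHONE_RE.findall(page_text),
    }


# Sample page with contact details in plain text (placeholder values).
page = "<p>Email: john@somewebsite.com</p><p>Phone: +351 123 456 789</p>"
found = harvest(page)
print(found["emails"])
print(found["phones"])
```

Anything that appears in the page as a contiguous, readable string is picked up by a scan like this, which is why plain text offers no protection.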

When spam bots get your information, it is validated and included in databases that are then sold to other companies, which in turn use the data internally and re-sell it to yet more companies. So it's easy to see the scale of the problem, which usually manifests itself as email spam and unwanted phone calls. Everyone MUST hide this information from bots on their websites, yet it must stay visible so legitimate users can access it as needed.

How can we achieve that?

As mentioned above, there are different techniques that can be used, some better than others, but the one I now use the most is a mixture of pure text and CSS.

For example, consider these details:

Email: john@somewebsite.com

Phone: +351 123 456 789

Pretty standard, right? It is fully accessible to spam bots.

Here's my technique:

HTML

<p><span class="rev email">emos</span></p>
<p><span class="rev phone">321</span></p>

CSS

span.rev { unicode-bidi:bidi-override; direction:rtl }
span.rev.email:before { content:"moc.etisbew" }
span.rev.email:after { content:"@nhoj :liamE" }
span.rev.phone:before { content:"987 654 " }
span.rev.phone:after { content:" 153+ :enohP" }

Which results in this:

Email: john@somewebsite.com

Phone: +351 123 456 789

Same display as above but not accessible to spam bots.

Pretty cool, huh? It's now obfuscated. There's only one drawback: the user can no longer select the whole email address or phone number; only the characters in the HTML can be selected (try above). This is a small price to pay in my view.

The magic happens in the CSS declaration unicode-bidi:bidi-override, which overrides the browser's algorithm for bidirectional content and, together with direction:rtl, reverses the text. Note: apply it to an inline element such as a span (on a block-level element, direction:rtl would also align the text to the right). Browser support seems to be good.
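The reversal can be modelled in a couple of lines. This is only a rough sketch of the visual effect for plain left-to-right characters, not of the full bidirectional algorithm; the function name `visual_order` is made up for the illustration.

```python
def visual_order(logical):
    """Rough model of unicode-bidi:bidi-override with direction:rtl:
    characters are laid out right to left, so the reader sees the
    source string in reverse order."""
    return logical[::-1]


# The string as it is stored (spread across HTML and CSS) in the example.
source = "moc.etisbewemos@nhoj :liamE"
shown = visual_order(source)
print(shown)
```

In other words, what you store is the mirror image of what you want people to read.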

To further confuse the bots, I purposely leave some of the information in the HTML and put the rest in the CSS. The CSS then prepends and appends to the HTML element's text with the :before and :after pseudo-elements. This is better than having all the information in the CSS.
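Reversing and splitting the strings by hand is error-prone, so a small helper can generate the pieces. The sketch below is one possible way to do it; the function names and the `keep_start`/`keep_len` parameters are hypothetical and simply choose which slice of the reversed text stays in the HTML.

```python
import html


def split_reversed(visible, keep_start, keep_len):
    """Reverse `visible` and split it into (before, html_text, after)."""
    rev = visible[::-1]
    before = rev[:keep_start]
    html_text = rev[keep_start:keep_start + keep_len]
    after = rev[keep_start + keep_len:]
    return before, html_text, after


def css_string(value):
    """Quote a value for use in the CSS content property."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_snippet(visible, css_class, keep_start, keep_len):
    """Return (markup, css) following the rev/:before/:after layout above."""
    before, html_text, after = split_reversed(visible, keep_start, keep_len)
    markup = f'<p><span class="rev {css_class}">{html.escape(html_text)}</span></p>'
    css = (
        f"span.rev.{css_class}:before {{ content:{css_string(before)} }}\n"
        f"span.rev.{css_class}:after {{ content:{css_string(after)} }}"
    )
    return markup, css


markup, css = build_snippet("Email: john@somewebsite.com", "email", 11, 4)
print(markup)
print(css)
```

With these arguments the helper reproduces the email lines of the example; changing `keep_start` and `keep_len` moves the cut so that a different fragment ends up in the HTML.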

Let's make sure this technique persists

First, you shouldn't use the class names exactly as they are above. If everyone uses them, bots will learn and adapt fast. So instead of rev email and rev phone, call them something totally irrelevant, like zys obelisk; you get the point. Every implementation should use different class names. This is key to ensuring bots have nothing to home in on to reverse engineer our work, which would be disastrous.
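If you would rather not invent names yourself, one option is to generate them at build time. This is just a sketch under my own assumptions (the `random_class_name` helper and the eight-character length are arbitrary choices); any naming scheme works as long as it differs from site to site.

```python
import secrets
import string


def random_class_name(length=8):
    """Return a random class name: a lowercase letter followed by
    lowercase letters or digits, so it is always a valid CSS identifier."""
    first = secrets.choice(string.ascii_lowercase)
    rest = "".join(
        secrets.choice(string.ascii_lowercase + string.digits)
        for _ in range(length - 1)
    )
    return first + rest


# One name to replace "rev" and one to replace "email" or "phone".
wrapper_class = random_class_name()
item_class = random_class_name()
print(f'<span class="{wrapper_class} {item_class}">emos</span>')
```

The generated names then have to be used consistently in both the HTML and the CSS selectors.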

So, there you have it, a pretty cool technique I think, one to make the bots go hungry (hopefully for a long time).

Happy coding and please share your thoughts.

