Why CloudFlare’s e-mail protection is no longer safe

Anton Vroemans
4 min read · Nov 15, 2020

Spam is a huge problem. Luckily, the volume of spam mail has been decreasing since 2008, and spam filters have become far more advanced over the years. Nevertheless, billions of spam e-mails are still sent every day. Once you end up on a mailing list, it is nearly impossible to remove yourself from it. Your precious e-mail address gets passed around between spammers, and there is no turning back.

With this in mind, you’ll want to do everything you can to prevent your e-mail address from leaking out. One of the most fundamental precautions is to not put your e-mail address out in the open online. When you publish it somewhere in plaintext, it won’t take long for a scraper to find it. Fortunately, CloudFlare has a solution for this.

CloudFlare’s e-mail obfuscation

In the CloudFlare dashboard, you have an option to obfuscate e-mail addresses.

CloudFlare provides the option to obfuscate (that is, deliberately obscure) any e-mail addresses it finds on your website. It does this automatically once you enable the option in the dashboard. An obfuscated e-mail address might look like this:

<span class="__cf_email__" data-cfemail="36534e575b465a53765b575f5a1855595b">[email&#160;protected]</span>

CloudFlare also includes its own JavaScript, /cdn-cgi/scripts/5c5dd728/cloudflare-static/email-decode.min.js, to deobfuscate the e-mail address and show it to users when they visit the website. This way bots can’t read the e-mail address, but visitors who have JavaScript enabled will just see it as if nothing happened.

How it works

This part explains how CloudFlare obfuscates the e-mail addresses. It contains some code and boring explanations, so if it doesn’t interest you, you can skip it.

Of course CloudFlare has minified its decode code. After all, it has to transmit it millions of times every day. With the help of jsnice.org I was able to format the code, and rename all the variables to names that made sense. You can find my formatted code here.

Whilst the minified code seemed pretty slim, it turns out to do more than it appears. The script looks for e-mails in anchors and in elements with a specific class, and it makes sure not to skip `<template>` elements either. It decodes the e-mails and makes them clickable. After it has processed the whole webpage, the script removes itself.
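The steps described above can be sketched roughly as follows. This is my own simplified reconstruction, not CloudFlare’s actual code: the function name `revealEmails`, the `LINK_MARKER` href prefix and the exact selectors are assumptions for illustration, and the `decode` helper is a compact version of the XOR decoding explained in the next part.

```javascript
// Assumed href prefix for obfuscated links; the hex string follows the "#".
const LINK_MARKER = "/cdn-cgi/l/email-protection#";

// XOR-decode a hex string; the byte at startPos is the key.
function decode(hex, startPos = 0) {
  const key = parseInt(hex.slice(startPos, startPos + 2), 16);
  let result = "";
  for (let i = startPos + 2; i < hex.length; i += 2) {
    result += String.fromCharCode(parseInt(hex.slice(i, i + 2), 16) ^ key);
  }
  return result;
}

// Walk a document (or document fragment) and restore the e-mail addresses.
function revealEmails(root) {
  // 1. Anchors: turn the obfuscated href back into a clickable mailto: link.
  for (const anchor of root.querySelectorAll("a")) {
    const href = anchor.getAttribute("href") || "";
    const index = href.indexOf(LINK_MARKER);
    if (index !== -1) {
      const mail = decode(href, index + LINK_MARKER.length);
      anchor.setAttribute("href", "mailto:" + mail);
    }
  }
  // 2. Placeholder elements: swap "[email protected]" for the decoded text.
  for (const element of root.querySelectorAll(".__cf_email__")) {
    const hex = element.getAttribute("data-cfemail");
    if (hex) {
      element.replaceWith(decode(hex));
    }
  }
  // 3. <template> elements keep their markup in a separate fragment.
  for (const template of root.querySelectorAll("template")) {
    revealEmails(template.content);
  }
}
```

In a browser, calling `revealEmails(document)` once the page has loaded would perform the replacement. The real script additionally removes its own `<script>` element afterwards, which this sketch leaves out.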

Decoding the e-mail key

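    // Read two hex characters at the given position and return them as a number (0-255).
    function parseHex(string, position) {
        var result = string.substr(position, 2);
        return parseInt(result, 16);
    }

    // Decode the hex string that starts at startPos inside href.
    function decode(href, startPos) {
        var result = "";
        // The first byte is the XOR key.
        var key = parseHex(href, startPos);
        // Every following byte is one character of the e-mail, XOR-ed with the key.
        for (var position = startPos + 2; position < href.length; position = position + 2) {
            var byte = parseHex(href, position) ^ key;
            result = result + String.fromCharCode(byte);
        }
        return result;
    }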

This is the script used to decode e-mails. As the example earlier showed, the e-mails get obfuscated to a hex string like this one: 36534e575b465a53765b575f5a1855595b. We can deduce that each pair of hex digits represents one byte (0 to 255), with the first byte being a key. This key is used to decode all the other bytes with the XOR operator. In the example the key is 0x36, so the next byte, 0x53, decodes to 0x53 XOR 0x36 = 0x65, the letter "e". Encoding an e-mail comes down to almost the same script, since XOR is its own inverse: applying the same key a second time restores the original value. I created this encode script, which encodes an e-mail with a random 1-byte key.

    function encode(mail) {
        // Pick a random 1-byte key and write it first, as two hex digits.
        var key = Math.floor(Math.random() * 256);
        var result = ('0' + key.toString(16)).slice(-2);
        // XOR every character with the key and append it as two hex digits.
        for (var position = 0; position < mail.length; position++) {
            var byte = mail.charCodeAt(position) ^ key;
            result += ('0' + byte.toString(16)).slice(-2);
        }
        return result;
    }
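As a quick sanity check, the two functions above can be run against the sample string from earlier. The sketch below repeats them in compact form so that it runs on its own (for example in Node.js). The `toHex` helper and the optional `key` parameter are my additions, there only to make the output reproducible.

```javascript
// Convert a number from 0 to 255 into a two-character hex string.
function toHex(byte) {
  return byte.toString(16).padStart(2, "0");
}

// Same logic as encode() above, but the key can be passed in (assumption,
// added for this example; by default it is still random).
function encode(mail, key = Math.floor(Math.random() * 256)) {
  let result = toHex(key);
  for (let i = 0; i < mail.length; i++) {
    result += toHex(mail.charCodeAt(i) ^ key);
  }
  return result;
}

// Same logic as decode() above: the first byte is the key.
function decode(hex, startPos = 0) {
  const key = parseInt(hex.slice(startPos, startPos + 2), 16);
  let result = "";
  for (let i = startPos + 2; i < hex.length; i += 2) {
    result += String.fromCharCode(parseInt(hex.slice(i, i + 2), 16) ^ key);
  }
  return result;
}

console.log(decode("36534e575b465a53765b575f5a1855595b")); // example@mail.com
console.log(encode("example@mail.com", 0x36)); // 36534e575b465a53765b575f5a1855595b
```

Because the key is random, encoding the same address twice usually gives two different strings, but both decode to the same address.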

What is the problem?

Most bots won’t recognize this trick, or won’t care enough to handle it, and will indeed skip these obfuscated e-mails. But since the same obfuscation script is used on so many websites (keep in mind that 15% of the whole internet was using CloudFlare in November 2020), it creates a flaw: decoding it once pays off everywhere.

All a spammer has to do is update their crawling regex to also look for CloudFlare’s encoded strings, instead of only plain e-mail addresses. These are not real hashes, so they can easily be decoded. In fact, this has already been done. Publicwww.com is a search engine that lets you search plain HTML code. When we search for the CloudFlare mail script, we already find 361,107 results. This means there are 361,107 e-mails lying around, ready to be grabbed! All you’d have to do is download their list, create a script to retrieve all the webpages, and decode the e-mails. Boom, you now have your own spam list!
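To show how little effort this takes, here is a minimal sketch of such an updated regex, run against the example markup from earlier. The function names `decodeCfEmail` and `extractEmails` are made up for this illustration, and the pattern assumes the `data-cfemail` attribute is written exactly as in the example.

```javascript
// XOR-decode a CloudFlare-style hex string; the first byte is the key.
function decodeCfEmail(hex) {
  const key = parseInt(hex.slice(0, 2), 16);
  let result = "";
  for (let i = 2; i < hex.length; i += 2) {
    result += String.fromCharCode(parseInt(hex.slice(i, i + 2), 16) ^ key);
  }
  return result;
}

// Find every data-cfemail attribute in an HTML string and decode its value.
function extractEmails(html) {
  const pattern = /data-cfemail="([0-9a-f]+)"/gi;
  return Array.from(html.matchAll(pattern), (match) => decodeCfEmail(match[1]));
}

const page =
  '<span class="__cf_email__" data-cfemail="36534e575b465a53765b575f5a1855595b">[email&#160;protected]</span>';
console.log(extractEmails(page)); // [ 'example@mail.com' ]
```

No JavaScript rendering is needed: one regular expression and a dozen lines of decoding are enough, which is exactly why a single widely used scheme is such an attractive target.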

What can I do about it?

The biggest flaw is the widespread use of the same encoding script. Spammers can’t account for every obfuscation script out of the box, but this one is definitely worth their effort. For my own website I created my own custom encoding script. This prevents this specific attack, but it is still relatively easy to circumvent. Some more advanced spiders can render JavaScript. Google lets you see this in action: when you enter a link, you can see the rendered HTML, containing my actual e-mail address. Most spiders aren’t doing this yet, since it is far more CPU-intensive.
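To give an idea of what a custom scheme could look like, here is a hypothetical sketch. It is not the script I actually use, and the names `SECRET`, `customEncode` and `customDecode` are invented for this example. It XORs each character with a repeating multi-character secret and then reverses the hex string, so a scraper written for CloudFlare’s format would not recognize it.

```javascript
// Hypothetical site-specific secret; pick your own.
const SECRET = "my-site-key";

// Reverse a string (used to scramble the order of the hex digits).
function reverse(text) {
  return text.split("").reverse().join("");
}

// XOR each character with the repeating secret, hex-encode, then reverse.
function customEncode(mail, secret = SECRET) {
  let hex = "";
  for (let i = 0; i < mail.length; i++) {
    const byte = mail.charCodeAt(i) ^ secret.charCodeAt(i % secret.length);
    hex += byte.toString(16).padStart(2, "0");
  }
  return reverse(hex);
}

// Undo the steps of customEncode in the opposite order.
function customDecode(encoded, secret = SECRET) {
  const hex = reverse(encoded);
  let mail = "";
  for (let i = 0; i < hex.length; i += 2) {
    const byte = parseInt(hex.slice(i, i + 2), 16);
    mail += String.fromCharCode(byte ^ secret.charCodeAt((i / 2) % secret.length));
  }
  return mail;
}

console.log(customDecode(customEncode("example@mail.com"))); // example@mail.com
```

This sketch only handles plain ASCII addresses, and the secret still has to ship in the page’s JavaScript, so this is obscurity rather than real protection: it only helps as long as nobody bothers to write a decoder for your particular site.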

The real solution

Bots have become outstanding at mimicking humans. When you want to make sure only real visitors can see something, the only good solution seems to be a CAPTCHA. MailHide is a free and simple service for locking your e-mail behind a CAPTCHA. Of course this degrades the user experience a little, but it might be worth it if it prevents your e-mail from ending up on a spam list.

First article.

This is the first blog post I’ve ever written. I’d like some feedback or claps. I don’t really know how this all works, but I hope it was an interesting read. My primary interests are programming and tinkering with electronics. If those match yours, you can follow me and you might find more posts someday.

Anton Vroemans

I write mainly about security and programming. I look for efficient solutions to problems. Programming and electronics are my passion.