Or how to block the entire Facebook network.
In my last post on Facebook’s misfortunes I mentioned that my wife initially blamed me, assuming it was just local and that I had made some new change to my local network configuration. Now whilst I do actually bin some of Facebook’s more annoying subdomains (such as the stats collector at staticxx.facebook.com) along with similarly annoying Google domains like google-analytics.com and adsettings.google.com, I had not completely blocked the whole of Facebook.
Well, not until now that is.
As I have pointed out in the past, my network is segmented into three parts: an outer network connected to the big bad internet, a dead network between my pfsense firewall and my internal network, and the internal network itself, which houses my internal servers, desktops and (some) mobile devices. All three networks have (different) RFC1918 addresses and my routers do NAT at the boundaries. It looks something like this:
This segmentation allows me to offer a different security policy stance on each of the two main networks, each of which hosts its own DNS/DHCP server. Of course it also allows me to route all my internal network traffic over a VPN to any one of my (several) VPN servers as I see fit because the pfsense device acts as a VPN concentrator. My internal network has a much stricter policy stance than does the outer one.
As I mentioned in my post on unbound back in June 2019, I use Simon Kelley’s excellent dnsmasq for my DNS (now chained to stubby as described here.) One of the main reasons for using dnsmasq is that it provides an easy mechanism for blocking systems based on their DNS names. The dnsmasq configuration file allows you to specify “additional hosts files” which it will read at startup along with your local hosts file. As the dnsmasq man page says:
It is possible to use dnsmasq to block Web advertising by using a list of known banner-ad servers, all resolving to 127.0.0.1 or 0.0.0.0, in /etc/hosts or an additional hosts file. The list can be very long, dnsmasq has been tested successfully with one million names. That size file needs a 1GHz processor and about 60Mb of RAM.
I have long used this option to point to a list of undesirable sites maintained by Dan Pollock. I periodically pull the latest file and install it as /etc/hosts.block.pollock alongside one of my own at /etc/hosts.block.mick. It is that latter file which currently contains lines like:
Note that I have to specify google-analytics twice because, unfortunately, hosts files do not allow for wildcards (so for example you cannot specify *.example.com and expect it to block the entire example.com domain). This is partly why Dan’s file currently runs to over 17,000 lines. He lists a domain and then any subdomains (“www” is typical) which must also be blocked. (As an aside, I know that Dan’s file contains google-analytics, but by default it is commented out. I prefer to keep a separate file of my own “bad guys” so that I know it works.)
After my wife blamed me for blocking the entirety of Facebook I got to thinking “why not?” After all, I’m fed up of Zuckerberg’s creepy network tracking me all over the net. So I went looking for a list of Facebook’s domains which I could add to a file called /etc/hosts.block.facebook. I quickly found a couple – one by Jonathan Duggan and another by Anudeep. Jonathan’s file lists around 2,000 subdomains whilst Anudeep’s lists about 4,000. What is immediately noticeable about both lists, however, is that there is much repetition of the top level domains – so for example Jonathan’s list looks in part like this:
Now if I were to use a list like that, I would forever be worried that Facebook had listed a new subdomain on one of its (multiple) top level domains and I would have to continually play “whack-a-mole” in order to stay on top of the bastards. Worse, neither of those lists actually includes many of Facebook’s domains on country code TLDs or indeed some of the other TLDs now in existence. For example, Facebook owns many domains which (I’d guess) it would much prefer it did not have to (such as facebook.sucks, facebook.adult, facebook.sex etc). In my view it would be much better to be able to wildcard the domains so that I could have a single line blocking the entirety of facebook.com, tfbnw.net and so on, safe in the knowledge that if they added any new subdomains those would be caught too. Fortunately, dnsmasq allows us to do exactly that.
As well as allowing us to point to (multiple) additional hosts files, the dnsmasq configuration allows for formulations of the form:
address=/<domain>/<ipaddr>
As the dnsmasq manual says:
Specify an IP address to return for any host in the given domains. Queries in the domains are never forwarded and always replied to with the specified IP address which may be IPv4 or IPv6. To give both IPv4 and IPv6 addresses for a domain, use repeated --address flags. To include multiple IP addresses for a single query, use --addn-hosts= instead. Note that /etc/hosts and DHCP leases override this for individual names. A common use of this is to redirect the entire doubleclick.net domain to some friendly local web server to avoid banner ads.
So if we include the line:
address=/facebook.com/127.0.0.1
then the /entire/ facebook.com domain gets blackholed to local loopback. Way to go Simon!
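To make the difference between the two approaches concrete, here is a minimal Python sketch of the matching idea only. It is not dnsmasq's actual code, and the function names and sample sets are made up for illustration. A hosts file blocks exactly the names it lists, whereas an address= rule covers the named domain and everything beneath it, matched on label boundaries.

```python
def hosts_file_blocks(name, blocked_hosts):
    """Hosts-file style: a name is blocked only if it is listed exactly."""
    return name.lower().rstrip(".") in blocked_hosts


def address_rule_blocks(name, blocked_domains):
    """address=/domain/ style: the domain and anything beneath it is blocked."""
    labels = name.lower().rstrip(".").split(".")
    # Try every suffix that starts on a label boundary, e.g. for
    # a.b.example.com: a.b.example.com, b.example.com, example.com, com
    return any(".".join(labels[i:]) in blocked_domains
               for i in range(len(labels)))


# Illustrative data only (assumed, not copied from my real block files).
hosts_entries = {"google-analytics.com", "www.google-analytics.com"}
address_domains = {"facebook.com", "tfbnw.net"}

for name in ("staticxx.facebook.com", "facebook.com", "notfacebook.com"):
    print(name, address_rule_blocks(name, address_domains))
```

The point of the suffix check is that a subdomain added tomorrow is already covered, while an unrelated name that merely ends in the same characters (notfacebook.com) is left alone.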
Now in order to build my list I needed to know what domains Facebook was likely to use. So assuming that they would use every country code (where they can) and most of the other TLDs (again where allowed – I don’t think they could get away with facebook.amazon or facebook.google) – I pulled the current TLD list from IANA. With that list in hand I began checking whois records and sending ICMP echo pings to addresses of the form facebook.tld – but that game got old very quickly so I decided to just block the entire list. So a couple or three sed scripts later (to prepend “address=/facebook.” and append “/127.0.0.1”) I had the list I wanted. It is now in place on my inner network and I can now relax safe in the knowledge that Zuck can no longer track me. In the spirit of Dan Pollock’s example, I make the Facebook block list available at zuckoff.net. Feedback is welcome to email@example.com.
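For anyone who would rather not fiddle with sed, the same transformation can be sketched in a few lines of Python. This is an illustration rather than the scripts I actually ran. The function name, its parameters and the sample input are assumptions, and it assumes the TLD list is plain text with one TLD per line and comment lines starting with "#".

```python
def build_blocklist(tld_text, label="facebook", sink="127.0.0.1"):
    """Turn a one-TLD-per-line list into dnsmasq address= lines."""
    lines = []
    for raw in tld_text.splitlines():
        tld = raw.strip().lower()
        # Skip blank lines and comment/header lines.
        if not tld or tld.startswith("#"):
            continue
        lines.append(f"address=/{label}.{tld}/{sink}")
    return lines


# Hypothetical sample standing in for the downloaded TLD list.
sample = """# example header line
COM
NET
UK
"""

print("\n".join(build_blocklist(sample)))
```

The output can be written to a file and pulled into the dnsmasq configuration. Changing the label or sink arguments gives the same treatment to another name or another blackhole address.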
Of course, my wife now uses the external network…