I was catching up on news while waiting for Google I/O 2016 to start (at 1 AM in my timezone) when an idea popped up: let’s build an ad blocker to browse news on my phone without the unwanted distractions!
Some brainstorming is needed here. We’re gonna need to prevent WebView from loading ads, or other unwanted resources, when it tries to load a webpage. A little digging into the WebView documentation leads us to WebViewClient [1]. We can use shouldInterceptRequest() to intercept each request issued by a webpage, check its URL and decide whether we want to load resources from that URL.
Now how do we tell whether the resources behind a URL are potentially ads? Let’s check how popular ad blockers like uBlock Origin or AdBlock do it: they both rely on a few blacklists of things to filter. EasyList, EasyPrivacy, etc. are some well-known ones, but they are overkill for our needs: they specify sites with CSS selectors, while we only have a URL to work with here. The pgl.yoyo.org list [2] used by uBlock Origin looks promising though: it generates a list of all hostnames considered ad servers. Now we only need to match our URL against the blacklisted hostnames!
A summary of what we need to do:
- Get the list of ad hostnames from pgl.yoyo.org
- Save the list somewhere, load it when the application starts
- Use WebViewClient.shouldInterceptRequest(WebView, String) to intercept requests
- Check if the request URL belongs to one of the hostnames in the list and, if so, override it, returning a dummy resource instead of the actual one, which is presumably an ad
Getting list of ad hostnames
The pgl.yoyo.org site provides a few options to generate the list. Since we only care about hostnames, without IP addresses, let’s choose "plain non-HTML list -- as a plain list of hostnames (no HTML)" with "no links back to this page" (we should credit it somewhere else, of course).
This will give us a list as follows:
Load ad hostnames into memory
We can save this list to a file, include it as an asset, or include it as a raw resource in our app [3]. In any case we will have to do some I/O to read it. Let’s pick an asset.
Loading from a file is simple. I use Okio for this, but it can be replaced by java.io APIs. One thing to keep in mind is that we should do the I/O on a background thread. A simple AsyncTask will do. Here we load directly into a static Set variable, which will stay in memory for as long as the app process runs. That is not ideal, but let’s keep it simple here.
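As a rough illustration of the java.io route, here is a minimal sketch of the loading step in plain Java. It assumes the list has one hostname per line; the class name AdHostLoader is made up for this example, and skipping lines that start with '#' is my own assumption rather than something the list format guarantees. On Android the InputStream would typically come from the app’s AssetManager, and the call should still run off the main thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not the class used in the original implementation.
final class AdHostLoader {
    private AdHostLoader() {}

    /**
     * Reads one hostname per line into a set.
     * Blank lines and lines starting with '#' are skipped (an assumption).
     */
    static Set<String> load(InputStream in) {
        Set<String> hosts = new HashSet<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Normalize so later lookups can be case-insensitive.
                String host = line.trim().toLowerCase(Locale.ROOT);
                if (!host.isEmpty() && !host.startsWith("#")) {
                    hosts.add(host);
                }
            }
        } catch (IOException e) {
            // Surface I/O failures without forcing a checked exception on callers.
            throw new UncheckedIOException(e);
        }
        return hosts;
    }
}
```

The returned set can then be assigned to the static Set variable once loading finishes.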
The next step is to intercept WebView’s requests [4] to check if they should be overridden. The logic caches results already checked in the same session so we don’t end up rechecking the same URL.
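The caching idea can be sketched in plain Java, independent of Android. The names AdCheckCache and shouldBlock are hypothetical, and the actual ad check is passed in as a predicate so the sketch stays self-contained. A concurrent map is used because WebView may call shouldInterceptRequest() off the UI thread.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Hypothetical per-session cache of "is this URL an ad?" answers.
final class AdCheckCache {
    private final Map<String, Boolean> checked = new ConcurrentHashMap<>();
    private final Predicate<String> isAd;

    AdCheckCache(Predicate<String> isAd) {
        this.isAd = isAd;
    }

    /** Returns the cached verdict for the URL, computing it on first sight. */
    boolean shouldBlock(String url) {
        return checked.computeIfAbsent(url, isAd::test);
    }
}
```

Inside a WebViewClient, shouldInterceptRequest() would then return the empty resource when shouldBlock(url) is true and fall back to the default behaviour otherwise.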
Match domain and override resource
The last step is to implement the ad check itself and AdBlocker.createEmptyResource(). The latter should be straightforward. The interesting bit now is how to match a full URL against the list of hostnames.
Let’s consider ads from Google’s DoubleClick network: it has URLs with hosts such as googleads.g.doubleclick.net. We have one single entry in our list that may match: doubleclick.net. Our strategy here is to extract the host from the URL and walk up the sub-domain chain: try to match the whole host first, then keep stripping off the leftmost sub-domain until we either find a match or run out of labels.
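Here is a minimal sketch of that strategy in plain Java. It uses java.net.URI to extract the host, which is my own substitution for whatever URL parser the app uses, and the class name AdHostMatcher is hypothetical. Malformed URLs are simply treated as "not an ad".

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

// Hypothetical matcher illustrating the sub-domain walk described above.
final class AdHostMatcher {
    private AdHostMatcher() {}

    /** Returns true if the URL's host, or any parent domain of it, is in adHosts. */
    static boolean isAd(String url, Set<String> adHosts) {
        String host;
        try {
            host = new URI(url).getHost();
        } catch (URISyntaxException e) {
            return false; // cannot parse, so do not block
        }
        if (host == null) {
            return false; // e.g. relative URLs or unusual schemes
        }
        host = host.toLowerCase(Locale.ROOT);
        while (!host.isEmpty()) {
            if (adHosts.contains(host)) {
                return true;
            }
            int dot = host.indexOf('.');
            if (dot < 0) {
                break; // no more labels to strip
            }
            // Drop the leftmost label: a.b.example.net -> b.example.net
            host = host.substring(dot + 1);
        }
        return false;
    }
}
```

For googleads.g.doubleclick.net the loop checks googleads.g.doubleclick.net, then g.doubleclick.net, then doubleclick.net, which matches. Because only whole labels are stripped, a host like notdoubleclick.net does not match the doubleclick.net entry.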
Check out a demo below:
That’s fun! Add an address bar, a progress bar and a few standard browser buttons, and you now have an ad-free Android web browser, built by yourself! The solution is not as comprehensive as uBlock Origin or AdBlock, but it should remove enough distractions.
A complete implementation can be found in Materialistic’s GitHub repository.
[1] "WebViewClient will be called when things happen that impact the rendering of the content, e.g., errors or form submissions. You can also intercept URL loading here (via shouldOverrideUrlLoading())." - developer.android.com
[4] WebViewClient.shouldInterceptRequest(WebView, String) is only available from API 11, and has been deprecated since API 21. The newer version currently delegates to this one, so implementing the older one should be sufficient for now.