Introducing TrackerDB: Ghostery's Open-Source Tracker Library

When you visit a website, it will likely load some content from other domains that are not part of the website itself. These are called third-party requests.

They may be used for various purposes, such as showing ads, tracking your behavior, or providing features.

The entities that own these domains are called third-party companies or organizations. They may have different interests and policies than the website you are visiting. Sometimes, they may collect and share your personal information without your consent or knowledge.

That's why Ghostery not only blocks ads, trackers, and other annoyances, but also shows you how online tracking works by naming all the organizations that participate in data exchange over the internet.

We've done this for years in our Tracker & Ad Blocker by identifying third parties when you visit a page.

Now, we would like to take the next step in “tracking the trackers” by opening up our library to our Contributors and the wider content-filtering communities.

What is the Tracker Database?

Our team believes in empowering our users to take control of their online privacy. Ghostery not only blocks and neutralizes trackers on a page; our browser extension also identifies third-party actors using our tracker database, TrackerDB.

TrackerDB allows you to distinguish trackers from non-trackers. Our database is used in the Ghostery extension to categorize all requests you see while browsing the web.

We also use the database to publish WhoTracks.Me, the world's largest statistical report on online tracking, and to power our Trackers Preview for search, which shows how many trackers were detected on a website before you even visit it.

Image shows Ghostery Tracker Wheel, which appears as an icon in Google search results.

With Ghostery, you know who has access to your data and can better understand why tracking protection is important.

How does TrackerDB work?

TrackerDB is a curated list of filters, similar to filter lists like EasyList. The difference from other community lists is that it includes the name of the company/organization behind the tracker and judges its intention by assigning it a category.

TrackerDB provides metadata about online trackers. Our database names the entities behind trackers and lists their website URLs and privacy policy links.

We then classify the trackers into categories, which gives a basic understanding of their roles and functions.

To detect trackers, TrackerDB contributors have to manually curate two important pieces of information: the domain names of the trackers and the network filters used by ad blockers to block them. For example, `||doubleclick.net^$3p` blocks DoubleClick when it is loaded as a third party on a page.
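To make the filter syntax more concrete, here is a minimal Python sketch of how a filter such as `||doubleclick.net^$3p` might be evaluated. It is not Ghostery's engine. The function names are hypothetical, only the `||domain^` form and the `$3p` option are handled, and the third-party check naively compares the last two hostname labels instead of consulting the Public Suffix List.

```python
from urllib.parse import urlsplit


def naive_site(hostname: str) -> str:
    """Approximate the registrable domain by keeping the last two labels.

    Assumption: real engines consult the Public Suffix List, so this
    shortcut is wrong for suffixes such as "co.uk".
    """
    return ".".join(hostname.lower().split(".")[-2:])


def matches_hostname_filter(filter_text: str, request_url: str, page_url: str) -> bool:
    """Evaluate a filter of the form ||domain^ with an optional $3p option."""
    pattern, _, options = filter_text.partition("$")
    if not (pattern.startswith("||") and pattern.endswith("^")):
        raise ValueError("only ||domain^ filters are supported in this sketch")
    domain = pattern[2:-1].lower()

    request_host = (urlsplit(request_url).hostname or "").lower()
    page_host = (urlsplit(page_url).hostname or "").lower()

    # "||" anchors the match at the domain itself or at any of its subdomains.
    host_matches = request_host == domain or request_host.endswith("." + domain)

    # "$3p" limits the filter to requests sent to a different site than the page.
    if "3p" in options.split(","):
        is_third_party = naive_site(request_host) != naive_site(page_host)
        return host_matches and is_third_party
    return host_matches
```

With this sketch, a request to `https://ad.doubleclick.net/pixel` from a page on `https://news.example/` matches the filter, while the same request from a page on `https://www.doubleclick.net/` does not, because it is first-party there.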

This is the same syntax that community lists use to describe trackers. Building on established standards enables existing ad blocker engines to process the patterns, and it lowers the entry barrier for contributors who are already familiar with filter lists.

The TrackerDB SDK is a fast ad blocker engine that can identify known trackers by their requests. It first matches requests against network filters and then checks the domain names against TrackerDB. This domain matching approach is very broad, as it can flag a known tracker even if it is not tracking in that instance. We find that acknowledging third parties on the page is very useful for promoting transparency on the web.
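The two-step lookup described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the SDK's actual API: `identify`, `doubleclick_3p`, and the sample data are hypothetical, and the network filter is represented by a plain Python function standing in for `||doubleclick.net^$3p`.

```python
from typing import Callable, Dict, List, Optional, Tuple
from urllib.parse import urlsplit

# A rule pairs a predicate (a stand-in for a network filter) with a pattern name.
Rule = Tuple[Callable[[str, str], bool], str]


def doubleclick_3p(request_url: str, page_url: str) -> bool:
    """Simplified stand-in for the filter ||doubleclick.net^$3p."""
    def on_domain(host: str) -> bool:
        return host == "doubleclick.net" or host.endswith(".doubleclick.net")

    request_host = (urlsplit(request_url).hostname or "").lower()
    page_host = (urlsplit(page_url).hostname or "").lower()
    return on_domain(request_host) and not on_domain(page_host)


def identify(
    request_url: str,
    page_url: str,
    rules: List[Rule],
    domain_index: Dict[str, str],
) -> Optional[Tuple[str, str]]:
    """Return (pattern name, matching step), or None if nothing is known."""
    # Step 1: try the network filters.
    for predicate, name in rules:
        if predicate(request_url, page_url):
            return name, "filter"

    # Step 2: fall back to the broader domain lookup, walking from the full
    # hostname up through its parent domains (the bare TLD is skipped).
    labels = (urlsplit(request_url).hostname or "").lower().split(".")
    for start in range(len(labels) - 1):
        candidate = ".".join(labels[start:])
        if candidate in domain_index:
            return domain_index[candidate], "domain"
    return None


# Hypothetical sample data for illustration only.
RULES: List[Rule] = [(doubleclick_3p, "DoubleClick")]
DOMAINS: Dict[str, str] = {"doubleclick.net": "DoubleClick"}
```

In this sketch, a request to `ad.doubleclick.net` from an unrelated page is identified by the filter step. The same request from a page on `doubleclick.net` itself fails the third-party filter but is still recognized by the domain step, which is the broad behavior the paragraph above describes.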

In short, TrackerDB can be used as a tool to analyze which companies are present on any website.

Not Every Entity is a Tracker

Not everything in TrackerDB is a tracker. One example is the Hosting category. Ghostery may tell you that the website is hosted on Amazon Web Services (AWS). That doesn't mean it is tracking you, but it may still be useful information.

TrackerDB isn’t always intended for the same use cases as community filter lists. The rules within our database help categorize requests. Some of these categories involve tracking and others do not, but all of them provide useful information for understanding what data a website shares with other parties.

Within TrackerDB, we chose the more generic term “pattern” instead of “tracker” to avoid confusion. We still use the name TrackerDB for the whole project because it conveys the right intuition.

The Difference Between TrackerDB and Community Filter Lists

Community blocklists aim to block site requests and hide UI elements on the page, while TrackerDB helps you to understand what data is generally sent out.

Why Does This Matter?

Instead of looking at obscure requests, Ghostery can list entities (organizations) and tell you what they do and who they are. This effectively simplifies and demystifies all the trackers. New entries for TrackerDB will appear categorized in our Ghostery extension as well as WhoTracks.Me.

Ghostery, like other content blockers, relies on community blocklists such as EasyList to determine what should be blocked. The power of the community-driven approach is based on consensus: community list contributors need to agree on which trackers should be blocked. Ghostery seeks the same consensus for TrackerDB: a knowledge base built by a wider community, serving as a public source on online tracking and adding a layer of metadata on top of community lists.

By open-sourcing TrackerDB, Ghostery is spreading knowledge of how the tracking ecosystem works even further.

Entry Example

Let's have a closer look at the TrackerDB data. Take TikTok Analytics, for instance. It is a popular tracker used for site analytics and has been gaining traction over the last year. It's also a simple example, since it doesn't try to obscure its presence: everything is hosted under `analytics.tiktok.com`.

Its TrackerDB entry reflects exactly that:

A screenshot of TikTok Analytics entry in TrackerDB.

Each pattern also links to a company or organization. Here, it is ByteDance, which has its own entry in TrackerDB.
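The relationship between a pattern and its organization can be pictured with a small data model. The sketch below is only an assumption about how such an entry could be represented in Python. The class and field names are hypothetical and do not mirror TrackerDB's actual file format, the category is written as free text, and the organization's URLs are left empty rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Organization:
    """The company or organization behind one or more patterns."""
    name: str
    website_url: Optional[str] = None
    privacy_policy_url: Optional[str] = None


@dataclass(frozen=True)
class Pattern:
    """One TrackerDB-style entry: a named, categorized set of domains and filters."""
    name: str
    category: str
    organization: Organization
    domains: Tuple[str, ...] = ()
    filters: Tuple[str, ...] = ()


def describe(pattern: Pattern) -> str:
    """Summarize who operates a pattern and what it is used for."""
    return f"{pattern.name} ({pattern.category}), operated by {pattern.organization.name}"


# Values taken from the example in this article; everything else is left unset.
bytedance = Organization(name="ByteDance")
tiktok_analytics = Pattern(
    name="TikTok Analytics",
    category="site analytics",
    organization=bytedance,
    domains=("analytics.tiktok.com",),
)

print(describe(tiktok_analytics))
# prints: TikTok Analytics (site analytics), operated by ByteDance
```

Keeping the organization as a separate object means several patterns can point to the same owner, which is what lets a tool group requests by company rather than by individual domain.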

Other examples in our database can be more complex. For instance, DoubleClick, from our earlier example, operates not only on `doubleclick.net` but is also hosted in other locations.

Get Involved

We invite you to join us in improving TrackerDB and making online tracking more transparent for everyone.

You can contribute to TrackerDB by helping us identify unclassified trackers on GitHub. Submit an issue and provide us with information about the tracker, such as its name, website, category, and privacy policy.

To get started, use the Ghostery extension and look for trackers in the Unidentified category. Submit the blocked URL and your guess of the tracker’s owner to the TrackerDB GitHub repository. If you are not sure, don’t worry! The TrackerDB community will help you identify the tracker correctly.

This helps Ghostery move unknown trackers to one of our fifteen domain categories.

New entries for our database are included in the next Ghostery extension release and will appear in the WhoTracks.Me statistical report at the start of the following month.

We also appreciate feedback on how to make it easier for people to contribute. Feel free to send suggestions to our Support team or start a new topic on GitHub.