Editorial

Tracking The Trackers 2020: Web tracking’s opaque business model of selling users

Elaine Christie

The internet is not free. Yet billions of people unknowingly trade their data while they browse in what they think is a free ecosystem.

Trackers are lurking everywhere. Web tracking by the likes of Google, Facebook, Amazon, and a host of other players has become so pervasive that it’s almost impossible to avoid. A lot has changed in the three years since we conducted our inaugural Tracking the Trackers study in 2017. Consumer privacy worries have reached an all-time high, showing up in everything from unprecedented federal reports and lawsuits to pop culture attention and political promises to address the issue.

Digital Data Privacy: From Bad to Terrible

Tech companies profit by being in the marketplace of human activity. They track almost everything done online — every scroll, keystroke, search query, and click. That data is then used to make predictions — companies build models that predict people’s actions and then tailor what kind of information and ads to show them.

Our mission is to empower consumers to protect their personal data. The purpose of our 2020 Tracking the Trackers report is to break down which companies are tracking users the most: where, when, how, and why. By design, these shapeless, hidden figures are lurking and recording people all across the web. There’s almost no transparency about what the companies record.

Our updated data is based on 1.96 billion page loads. It shows that in the last three years the digital data privacy environment has gone from bad to terrible, and that consumers’ best means of protection is to take their privacy into their own hands and use privacy-first internet alternatives.

Top Company Tracking Reach

In 2017, our Tracking the Trackers report revealed Google and Facebook were the leaders in tracking reach, at 64.4% and 27.1% respectively. They were followed far behind by comScore (11.4%) and Twitter (10.5%).

This year, our research reveals a new top tracking player who has largely managed to stay out of the critical big tech privacy spotlight thus far: Amazon is the new one to watch!

In 2020 in the U.S., Amazon surpassed Facebook in tracking reach, clocking in at 29.4% tracker reach, with Facebook at only 23%. In the EU and globally, Amazon remains just behind Facebook, ranking 3rd most widespread and coming in at 17.2% reach in the EU and 19.2% globally.

In 2017, Amazon trackers appeared on only 10.5% of all page loads, 5th in the top US ranking behind Google, Facebook, comScore and Twitter.

The data shows Amazon deserves much more media and consumer scrutiny of its data privacy practices, especially as the company has grown beyond its primary business into sectors like grocery and pharmacy, where sensitive data is all the more prevalent. But what, exactly, did our research count as an Amazon tracker? Seven different Amazon trackers are included, each assigned to a tracker category:

  • Amazon Advertising – Advertising Tracker – provides advertising or advertising-related services such as data collection, behavioral analysis, or retargeting.
  • Amazon Cloudfront – Hosting Tracker – this is a service used by the content provider or site owner.
  • Amazon Web Services – Hosting Tracker – this is a service used by the content provider or site owner.
  • Amazon CDN – CDN – content delivery network that delivers resources for different site utilities and usually for many different customers.
  • Amazon Payments – Customer Interaction Tracker – includes chat, email messaging, customer support, and other interaction tools.
  • Amazon.com tracker – Misc Tracker – prevalent on IMDB, Twitch, and other Amazon-owned companies.
  • Amazon Associates – Advertising Tracker – provides advertising or advertising-related services such as data collection, behavioral analysis, or retargeting.

Undisputed King of Data?

Our research proves that Google remains the undisputed King of Data. While Amazon has grown its tracking ecosystem at a distressing rate and Facebook continues to reign as a top data collector, Google is massively outpacing its big tech brethren in data collection.

Our new data reveals that globally, Google trackers reach 80.3% of all page loads. That number grows to 81% in the EU and declines to 79.5% in the US. The contrast is stark compared with our 2017 Tracking the Trackers study, which found Google to have a 64.4% reach in the US.

These alarming figures also align with the more intense federal pressure Google has faced this year in what has so far amounted to a significant antitrust report and an unprecedented lawsuit. While both focus primarily on antitrust concerns, the issue remains inextricably linked with its ability to gather data as consumers face a lack of more private browsing options.

Beyond the media giants noted above, it’s also worth examining a company new to the 2020 report: Index Exchange (formerly Casale Media) is a global advertising agency whose “exchange” platform lets advertisers and data collectors work together to create and sell targeted ads. Index Exchange collects a range of data, including ad views, browser information, cookies, date/time, hardware/software type, interaction data, Internet Service Provider, and page views. Their presence is a reminder to consumers that it isn’t only the known, big-tech names we have to fear in the fight for data privacy. Participants like Index Exchange feed data into an ever-expanding web of companies and advertisers worldwide.

The GDPR Tradeoff

GDPR may be asking consumers for consent, but tracking is only growing. As consumer concern around privacy has grown in the United States, lawmakers have looked to the EU’s GDPR as a model of modern privacy regulation. However, the research suggests that GDPR may not be having the full intended effect.

Privacy experts have long speculated that the blanket consent it requires does not actually allow consumers to understand what data they’re giving away and how it will be used, but the research now shows that the EU may actually be in a more vulnerable data state than the US. In the EU, both Google and Facebook tracking technologies are prevalent on more sites than in the U.S.

First, it’s worth noting that this section is based on the top 6,000 most trafficked sites globally, in the United States, and in the EU; these figures aren’t based on page loads, which is what Tracking Reach above uses.

In the EU, Google trackers exist on 87.5% of websites compared to 86.4% in the U.S. For Facebook, 60.2% of EU sites contain Facebook trackers compared to 59.7% in the U.S.

(*Methodology: In this section, we’re looking at how often trackers show up on specific domains at least once, rather than how often they show up across web traffic overall. This is why the Google and Facebook figures are much higher than the shares of traffic we see above: 80% and 19%, respectively.)

More specifically, when looking at the most prevalent trackers individually, some of the largest, like Google Analytics, are more prevalent in the EU (on 42.7% of EU sites compared to 39.1% of US sites).

However, the U.S. does pull ahead when examining the sheer amount of website trackers. When considering sites with more than 20 trackers, the U.S. clocks in at 4.45% and the EU at 3.2%.

What does this mean? While Google and Facebook continue to dominate data collection in the EU, it’s the smaller companies and trackers that are likely getting left behind. So while GDPR may be cutting down on tracking overall, it’s only further empowering the biggest players. For consumers in the EU, this means that while their data may be accessible to fewer companies overall, more of it is getting funneled to the big tech companies that already know the most about them. For many, this is a trade-off they’d rather not make.


“In a move that is ostensibly intended to help publishers be more privacy-compliant, Google has authored a literal playbook that guides publishers to create more direct relationships with its users and rely less on third-party tracking technology,” explains Jeremy Tillman, president of Ghostery. “While this move might help companies avoid paying fines and running afoul of GDPR, it also conveniently helps Google consolidate its power in AdTech and MarTech by nudging publishers, not so gently, towards the Google technology suite.”

Top Trackers Across the Web (ranking top 10-20)

Based on our research looking at tracker reach, this year saw big increases in google_users and amazon cloudfront and decreases for Twitter and google_adsservices. Trackers in the EU have less reach than in the U.S.: eight trackers in the EU show double-digit reach vs. 12 in the U.S.

Still, when breaking down the most prevalent trackers across the globe, Google dominates with a 90% stake in the top-10 list: Google Analytics, DoubleClick, gstatic, Google Tag Manager, etc. Spots 11-20 are all held by digital advertising companies.

Although Google averages $120 billion per year in ad revenue, it stands accused of being:

  1. A “voyeur” of data collection activity, according to a federal lawsuit against Google earlier this year. Court documents say consumers’ detailed browsing history data was collected and monetized by Google without their consent. The lawsuit accuses Google of violating federal wiretap law and California privacy law by tracking users, despite the fact that they had gone in and updated their account privacy settings. Much of this data-collection activity is centered on Google’s Firebase database software, which is popular with app makers.
  2. A bit misleading about consumers’ locations and whereabouts. In light of an Associated Press report showing that location tracking was mishandled, a Google engineer admitted that “location off should mean location off, not except for this case or that case.” The AP report had found that when someone turned off location data on an iPhone, Google Maps and other apps continued to store customers’ whereabouts in an area called My Activity (separate from Location History), allowing ad buyers to target ads to specific locations. So even when users thought they’d turned off location, advertisers were tracking them to sell targeted ads and “local campaigns” to boost in-person store visits.

As is now clear, lots of industries are collecting and using consumer data (entertainment, news, recreation, business), and this kind of data tracking is happening everywhere. Indeed, it’s been said that “if you’re not paying for the product, then you are the product,” since advertisers pay in exchange for influencing how consumers think.

Types of Websites That Have the Most Trackers

NEWS & MEDIA

News and media outlets retained the No. 1 spot as the type of website with the most trackers globally. News sites ranked ahead of e-commerce sites, with news and portals hosting an average of 12.9 trackers in the U.S. and globally, and 12.4 in the EU.

It’s an unfortunate reality that the media has struggled significantly in recent years, worsened by the pandemic economy that has decimated many industries. Advertising has become one of the final profitable channels for news brands and unsurprisingly, that has led many news sites to play host to a plethora of data-collecting trackers. In order to stay afloat financially, nearly all major news outlets have also adopted native advertising as a profit-making model. Advertisers use native advertising — aka, sponsored content — to creatively blend paid messaging with news articles. By some estimates, this type of sponsored content supports 75% of media ad revenue and is a multi-billion dollar industry.

“This approach creates a situation that blends advertising and legitimate content, and the user may fail to distinguish between the two,” Tillman points out.

“For instance, a publisher may include sponsored content from an oil company that extols the virtues of fossil fuels and a visitor may think this piece of advertising is a real article written by a credible journalist when it’s anything but. In an effort to compensate for tightening privacy regulations, publishers may find themselves dipping a toe into more deceptive methods to fund their content,” he adds.

E-COMMERCE

E-commerce sites, the next most tracker-heavy, came in at an average of 9.1 trackers in the U.S., 9 globally, and 8.9 in the EU. Looking at real examples, we can see this relationship playing out when examining some of the top news and e-commerce sites:

News:

  • Foxnews.com – 60 trackers total
  • NYTimes.com – 26 trackers total
  • CNN.com – 78 trackers total

E-Commerce:

  • Walmart.com – 32 trackers total
  • Alibaba.com – 17 trackers total
  • Amazon.com – 14 trackers total

RECREATION

In the U.S., the “Recreation” category fell from #2 in 2019 to #5 in 2020. Given COVID lockdowns and less travel overall, it makes sense that recreation websites would see a drop in traffic, and therefore a drop in traffic share for trackers. There are a couple of ways to interpret this decline. It’s possible that companies are seeing less revenue and don’t want to spend money on multiple trackers. Conversely, it could be argued that they’d be more desperate to squeeze value from data and drive better-performing ads. In Europe, however, recreation trackers decreased only slightly, and recreation remained the second-most-tracked category.

POLITICAL

Political tracking exploded in 2020 — this category of trackers experienced huge 100%+ increases from 2019 to 2020.

It’s important to point out that there isn’t one “political tracker.” Instead, this refers to the average number of trackers on websites that are categorized as “Political.” Political websites are defined as “official websites of political parties and movements” (e.g. donaldjtrump.com and joebiden.com). More than likely this spike was due to the 2020 U.S. election season and political uncertainties around the world. In the U.S., political tracking saw a 108% jump from 2019. In the EU, it saw a 121% jump.

Here are a few real-world examples of how political tracking works:

  1. Facebook offers advertising pixels, small pieces of code that an advertiser embeds onto their website. Facebook can then match the pixel information (date, time, URL, and browser type) to a user’s profile. With this profile, political campaigns can create Facebook ads and target users based on certain algorithms. Unfortunately, Facebook ads are notorious for not being fact-checked; the company is accused of not being transparent with data concerning its political ads; and people in certain demographics are targeted for propaganda.
  2. Cellular service providers collect and sell customers’ location data to third-party advertisers and marketers. Federal law does not prohibit this type of tracking, so app makers and cellular providers don’t have to disclose the information being collected and sold about customers. In the case of political tracking, geofencing technology has been used to identify and track smartphones at places of worship — advertisers then push ads in the hopes it will sway votes for candidates with similar views on social issues.
  3. There are other groups employing similar strategies to harvest data on potential voters. During recent protests around the country, for example, political groups gathered intel from protesters’ cellphones so they could then send targeted messaging about voting and other social issues.
  4. In addition to political action groups, software companies also help groups track voters across the web. For example, a firm called AnalyticsIQ uses its PeopleCore database, which holds “core” data on more than 241 million Americans, including age, gender, marital status, family structure, income and net worth, discretionary spend across product categories, and credit history, among other categories.

  • The U.S. has a smaller percentage of websites on which Google and Facebook trackers appear than the EU
  • The U.S. has a higher percentage of websites with 10+ and 20+ trackers than the EU
  • The percentage of websites in the U.S. and EU with 10+ and 20+ trackers decreased in 2020 compared to 2019

4. % of websites that Google trackers appear on

  • US: 86.35%, EU: 87.53%
  • Compared to 2019: US: 85.68%, EU: 87.81%

5. % of websites that FB trackers appear on

  • US: 59.65%, EU: 60.24%
  • Compared to 2019: US: 61.04%, EU: 64.19%

6. % of websites with 10+ trackers

  • US: 30.47%, EU: 29.88%
  • Compared to 2019: US: 33.67%, EU: 33.83%

7. % of websites with 20+ trackers

  • US: 4.45%, EU: 3.20%
  • Compared to 2019: US: 4.54%, EU: 3.78%
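
The year-over-year movement in these figures is easier to read as percentage-point changes. The following is a minimal Python sketch that uses only the numbers listed above; the dictionary layout and the point_change helper are illustrative assumptions, not part of the study's own tooling.

```python
# Figures copied from the lists above: (2019, 2020) share of top sites, in percent.
stats = {
    "Google trackers": {"US": (85.68, 86.35), "EU": (87.81, 87.53)},
    "Facebook trackers": {"US": (61.04, 59.65), "EU": (64.19, 60.24)},
    "10+ trackers": {"US": (33.67, 30.47), "EU": (33.83, 29.88)},
    "20+ trackers": {"US": (4.54, 4.45), "EU": (3.78, 3.20)},
}


def point_change(before, after):
    """Percentage-point difference between two shares, rounded to 2 decimals."""
    return round(after - before, 2)


for metric, regions in stats.items():
    for region, (y2019, y2020) in regions.items():
        change = point_change(y2019, y2020)
        print(f"{metric:18} {region}: {change:+.2f} points")
```

Read this way, Google's site presence rose slightly in the U.S. and dipped slightly in the EU, while every other figure fell in both regions.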

Current Regulations

While data is traded faster and more easily, data privacy regulations are moving at a snail’s pace. There’s currently no federal law protecting what companies track, store, and sell about American citizens. The California Consumer Privacy Act (CCPA) is the first and only U.S. law of its kind: any company that does business in California must implement security practices to protect consumer data. The law went into effect January 1, 2020, and mandates that companies:

  • Post their digital privacy notices in an accessible format.
  • Honor users’ Do Not Track privacy settings.
  • Clearly explain what types of information will be collected and how the information will be shared.
  • Offer a global opt-out option (to allow consumers to opt out of all sales of personal information).

The law shares elements of GDPR, which went into effect in 2018. However, some say GDPR is far more stringent, as it covers all personal data, regardless of source; CCPA only considers data that was provided by a consumer and only helps California residents. Clearly, it’s up to consumers to take matters into their own hands.

In a recent Ghostery user survey, 78% of respondents said they weren’t okay giving companies their personal information even if it resulted in a product that’s personalized just for them. When asked about their primary reasons for using the Ghostery Browser Extension, the majority cited blocking annoying ads and popups (71%) while online privacy protection (65%) and safeguarding against data leaks (39%) came in second and third, respectively.

Looking Ahead

The internet, quite simply, runs on data. In an ad-driven world, that means tech companies and online publishers are looking to track, collect, and analyze consumer behavior online.

Regulation experiments from GDPR to CCPA have yet to prove effective in meaningfully protecting consumer data. Perhaps more alarmingly, banks and online lenders are already looking to use online data sources for determining credit scores.

How intimate will this get and how far back will they go? Might this “enhanced” credit score factor in online tracking activity by Google, Facebook, and advertisers? In an academic study about alternative credit and “social scoring,” researchers noted that financial institutions already attempt to factor a range of alternative data, including “non-financial payment streams, academic records, behavioral signals gleaned from online or social media footprints and results generated via digitized psychometric testing – and by assessing that data in relation to models of risk assessment based on the analysis of big data.”

Fortunately, several tools exist today that notify users about who is tracking them on each page and allow them to opt out as they wish. This is one way to return control of personal data to individual users, even if government regulations lag behind.

Ghostery’s privacy-focused browser extension, VPN, and ad-blocking solutions halt most data-gathering altogether. The Ghostery Browser Extension will show all of the trackers found on a website. You can then choose to block all tracking technology, block only specific trackers, or mark some sites as “trusted” with a simple tap.

In summary, installing anti-tracking tools is an effective way for people to halt trackers in their tracks.

* Media Resource:
For a quick, high-level synopsis of this report, download the Tracking the Trackers: At A Glance PDF.

* Methodology:
Data from August 2020 and August 2019 is provided by https://whotracks.me. More information about the data and a description of the variables can be found on their GitHub (https://github.com/cliqz-oss/whotracks.me/tree/master/whotracksme/data).
Company tracking reach is calculated using page loads from Human Web data collected by the Ghostery Browser Extension. Tracking reach is defined as proportional presence across all page loads (i.e. if a tracker is present on 50 out of 1000 page loads, the reach would be 0.05). This value is a float between 0 and 1. August 2020 tracking reach is based on 1.96 billion page loads. There is no page load count available for August 2019.
Type of websites that have the most trackers, % of websites that have X trackers, and % of websites with X+ trackers are based on the top 6,000 most trafficked Global, US, and EU websites in August 2020 and 2019. These are not based on page loads. The % of websites calculation counts a tracker if it’s been seen at least once on those sites, not only if it is always present. For example, if a site embeds YouTube videos on a single page, we would still count YouTube (Google) as being present there according to this stat.
A tracker can have low reach (meaning it’s not on most page loads) but high prevalence across the top 6,000 sites, since it only needs to exist on at least one page of a site.
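
The difference between tracking reach (share of page loads) and site prevalence (share of sites where a tracker was seen at least once) can be illustrated with a small Python sketch. The page-load records, site names, and function names below are hypothetical and chosen only for illustration; this is not the whotracks.me pipeline.

```python
from collections import defaultdict

# Hypothetical page-load records: (site, set of tracker owners seen on that load).
page_loads = [
    ("news.example", {"Google", "Facebook"}),
    ("news.example", {"Google"}),
    ("shop.example", {"Google", "Amazon"}),
    ("shop.example", set()),
    ("blog.example", {"Amazon"}),
]


def tracking_reach(loads, tracker):
    """Fraction of all page loads on which the tracker was present (0 to 1)."""
    hits = sum(1 for _, trackers in loads if tracker in trackers)
    return hits / len(loads)


def site_prevalence(loads, tracker):
    """Fraction of sites on which the tracker was seen at least once (0 to 1)."""
    seen = defaultdict(bool)
    for site, trackers in loads:
        seen[site] = seen[site] or (tracker in trackers)
    return sum(seen.values()) / len(seen)


for name in ("Google", "Facebook", "Amazon"):
    reach = tracking_reach(page_loads, name)
    prevalence = site_prevalence(page_loads, name)
    print(f"{name}: reach={reach:.2f}, prevalence={prevalence:.2f}")
```

In this toy data, Facebook appears on one of five page loads but on one of three sites, so its prevalence is higher than its reach. The same effect explains why the site-level percentages in the report exceed the page-load figures.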

* References:
Ghostery Study - Tracking the Trackers
www.dockets.justia.com/...
www.apnews.com/...
www.internetpolicy.mit.edu/...
www.npr.org/...
www.wsj.com/...
www.researchgate.net/...