Data Collection

Learn more about how Pathmatics delivers the most complete picture of the digital ad landscape possible

Collecting Our Crawler Data

Pathmatics collects a sample of digital ads from the web. In order to report the most complete picture of the digital advertising landscape, we rely on two leading data sourcing technologies: panels and crawlers.

Our crawlers discover ads by visiting sites much as real users do. For each site, we visit URLs
chosen at random from that site's most visited URLs. Each visit emulates the depth of a typical
user visit to that site.
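
As a rough illustration only, the selection step could look like the Python sketch below. The function name `plan_visit`, its parameters, and the idea of expressing visit depth as a page count are assumptions made for this example, not a description of the production crawler.

```python
import random


def plan_visit(top_urls, typical_depth, rng=None):
    """Pick the pages for one simulated visit (hypothetical helper).

    top_urls: a site's most visited URLs.
    typical_depth: assumed number of pages a typical user views on the site.
    """
    rng = rng or random.Random()
    # Never ask for more pages than the site has top URLs.
    depth = min(typical_depth, len(top_urls))
    # Sample without replacement so one visit does not repeat a page.
    return rng.sample(top_urls, depth)


if __name__ == "__main__":
    # Placeholder URLs, for illustration only.
    urls = [f"https://example.com/page{i}" for i in range(10)]
    print(plan_visit(urls, typical_depth=3))
```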

Website sampling

We sample each site based on traffic to keep sampling rates low enough for minimal impact on the
advertising ecosystem. This ensures we capture a representative sample of actual ad activity while
remaining respectful web citizens.

Actual numbers vary, but crawling ranges from 100 times per day for the smallest sites to a couple thousand times per day for the largest sites. Our sampling frequency enables us to capture new advertisers and campaigns.

For example, a very popular site like yahoo.com is sampled more frequently than a less visited site like healthcareitnews.com.
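
To make the idea of traffic-based sampling concrete, here is a minimal Python sketch. Only the lower bound of 100 crawls per day comes from this article. The upper bound of 2,000 (one reading of "a couple thousand"), the log-scale interpolation, and the name `crawls_per_day` are assumptions chosen for illustration. The actual method may differ.

```python
import math

MIN_CRAWLS_PER_DAY = 100   # from the article: smallest sites
MAX_CRAWLS_PER_DAY = 2000  # assumption: "a couple thousand" read as 2,000


def crawls_per_day(traffic, min_traffic, max_traffic):
    """Map a site's estimated traffic to a daily crawl count (assumed scheme).

    Interpolates on a log scale between the smallest and largest tracked
    sites, so crawl counts grow with traffic but much more slowly.
    """
    if not 0 < min_traffic < max_traffic:
        raise ValueError("expected 0 < min_traffic < max_traffic")
    # Clamp so sites outside the known range get the boundary rate.
    clamped = min(max(traffic, min_traffic), max_traffic)
    frac = (math.log(clamped) - math.log(min_traffic)) / (
        math.log(max_traffic) - math.log(min_traffic)
    )
    return round(
        MIN_CRAWLS_PER_DAY + frac * (MAX_CRAWLS_PER_DAY - MIN_CRAWLS_PER_DAY)
    )
```

Under these assumptions, a site at the low end of the traffic range is crawled 100 times per day, a site at the high end 2,000 times, and everything in between falls on a smooth curve.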

Scale

Pathmatics crawls the most important publishers globally – including the top ~4,000 US sites
based on estimated traffic.

Pathmatics customers may request that new sites be tracked. Since our founding, users have requested 8,000+ additional sites, bringing total US coverage to 12K+ sites (20K+ globally). These sites represent the important publishers in all covered regions. The sites in each region are updated regularly to reflect trends and changes in the advertising landscape.

Geographic diversity

Pathmatics samples from a geographically distributed set of locations to ensure a representative sample of geographically targeted campaigns. We crawl from nearly 100 metros globally. The regions and metros we crawl change over time, based on what we believe will yield the most representative sample possible.

On a given day, we crawl a range of metros across each region.

Our crawlers sample the web probabilistically – at a frequency determined by each metro's population.

For example, more pages are sampled per day from New York City, NY than Boise, ID.
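
Population-weighted sampling of this kind can be sketched in Python with the standard library's `random.choices`. The function name `sample_metros` and the population figures are illustrative placeholders, not real inputs or the actual implementation.

```python
import random
from collections import Counter


def sample_metros(populations, n_pages, rng=None):
    """Assign n_pages crawls to metros, weighted by population (sketch).

    populations: dict mapping metro name to population.
    Returns a Counter of pages sampled per metro.
    """
    rng = rng or random.Random()
    metros = list(populations)
    weights = [populations[m] for m in metros]
    # Each page independently picks a metro with probability
    # proportional to that metro's population.
    return Counter(rng.choices(metros, weights=weights, k=n_pages))


if __name__ == "__main__":
    # Placeholder populations, for illustration only.
    demo = {"New York City, NY": 8_000_000, "Boise, ID": 200_000}
    print(sample_metros(demo, n_pages=1_000))
```

Because the draw is random, the exact split varies from run to run, but a larger metro receives more pages on average.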

Media

Similar to how real users interact with various media, Pathmatics crawls from a variety of devices (desktops, tablets, and mobile phones), browsers (e.g., Safari and Chrome), and operating systems (e.g., Android and iOS) to ensure representative capture of advertising activity.
