Case Study: A Better Way to Collect Web Data

Finding information is easier with automation

Free up resources for work that benefits from a human touch.

TL;DR: By automating a client’s manual data collection process using web scraping bots, we saved them significant time and money while improving data reliability.

The Pain Point

A client once came to us with a problem. She needed to make a strategic decision about how her team would invest in local marketing initiatives for the several thousand franchisees in her organization. As a market leader facing growing competition, a challenging regulatory environment, and competing financial priorities, the team needed data to make a targeted decision. Our client wanted to know which of her locations faced the strongest competition from two market challengers that had been investing heavily in media over the previous several years.

The Solution

This wasn’t the first time our client had faced this problem. In the past, she chose brute force. She assigned a team of 8 people to click through the competitors’ websites and copy locations manually into a spreadsheet. The exercise took 100 resource hours and cost around $4,000.

How many of us would do the same?

Faced with a problem, we tend to pick the most efficient solution among those we clearly understand. After all, the risks of trying new methods are rework and wasted money. But having spent some time with us on recent projects, our client thought there might be a better way.

So, instead of sending a platoon of her team to click away robotically for a few days, we let them keep working on the higher-value jobs they were much better at, and we built some robots to do the clicking.

How It Worked

You’ve probably heard of ‘bots’. They come up most commonly in talk about fake social media accounts, and maybe in the context of website chatbots. The term sometimes carries a negative connotation, but bots are just scripts that carry out automated tasks on the internet, and there are loads of cool and interesting ways to use them.

A web scraper is a script that copies information from pages on the internet. You have to be careful, though: you can’t point a scraper at every website. Many platforms have rules about when, where, and how scrapers can be used to collect their data. Having reviewed the rules on both competitors’ websites and determined that we were in the clear, we built a scraper for the simpler site first.
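One place sites publish machine-readable rules for bots is a robots.txt file. The sketch below shows how such a check could look using Python's standard library. It is an illustration, not a record of how we reviewed these particular sites: the sample rules, the "example-bot" user agent, and the example.com URLs are all made up, and a robots.txt check does not replace reading a site's terms of use.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /account/
Crawl-delay: 4
"""


def _parse(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text without touching the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_to_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    return _parse(robots_txt).can_fetch(user_agent, url)


def crawl_delay(robots_txt: str, user_agent: str):
    """Return the requested delay between requests, or None if unspecified."""
    return _parse(robots_txt).crawl_delay(user_agent)


if __name__ == "__main__":
    for url in ("https://example.com/locations/ohio",
                "https://example.com/account/settings"):
        print(url, allowed_to_fetch(SAMPLE_ROBOTS, "example-bot", url))
    print("delay:", crawl_delay(SAMPLE_ROBOTS, "example-bot"))
```

In a real run the rules would be downloaded from the target site rather than hard-coded, and the scraper would skip any URL the check rejects.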

The site had a list of linked state names, each of which led to a list of linked city names in that state. The city links led to a table of location addresses in that city. Our little robot happily clicked through each city and copied the relevant information from the tables, taking about four seconds per page. The full run took nine hours, running quietly in the background through the afternoon and into the evening as I made dinner and put the kid to bed. In the end, we had a list of over 27k locations.
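The state, city, and table walk described above can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not the script we ran: the page layout, the FAKE_SITE data, and the names PageParser and scrape_locations are hypothetical, and the deliberate pause between pages is our assumption (the only figure from the project is that each page took about four seconds). Pages are served from an in-memory dictionary so the example runs without touching a real site.

```python
import time
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collects link targets and table rows (from <td> cells) on one page."""

    def __init__(self):
        super().__init__()
        self.links, self.rows = [], []
        self._row = None   # cells of the <tr> currently being read
        self._cell = None  # text pieces of the <td> currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows made of <th> cells stay empty
                self.rows.append(self._row)
            self._row, self._cell = None, None


def parse_page(html):
    parser = PageParser()
    parser.feed(html)
    parser.close()
    return parser


def scrape_locations(fetch, index_url, delay_seconds=4.0):
    """Walk index -> state pages -> city pages and collect table rows."""
    locations = []
    for state_url in parse_page(fetch(index_url)).links:
        for city_url in parse_page(fetch(state_url)).links:
            time.sleep(delay_seconds)  # assumed politeness pause per page
            locations.extend(parse_page(fetch(city_url)).rows)
    return locations


# Hypothetical stand-in for a real site: URL -> HTML.
FAKE_SITE = {
    "/": '<a href="/ohio">Ohio</a><a href="/texas">Texas</a>',
    "/ohio": '<a href="/ohio/columbus">Columbus</a>',
    "/texas": '<a href="/texas/austin">Austin</a>'
              '<a href="/texas/dallas">Dallas</a>',
    "/ohio/columbus": "<table><tr><th>Address</th><th>ZIP</th></tr>"
                      "<tr><td>1 Main St</td><td>43004</td></tr></table>",
    "/texas/austin": "<table><tr><td>2 Oak Ave</td><td>73301</td></tr></table>",
    "/texas/dallas": "<table><tr><td>3 Elm Rd</td><td>75001</td></tr>"
                     "<tr><td>4 Pine Ln</td><td>75002</td></tr></table>",
}

if __name__ == "__main__":
    for row in scrape_locations(FAKE_SITE.__getitem__, "/", delay_seconds=0):
        print(row)
```

Passing the fetch function in as a parameter keeps the crawl logic separate from the network code, so the same loop can be tested against canned pages before it is pointed at a live site.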

The second site had a lot in common with the first, with a few differences in how the tables were organized and how some city links behaved. Using the first scraper script as a foundation, we made a few adjustments and added some extra if/then statements. The resulting scraper ran faster than the first, knocking out over 8k locations in just two hours.
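To give a feel for what those adjustments can look like, here is a small sketch. The specifics are invented for illustration, since the actual differences between the two sites are not described here: the column orders in SITE_COLUMNS, the site labels, and the kinds of links handled in resolve_link are all hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical column orders for each site's address table.
SITE_COLUMNS = {
    "site_a": ["street", "city", "state", "zip"],
    "site_b": ["city", "state", "street", "zip"],
}


def normalize_row(site, cells):
    """Map one table row to a dict with the same keys for both sites."""
    columns = SITE_COLUMNS[site]
    if len(cells) != len(columns):
        raise ValueError(f"expected {len(columns)} cells, got {len(cells)}")
    return dict(zip(columns, cells))


def resolve_link(page_url, href):
    """Return an absolute URL to follow, or None for links to skip."""
    # Assumed quirk: some links are in-page anchors or script triggers.
    if href.startswith("#") or href.lower().startswith("javascript:"):
        return None
    # urljoin handles both relative and already-absolute hrefs.
    return urljoin(page_url, href)


if __name__ == "__main__":
    print(normalize_row("site_b", ["Columbus", "OH", "1 Main St", "43004"]))
    print(resolve_link("https://example.com/states/ohio/", "columbus"))
```

Keeping the per-site differences in a lookup table and a couple of small branches means the main crawl loop can stay the same for both sites, and both outputs land in one consistent structure.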

The Results

Instead of taking her team off strategically valuable work to spend $4k worth of their time on mindless tedium, our client paid for a few hours of development and, by the following afternoon, had 35k locations to fuel her strategic decision. Ultimately, she spent a fraction of the cost and time, and the data was more reliable and consistently well-structured, allowing for easy formatting and loading into existing databases.
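For readers who like to check the math, the figures quoted in this case study work out as follows. The location counts were "over" 27k and 8k, so treat the results as rounded, and note that the development cost is not itemized here, so the sketch compares hours only. The page estimate assumes the four-seconds-per-page pace held for the whole nine-hour run.

```python
# Figures quoted in the case study (location counts are rounded down).
MANUAL_HOURS = 100
MANUAL_COST_USD = 4_000
TEAM_SIZE = 8
SITE_A = {"locations": 27_000, "hours": 9}
SITE_B = {"locations": 8_000, "hours": 2}
SECONDS_PER_PAGE = 4


def summarize():
    """Derive the comparison numbers from the quoted figures."""
    scraper_hours = SITE_A["hours"] + SITE_B["hours"]
    return {
        "manual_usd_per_hour": MANUAL_COST_USD / MANUAL_HOURS,
        "manual_hours_per_person": MANUAL_HOURS / TEAM_SIZE,
        "total_locations": SITE_A["locations"] + SITE_B["locations"],
        "scraper_hours": scraper_hours,
        "site_a_locations_per_hour": SITE_A["locations"] // SITE_A["hours"],
        "site_b_locations_per_hour": SITE_B["locations"] // SITE_B["hours"],
        # Rough page count for the first site, assuming a steady pace.
        "site_a_pages_estimate": SITE_A["hours"] * 3600 // SECONDS_PER_PAGE,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value}")
```

The scraper hours are unattended machine time, so they are not directly comparable to the 100 hours of staff time they replaced.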

Does this problem sound familiar? We would be happy to take a look at any manual processes you’re devoting resources to and help you think through where you might find efficiencies hiding in plain sight at your business. You will be shocked at how many resource hours you can free up for work that truly benefits from a human touch.