In this article, you will learn how the Bardeen scraper works and how you can use it to save time.
With the Bardeen scraper, you can extract data from any website and send it directly to your favorite apps. This means that you can ditch copy-pasting from your day-to-day processes forever.
The scraper allows you to do things like copying LinkedIn profile data to your Google Sheets or Notion database with one click, saving interesting tweets to a Google Doc, and much more.
Scraper fundamentals
Let's break down web scraping in simple terms. Think of all the information on the internet as being stored in big digital libraries.
Websites use special tools called APIs to let you access the information stored in these libraries. An API (Application Programming Interface) is like a messenger that takes your request to the library and brings back the information you need. But not all information is easy to get through these tools. For example, when you read a tutorial on a website, it has a title, content, links, and other details. All this information is stored in a digital library, and the website shows it to you in a readable format.
However, you can't directly access the library to get the information in an organized way. But since the information is displayed on the website, we can still get it.
Web scraping is like copying specific parts of a webpage and turning them back into organized data. In short, web scraping is a method of gathering information from websites and organizing it into a structured format using special tools called scrapers.
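To make the idea concrete, here is a minimal sketch of scraping in Python, using only the standard library. It is not how Bardeen works internally; the sample page, the `TutorialParser` class, and the `scrape` function are made up for illustration. It reads the HTML a website would display and turns it back into an organized record with a title and a list of links.

```python
from html.parser import HTMLParser

# A made-up page standing in for a tutorial on a website.
PAGE = """
<html><body>
  <h1 class="title">How to brew coffee</h1>
  <a href="https://example.com/beans">Choosing beans</a>
  <a href="https://example.com/grind">Grinding</a>
</body></html>
"""


class TutorialParser(HTMLParser):
    """Collects the <h1> text and every link href from a page."""

    def __init__(self):
        super().__init__()
        self.record = {"title": "", "links": []}
        self._in_h1 = False

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.record["links"].append(href)

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        # Only text inside the <h1> counts as the title.
        if self._in_h1:
            self.record["title"] += data.strip()


def scrape(html: str) -> dict:
    """Turn displayed HTML back into a structured record."""
    parser = TutorialParser()
    parser.feed(html)
    return parser.record


record = scrape(PAGE)
print(record)
```

The result is a dictionary you could send to a spreadsheet or database, which is the "organized data" described above.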
What can I scrape?
You can scrape the following types of data.
Text
Ex: name of a person from a LinkedIn profile page.
Link
Ex: links from Google Search results.
Image
Ex: profile images from LinkedIn.
Click
Ex: clicking on the “contact info” link to open a popup.
Input
Ex: filling out a form.
You can also get information that isn't directly shown on the page.
Page Link
Get the current website's URL.
Page Title
The title text that shows in Google search or as a browser tab name.
Meta Image
The preview image that appears when you share a link on social media, also called an Open Graph image.
Time Stamp
The exact time when a page was scraped, useful for tracking when you scrape the same page multiple times.
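As a rough illustration of where these four values come from, the sketch below pulls them out of a sample page with Python's standard library. The `MetaParser` class, the `page_metadata` function, and the sample HTML are assumptions made for this example, not part of Bardeen. The Meta Image is read from the page's `og:image` meta tag, and the Time Stamp is simply the moment the function runs.

```python
from datetime import datetime, timezone
from html.parser import HTMLParser

# A made-up page head for illustration.
HTML = """
<html><head>
  <title>Gift ideas for hikers</title>
  <meta property="og:image" content="https://example.com/preview.png">
</head><body><p>Hello</p></body></html>
"""


class MetaParser(HTMLParser):
    """Reads the <title> text and the Open Graph image URL."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta_image = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("property") == "og:image":
            self.meta_image = attrs.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_metadata(url: str, html: str) -> dict:
    """Return the four page-level fields for one scraped page."""
    parser = MetaParser()
    parser.feed(html)
    return {
        "page_link": url,
        "page_title": parser.title.strip(),
        "meta_image": parser.meta_image,
        # Recorded at scrape time, in UTC.
        "time_stamp": datetime.now(timezone.utc).isoformat(),
    }


meta = page_metadata("https://example.com/gifts", HTML)
print(meta)
```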
What can I do?
Scrape data on an active tab
This action scrapes data from the currently open webpage. Use it when you need to copy one thing at a time.
For example, if you're making a gift wish list on Notion from Amazon, find a product, launch Bardeen, and copy it with one click.
Scrape data on URLs in the background - Premium action
This action scrapes data from multiple links in the background, which works well if you don't want your computer occupied while extracting data.
For example, say you have a list of Twitter profile links and need more info, like names and follower counts. Enter the list of links, and Bardeen will scrape the missing info. No more copying and pasting.
Trigger: when website data changes - Premium action
Instead of checking a website a million times a day to get updates, you can set this trigger to do it for you.
This trigger scrapes a website every 10 minutes and returns any new information. You can use that output in your Autobook, for example to send yourself a notification by email, Slack, or SMS.
Use this to track competitor prices, government tenders, and product availability.
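Conceptually, a "data changed" trigger compares the latest scrape with the previous one and reports only what is new. The sketch below shows that comparison in Python with simulated results from three checks. The `find_new_items` function and the sample data are hypothetical, and Bardeen's actual change detection may work differently.

```python
def find_new_items(previous, current):
    """Return items in `current` that were not in `previous`, keeping order."""
    seen = set(previous)
    return [item for item in current if item not in seen]


# Simulated results of three consecutive checks on the same page.
checks = [
    ["Tender A", "Tender B"],
    ["Tender A", "Tender B"],
    ["Tender C", "Tender A", "Tender B"],
]

notifications = []
previous = checks[0]
for current in checks[1:]:
    fresh = find_new_items(previous, current)
    if fresh:
        # In a real automation, this is where a notification would be sent.
        notifications.append(fresh)
    previous = current

print(notifications)
```

Only the third check produces a notification, because it is the first one that contains an item not seen in the check before it.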
Scrape data lists - Premium action
Instead of scraping items one by one, you can choose a list of items on a website, and Bardeen will scrape each list item into a row of data in your preferred apps.
Scraper templates
A scraper template tells the scraper what information to extract from a webpage and where to find it. These templates only work for specific types of pages. For example, a template for LinkedIn profile pages will only work on those pages, not on LinkedIn company pages.
All scraper actions need a template to work. You can choose one of our pre-built templates, reuse one of your existing templates, or create a new one for a website we don't cover yet.
What pre-built scraper templates are available?
Bardeen has over 200 pre-built scraper templates for you to choose from. Simply select your desired template within the Playbook builder:
Airbnb property | Appsumo products list | Clutch search results | Google job posts search results | LinkedIn companies search results |
Airbnb search results | Booking.com property | Connect with the linkedin user | Google Maps location card reviews | LinkedIn company about page |
Amazon best sellers | Booking.com search results | Craigslist search results | Google Maps search results | LinkedIn company jobs tab |
Amazon books series | Capterra product overview | Crunchbase organization profile | Google Maps location card | LinkedIn employee count insights |
Amazon product | Capterra product search | eBay product list | Google Play app page | LinkedIn group members |
Amazon store page | ChatGPT conversation | eBay product page | Google Play app reviews | LinkedIn job search results |
Appsumo product reviews | ChatGPT last reply | Eventbrite search results | Google Play search results | LinkedIn people search |
FB Personal Profile | Instagram profile followers | Indeed company profile | LinkedIn profile | ProductHunt Top Products |
Fiverr search results | Instagram profile posts and reels | Indeed job post | LinkedIn Sales Navigator People list results | Realtor.com search results |
FlexJobs job post | Instagram profile | Indeed job search | LinkedIn Sales Navigator People search results | Reddit homepage, subreddit or search results |
FlexJobs search results | Instagram post comments | Indie Hackers feed and group posts | LinkedIn Sales Navigator Company search results | Reddit post |
G2 product overview | LinkedIn comments | Instagram post details | LinkedIn job post | Redfin property |
G2 product reviews | LinkedIn company about page | IMDB list | LinkedIn staff distribution | Redfin search results |
G2 product search | LinkedIn company jobs tab | IMDB title | Meetup events | Remote OK job post |
GitHub project | LinkedIn employee count insights | Indie Hackers feed and group posts | Monster.com job post | Remote OK search results |
GitHub user profile | LinkedIn group members | Instagram post details | Monster.com search results | SEEK job post |
Glassdoor job post | LinkedIn job search results | Instagram profile followers | Product Hunt products list | SEEK search results |
Glassdoor jobs search results | LinkedIn people search | Instagram profile posts and reels | Product Hunt search results | ThemeForest search results |
Google search news tab | LinkedIn People you may know | Instagram profile | ProductHunt Top Products | Threads post |
Google job post | LinkedIn Post | Instagram post comments | ProductHunt product | Threads posts list |
Google job posts search results | LinkedIn post search results | LinkedIn comments | ProductHunt topic page | TikTok profile |
Google Maps location card reviews | LinkedIn profile | LinkedIn companies search results | Realtor.com agents and realtors search results | TikTok video comments |
Google Maps search results | LinkedIn Sales Navigator People list results | LinkedIn company about page | Realtor.com property | TikTok video details |
Google Maps location card | LinkedIn Sales Navigator People search results | LinkedIn company jobs tab | Realtor.com search results | TikTok videos by hashtag search results |
Google Play app page | LinkedIn Sales Navigator Company search results | LinkedIn employee count insights | Redfin property | Twitter homepage, thread, search and feed |
Google Play app reviews | LinkedIn job post | LinkedIn group members | Redfin search results | Twitter profile |
Google Play search results | LinkedIn staff distribution | LinkedIn job search results | Remote OK job post | Twitter tweet |
Google search card | Meetup events | LinkedIn people search | Remote OK search results | Upwork Candidate |
Google search results | Monster.com job post | LinkedIn People you may know | SEEK job post | Upwork Job |
Google Translate | Monster.com search results | LinkedIn Post | SEEK search results | WhatsApp web contacts |
Google Travel hotels search results | Product Hunt products list | LinkedIn post search results | ThemeForest search results | WordPress search results |
Google Trends trending now search | Product Hunt search results | LinkedIn profile | Threads post | Yelp service details |
Google News | ProductHunt Top Products | LinkedIn Sales Navigator People list results | Threads posts list | Yelp search results |
IMDB list | ProductHunt product | LinkedIn Sales Navigator People search results | TikTok profile | YouTube channel video list |
IMDB title | ProductHunt topic page | LinkedIn Sales Navigator Company search results | TikTok video comments | YouTube Comments |
Indeed company profile | Realtor.com agents and realtors search results | LinkedIn job post | TikTok video details | Youtube History |
Indeed job post | Realtor.com property | LinkedIn staff distribution | TikTok videos by hashtag search results | Youtube Profile about tab |
Indeed job search | Realtor.com search results | Meetup events | Twitter homepage, thread, search and feed | YouTube transcript |
Indie Hackers feed and group posts | Reddit homepage, subreddit or search results | Monster.com job post | Twitter profile | YouTube video details |
Instagram post details | Reddit post | Monster.com search results | Twitter tweet | YouTube videos search results |
Instagram profile followers | Redfin property | Product Hunt products list | Upwork Candidate | YouTube video transcription |
Instagram profile posts and reels | Redfin search results | Product Hunt search results | Upwork Job | Zapier app details |
Instagram profile | Remote OK job post | ProductHunt Top Products | WhatsApp web contacts | Zapier apps list |
Instagram post comments | Remote OK search results | ProductHunt product | WordPress search results | Zillow agent |
LinkedIn comments | SEEK job post | ProductHunt topic page | Yelp service details | Zillow property |
LinkedIn companies search results | SEEK search results | Realtor.com agents and realtors search results | Yelp search results | Zillow search results |
Creating a custom scraper template
If you are scraping a page for which we do not have a pre-built template, you can create a custom scraper template, which tells the scraper what information to extract and where to find it on the page. Since each website is different, you need a template for each site you want to scrape.
You can create a scraper template in the Playbook builder or the popup window. In the popup window, click the scraper icon and select “New Scraper Template.” Next, choose either a Single Page or a List or Table scraper. Then click on an element you want to extract and select the data type.
One website might need multiple templates, like one for LinkedIn profiles and another for search results, so name each template clearly to make it easy to find later.
If you need to select an item that our scraper isn’t picking up, check out our Advanced Scraping Tutorial.
Types of scraper templates
There are two types of scraper templates: Single Page and List or Table.
The Single Page template grabs one piece of information for each data field, like getting just one "name" from a LinkedIn profile page. This works great if you want to collect data from a basic web page like a blog post, LinkedIn profile, or email.
The List or Table scraper template looks for repeating elements on a page. For example, it can extract multiple names from a LinkedIn search results page, where each name appears once per search result. This is ideal for collecting data from a list of Amazon products, LinkedIn search results, or an email thread.
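The difference between the two types can be sketched in a few lines of Python. This is an illustration only: the sample markup, the `FIELDS` mapping, and the `scrape_single` and `scrape_list` functions are made up, and the example uses the standard library's `xml.etree.ElementTree` on a small well-formed snippet rather than a real web page. A single-page scrape takes the first match for each field, while a list scrape finds every repeating item and reads the same fields inside each one.

```python
import xml.etree.ElementTree as ET

# A made-up, well-formed search results snippet.
PAGE = """
<ul>
  <li class="result">
    <span class="name">Ada Lovelace</span><span class="role">Engineer</span>
  </li>
  <li class="result">
    <span class="name">Alan Turing</span><span class="role">Researcher</span>
  </li>
</ul>
"""

# Field name -> where to find it (limited XPath supported by ElementTree).
FIELDS = {
    "name": ".//span[@class='name']",
    "role": ".//span[@class='role']",
}


def scrape_single(root, fields):
    """Single Page style: one value per field (the first match)."""
    return {field: root.findtext(path) for field, path in fields.items()}


def scrape_list(root, item_path, fields):
    """List or Table style: one row per repeating item."""
    return [
        {field: item.findtext(path) for field, path in fields.items()}
        for item in root.findall(item_path)
    ]


root = ET.fromstring(PAGE.strip())
single = scrape_single(root, FIELDS)
rows = scrape_list(root, ".//li[@class='result']", FIELDS)
print(single)
print(rows)
```

On this snippet the single scrape returns only the first person, while the list scrape returns one row for each search result.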
Creating a list scraper - Premium action
When creating a list scraper template, there's an extra step: defining the list. You need to click on the same element in two different list items. This helps Bardeen know which list to scrape, since some pages have multiple lists.
Bardeen will draw a box around each list item so you can confirm it has picked up the right data. Click on an element inside any of the boxes to add it to your scraper template.
Loading more list items (pagination)
After you finish setting up your list scraper template, a new window will open. It will ask if you want to load more items (pagination). Most websites don’t load long lists all at once. Instead, they use infinite scroll or multiple pages.
You have two options for scraping long lists: infinite scroll and click pagination.
Websites like Facebook or Instagram load new items when you scroll to the bottom. For these, choose “infinite scroll.” Other websites like Google or LinkedIn require you to click a button to go to the next page.
For these, choose “click pagination” and select the button that takes you to the next page (usually it is the > icon).
What if a list has a million items but you only need a few hundred? You can set the maximum number of items or pages to scrape. If left blank, the scraper will try to get as many items as possible.
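The logic behind these limits can be sketched as a simple loop. The Python below is a hypothetical illustration, not Bardeen's implementation: `scrape_paginated` and `fake_site` are invented names, and `load_page` stands in for either scrolling down or clicking the next-page button. The loop keeps loading pages until there is nothing left, or until the maximum number of items or pages is reached.

```python
from typing import Callable, List, Optional


def scrape_paginated(
    load_page: Callable[[int], List[str]],
    max_items: Optional[int] = None,
    max_pages: Optional[int] = None,
) -> List[str]:
    """Collect items page by page, stopping at whichever limit is hit first."""
    items: List[str] = []
    page = 0
    while max_pages is None or page < max_pages:
        batch = load_page(page)
        if not batch:
            break  # No more items to load.
        items.extend(batch)
        page += 1
        if max_items is not None and len(items) >= max_items:
            return items[:max_items]
    return items


def fake_site(page: int) -> List[str]:
    """A made-up site with 5 pages of 10 items each."""
    if page >= 5:
        return []
    return [f"item-{page * 10 + i}" for i in range(10)]


print(len(scrape_paginated(fake_site, max_items=25)))
print(len(scrape_paginated(fake_site, max_pages=2)))
print(len(scrape_paginated(fake_site)))
```

With both limits left as `None`, the loop behaves like a blank setting: it continues until the site has no more items.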
If you want to stop a scraping job in progress, close the app window and click the "stop scraping" button at the bottom right corner of the screen. You can also do this from the Activities tab → Queue.
How to edit a scraper template
There are two common reasons to update your scraper template.
The first is when it isn't working properly and doesn’t extract information correctly. This usually happens when a website changes.
The second reason is if you want to extract more data fields. Editing a scraper template is as easy as creating one.
Click the scraper icon in the popup window and choose the template you want to edit. A new window will open with the original web page you used to create the template. From there, you can add new data fields or delete existing ones.
Building Playbooks with the scraper
Building automations with the scraper is similar to building any other Playbook. Go to the Builder, add an action, and choose the scraper template.
The scraper outputs data as a table. You can connect this data to other actions.
For example, to add LinkedIn profile data to your Notion database, click on a box next to a column name and map it to the related field from the scraper action.
When you use the list scraper, it will output a table with multiple rows. Bardeen will run every action once per row. In this example, Notion will create a new entry for each LinkedIn profile returned by the list scraper.
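The "once per row" behavior can be pictured with the short Python sketch below. Everything in it is hypothetical: `FIELD_MAP` plays the role of the column mapping you set up in the builder, and `create_entry` is a stand-in for the Notion action, not a real Notion or Bardeen API call.

```python
# Made-up output of a list scraper: one dictionary per row.
scraped_rows = [
    {"name": "Ada Lovelace", "headline": "Engineer", "url": "https://example.com/ada"},
    {"name": "Alan Turing", "headline": "Researcher", "url": "https://example.com/alan"},
]

# Database column -> scraper field, like the mapping done in the builder.
FIELD_MAP = {"Name": "name", "Title": "headline", "Profile": "url"}


def create_entry(database, entry):
    """Stand-in for a 'create database entry' action (hypothetical)."""
    database.append(entry)


def run_for_each_row(rows, field_map, database):
    """Run the action once per scraped row, mapping fields to columns."""
    for row in rows:
        entry = {column: row[field] for column, field in field_map.items()}
        create_entry(database, entry)


notion_database = []
run_for_each_row(scraped_rows, FIELD_MAP, notion_database)
print(notion_database)
```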
Using multiple scraper templates in one Playbook (deep scraper) - Premium action
You can use multiple scraper templates in one Playbook. This is often done to scrape search results and then visit each page to get more data. This combination is called a “deep scraper.”
To build this type of Playbook, set up the first scraper action as usual. Then, use the links from the first scraper as the input for your second scraper action.
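In code terms, a deep scraper is two scrapes chained together. The Python sketch below uses made-up data in place of real pages, and the function names `scrape_search_results`, `scrape_detail`, and `deep_scrape` are assumptions for illustration only. The first function returns a list with links, and each link is passed to the second function to fetch extra fields.

```python
# Made-up data standing in for a search results page and its detail pages.
SEARCH_RESULTS = [
    {"title": "Data Analyst", "link": "https://example.com/jobs/1"},
    {"title": "Product Designer", "link": "https://example.com/jobs/2"},
]
DETAIL_PAGES = {
    "https://example.com/jobs/1": {"company": "Acme", "location": "Remote"},
    "https://example.com/jobs/2": {"company": "Globex", "location": "Berlin"},
}


def scrape_search_results():
    """First scraper: a list template applied to the search results page."""
    return SEARCH_RESULTS


def scrape_detail(url):
    """Second scraper: a single-page template applied to one detail page."""
    return DETAIL_PAGES[url]


def deep_scrape():
    """Feed each link from the first scraper into the second one."""
    combined = []
    for result in scrape_search_results():
        detail = scrape_detail(result["link"])
        combined.append({**result, **detail})
    return combined


deep_rows = deep_scrape()
print(deep_rows)
```

Each output row combines the fields from the search result with the fields from its detail page.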
Known Limitations
While we strive to scrape as many websites as possible effectively, there are certain limitations and technical challenges that we may encounter. Despite our best efforts, some websites or elements remain difficult to scrape due to various constraints. Here are some known limitations:
- Iframes.
- Shadow DOM.
- Pages blocking users (CAPTCHAs and similar).
- Pages making scraping difficult (usually solvable with custom models).
- Airtable is often tricky to scrape.
- Inability to scrape a specific element from a webpage that does not have a selector.
- Inline JavaScript.
Explore scraper use cases
In the next tutorial, we will cover advanced scraper techniques, use cases, and troubleshooting common issues.
Have more questions? Check out our FAQs here.