What is Puppeteer? What is Puppeteer used for?

Shravani Radhakrishna
4 min read · May 20, 2021

Hi guys, this is my first post on Medium. Today I’m going to share what I know about Puppeteer: why we need it, how we can scrape data with it, and what other advantages this Node library offers.

What is Data? Why is data so important?

Data is a collection of facts (numbers, words, measurements, observations, etc.).

In general, data is simply another word for information. Data is important because so much of the modern world is built upon it, from business decisions to scientific research.

Whichever industry you work in, or whatever your interests, you will almost certainly have come across a story about how “data” is changing the face of our world. It might be part of a study helping to cure a disease, boost a company’s revenue, make a building more efficient or be responsible for those targeted ads you keep seeing.

What is Web Scraping?

Web scraping is the process of extracting any kind of information that you want from any website, no matter how large the data.

Web scraping is also known as web harvesting or web data extraction.

Web scraping automation tools are becoming “smarter” and more popular, so even people with no programming background can apply web scraping to aggregate all sorts of data and support their business and work with insights from Big Data.

Why do we need to do Web scraping?

Whether you have a new business or a growing one, web scraping can help you grow it by giving you access to web data.

As discussed earlier, data is vital, and that is especially true for e-commerce companies. With web scraping you can collect the data published on a competitor’s website. Above all, it makes data collection much faster by eliminating the manual data-gathering process.

You can easily collect a huge amount of data from the websites you are interested in.

There are many web scraping tools on the market, such as Scrapy, MechanicalSoup, Web-Harvest, PySpider, etc.

Let's go through an introduction to Puppeteer.

Puppeteer

Puppeteer is a Node library that we can use to control a headless Chrome instance. We are still using Chrome, but we drive it programmatically from JavaScript instead of by hand.

Puppeteer, as the name implies, allows you to manipulate the browser programmatically just like how a puppet would be manipulated by its puppeteer.
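
To make "manipulating the browser programmatically" a little more concrete, here is a minimal sketch that opens a page and reads its title. The helper name getPageTitle and the URL https://example.com are illustrative assumptions, not part of the original post. The Puppeteer module is passed in as a parameter so the helper can be tried out without a real browser, and it only runs for real once Puppeteer is installed (see the setup steps further down).

```javascript
// Hypothetical helper: opens a URL in headless Chrome and returns the page title.
// `puppeteer` is passed in by the caller (normally require('puppeteer')).
async function getPageTitle(puppeteer, url) {
  const browser = await puppeteer.launch();   // start a (headless) browser
  try {
    const page = await browser.newPage();     // open a new page (tab)
    await page.goto(url);                     // navigate to the URL
    return await page.title();                // read the document title
  } finally {
    await browser.close();                    // always shut the browser down
  }
}

// Usage (assumes Puppeteer has been installed with `npm i puppeteer`):
//   const puppeteer = require('puppeteer');
//   getPageTitle(puppeteer, 'https://example.com').then(console.log);
```

The try/finally is a design choice in this sketch: it makes sure the browser is closed even if navigation fails.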

Current version at the time of writing: 7.1.0

It is developed and maintained by the Chrome DevTools team at Google.

What is headless Chrome?

Headless Chrome is a way to run the Chrome browser in a headless environment, that is, without the full browser UI (no graphical user interface). It is mainly used for automated testing. In simple terms, the browser runs in the background and is not visible to the user.
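
As a rough sketch of the difference, Puppeteer's launch() accepts a headless option, and setting it to false opens a visible Chrome window, which can help while debugging. The helper name buildLaunchOptions and the 100 ms slowMo value are assumptions made for illustration only.

```javascript
// Hypothetical helper: builds the options object passed to puppeteer.launch().
// showBrowser = false -> headless (no window); true -> a visible Chrome window.
function buildLaunchOptions(showBrowser = false) {
  return {
    headless: !showBrowser,
    // slowMo delays each Puppeteer operation (in milliseconds) so that a
    // visible run is easier to follow; 100 is an arbitrary example value.
    slowMo: showBrowser ? 100 : 0,
  };
}

// Usage (assumes Puppeteer is installed):
//   const puppeteer = require('puppeteer');
//   const browser = await puppeteer.launch(buildLaunchOptions(true));
```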

Why Puppeteer?

These are a few things that we can achieve with Puppeteer. We will go through them one by one in detail.

Let's start. Before we write any code, we need to do some setup:

  1. Download the latest version of Node.js (https://nodejs.org/en/download/).
  2. Create a project folder and move into it: mkdir scraper && cd scraper
  3. Initialise the project directory: npm init
    This sets up your working directory as a Node project and presents a sequence
    of prompts. Just press Enter on every prompt, or use npm init -y to accept the
    default values. The result is saved in the package.json file in the current
    directory.
  4. Use npm to install Puppeteer: npm i puppeteer
    This also downloads a recent version of Chromium that works with Puppeteer.
  5. Create a file, for example app.js.
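
If you want to confirm that the setup worked before writing any scraping code, a small check like the one below can help. It is only a sketch: the function name checkSetup and the idea of saving it as check.js are assumptions, not part of the original steps.

```javascript
// Hypothetical sanity check for the setup steps: reports the Node.js version
// and whether the `puppeteer` package can be found from the current folder.
function checkSetup() {
  let puppeteerInstalled = true;
  try {
    require.resolve('puppeteer'); // throws if the package is not installed
  } catch (err) {
    puppeteerInstalled = false;
  }
  return { nodeVersion: process.version, puppeteerInstalled };
}

console.log(checkSetup());
```

Run it from the project folder with node check.js. If puppeteerInstalled is false, run npm i puppeteer again in the same folder.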

Let's start with how to take a screenshot of a web page.

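const puppeteer = require('puppeteer');

(async () => {
  // Launch a headless browser and open a new page (tab)
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Navigate to the site and save a screenshot as example.png
  await page.goto('https://google.com');
  await page.screenshot({ path: 'example.png' });
  // Close the browser so the Node process can exit
  await browser.close();
})();
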
  • Import the Puppeteer module.
  • Puppeteer's methods are asynchronous, so the code is wrapped in an async function and each call is awaited.
  • The launch() method launches the browser.
  • newPage() creates a new page in the browser.
  • goto() takes the URL to navigate to, for example https://google.com.
  • After navigating to the Google page, screenshot() takes a screenshot. Its path option specifies where to save the PNG and what to name the file.
  • Once the screenshot is taken, close the browser with close().
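
The steps in the list above can also be wrapped in a reusable function. The following is a sketch under a few assumptions: the function name screenshotPage, the 1280x800 viewport, and the choice to wait for the network to go quiet are illustrative and not part of the original example. The Puppeteer module is passed in as a parameter so the function can be tested without launching a browser.

```javascript
// Hypothetical helper: saves a screenshot of `url` to `path`.
// `puppeteer` is passed in by the caller (normally require('puppeteer')).
async function screenshotPage(puppeteer, url, path, fullPage = true) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Assumed viewport size; adjust to the layout you want to capture.
    await page.setViewport({ width: 1280, height: 800 });
    // Wait until the network is mostly idle so late-loading content is included.
    await page.goto(url, { waitUntil: 'networkidle2' });
    // fullPage: true captures the whole scrollable page, not just the viewport.
    await page.screenshot({ path, fullPage });
    return path;
  } finally {
    // Close the browser even if one of the steps above throws.
    await browser.close();
  }
}

// Usage (assumes Puppeteer is installed):
//   const puppeteer = require('puppeteer');
//   screenshotPage(puppeteer, 'https://google.com', 'example-full.png');
```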

That completes the Puppeteer screenshot example. The remaining examples will be covered in the next blog. Stay tuned…
