Deno Web Scraper
You might have built a web scraper with Node.js using request and cheerio, or maybe with Python using Beautiful Soup. This tutorial brings the same idea to the world of Deno.
In this example, we are scraping the list of books from http://books.toscrape.com/
Let's get started, without further ado.
Step 01: app.ts
To start, we will create an app.ts file and wrap the whole code in a try-catch block. Deno supports top-level await, so await can be used directly inside this block without an async wrapper function.
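A minimal sketch of what app.ts could look like at this stage is shown here. It only declares the target URL and logs it inside the try-catch block; the variable name url is an assumption based on the check described in this step.

```typescript
// app.ts (sketch): the starting skeleton
const url = "http://books.toscrape.com/";

try {
  // Nothing is fetched yet; this only confirms the script runs
  console.log(url);
} catch (error) {
  // Any error thrown inside the try block is logged here
  console.log(error);
}
```

The later steps add awaited calls inside the same try block.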
Check that the code logs the URL by running the following command in the terminal:
deno run app.ts
Step 02: Fetch url
Deno supports many native JavaScript web APIs. The Fetch API is one of them, which makes request handling easy and dependency-free. The response body from fetch is saved as text in a variable named html.
Deno is secure by default, which means that to let the script access the internet we need to run it with the --allow-net flag.
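The fetch step could be sketched roughly as follows. The helper name fetchHtml and the status check are assumptions added for illustration; only fetch, the html variable, and the try-catch wrapper come from the description above.

```typescript
// app.ts (sketch): fetch the page and log its HTML
const url = "http://books.toscrape.com/";

// fetchHtml is a hypothetical helper name, not part of Deno
async function fetchHtml(target: string): Promise<string> {
  // fetch() is built into Deno, so nothing needs to be imported
  const res = await fetch(target);
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}`);
  }
  // text() reads the response body as a string
  return await res.text();
}

try {
  const html = await fetchHtml(url);
  console.log(html);
} catch (error) {
  // Network failures and non-2xx responses end up here
  console.log(error);
}
```

Throwing on a non-2xx status is a design choice in this sketch: it keeps an error page from being treated as the HTML to scrape.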
Check that the code logs the HTML by running the following command in the terminal:
deno run --allow-net app.ts
Step 03: Deno Dom
Deno DOM makes it easy to traverse HTML using the familiar JavaScript DOM manipulation methods.
The HTML (in text format) that we get with fetch is parsed with DOMParser, and the resulting document is stored in the variable dom. The dom variable is then traversed to extract the page heading from the target site.
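One way this parsing step might look is sketched here. The unversioned deno_dom import URL, the helper name extractHeading, and the "header .h1" selector are assumptions; the selector in particular depends on the target page's markup and may need adjusting.

```typescript
// app.ts (sketch): parse the fetched HTML and log the page heading
import { DOMParser } from "https://deno.land/x/deno_dom/deno-dom-wasm.ts";

const url = "http://books.toscrape.com/";

// extractHeading is a hypothetical helper; the "header .h1" selector is an
// assumption about the target page's markup
function extractHeading(html: string): string {
  const dom = new DOMParser().parseFromString(html, "text/html");
  // Some deno_dom versions return null when parsing fails
  if (!dom) throw new Error("Could not parse HTML");
  const heading = dom.querySelector("header .h1");
  // Collapse runs of whitespace so the heading prints on one line
  return (heading?.textContent ?? "").replace(/\s+/g, " ").trim();
}

try {
  const res = await fetch(url);
  const html = await res.text();
  console.log(extractHeading(html));
} catch (error) {
  console.log(error);
}
```

Keeping the parsing in a function that takes a plain string makes it easy to try against a saved copy of the page without hitting the network.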
Check that the code logs “Books to Scrape We love being scraped!” by running the following command in the terminal:
deno run --allow-net app.ts
Bringing it all together
The script picks up the book info by looping over each .product_pod container on the first page and puts it in the books array.
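The loop described above can be sketched like this. The Book interface, the extractBooks helper, and the inner selectors ("h3 a", ".price_color", ".availability") are assumptions about the page's markup rather than the original script; only the .product_pod container and the books array come from the text.

```typescript
// app.ts (sketch): collect title, price and availability for each book
import { DOMParser, Element } from "https://deno.land/x/deno_dom/deno-dom-wasm.ts";

const url = "http://books.toscrape.com/";

// Hypothetical shape for one scraped book
interface Book {
  title: string;
  price: string;
  availability: string;
}

// Collapse whitespace so multi-line text nodes become a single line
const clean = (text: string | null | undefined): string =>
  (text ?? "").replace(/\s+/g, " ").trim();

// extractBooks is a hypothetical helper; the inner selectors are assumptions
function extractBooks(html: string): Book[] {
  const dom = new DOMParser().parseFromString(html, "text/html");
  if (!dom) throw new Error("Could not parse HTML");
  const books: Book[] = [];
  for (const node of Array.from(dom.querySelectorAll(".product_pod"))) {
    const pod = node as Element;
    books.push({
      // The link text may be truncated, so the title attribute is read instead
      title: clean(pod.querySelector("h3 a")?.getAttribute("title")),
      price: clean(pod.querySelector(".price_color")?.textContent),
      availability: clean(pod.querySelector(".availability")?.textContent),
    });
  }
  return books;
}

try {
  const res = await fetch(url);
  const books = extractBooks(await res.text());
  console.log(books);
} catch (error) {
  console.log(error);
}
```

If a selector does not match, the corresponding field falls back to an empty string instead of throwing, which makes markup changes on the site easier to spot in the output.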
Running deno run --allow-net app.ts will output an array of books with title, price, and availability.