Node.js scraping utils | 2020 | NewbyCoder.com

Top alternative scraping utilities for Node.js

cheerio

Github stargazers

28614

Github forks

Commits

2905

Code contributors

139

The fast, flexible, and elegant library for parsing and manipulating HTML and XML.

Created

Oct. 9, 2011

Updated

Sept. 27, 2024

License

MIT

Github repo

Type

Module/library

Primary language (based on GitHub data)

TypeScript

Issues

Homepage

cheerio.js.org

jsdom

Github stargazers

20515

Github forks

1701

Commits

3639

Code contributors

296

A JavaScript implementation of various web standards, for use with Node.js

Created

Jan. 19, 2010

Updated

Sept. 22, 2024

License

MIT

Github repo

Type

Module/library

Platform

Node.js, Browser

Primary language (based on GitHub data)

JavaScript

Issues

525

apify-js

Github stargazers

15397

Github forks

661

Commits

4056

Code contributors

Crawlee: a web scraping and browser automation library for Node.js to build reliable crawlers in JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Supports both headful and headless modes, with proxy rotation.

Created

Aug. 26, 2016

Updated

Sept. 27, 2024

License

Apache-2.0

Github repo

Type

CLI

Platform

Node.js, Browser

Primary language (based on GitHub data)

TypeScript

Issues

131

Homepage

crawlee.dev

scrape-it

Github stargazers

4009

Github forks

220

Commits

202

Code contributors

🔮 A Node.js scraper for humans.

Created

April 28, 2016

Updated

April 23, 2024

License

MIT

Github repo

Type

Module/library

Primary language (based on GitHub data)

JavaScript

Issues

Homepage

ionicabizau.net

google-play-scraper

Github stargazers

2335

Github forks

631

Commits

469

Code contributors

Node.js scraper to get data from Google Play

Created

April 7, 2015

Updated

Aug. 8, 2024

License

MIT

Github repo

Type

Module/library

Primary language (based on GitHub data)

JavaScript

Issues

103

node-website-scraper

Github stargazers

1559

Github forks

276

Commits

493

Code contributors

Download website to local directory (including all css, images, js, etc.)

Created

Sept. 4, 2014

Updated

Sept. 16, 2024

License

MIT

Github repo

Type

Module/library

Platform

Node.js

Primary language (based on GitHub data)

JavaScript

Issues

Homepage

npmjs.org

noodle

Github stargazers

747

Github forks

Commits

531

Code contributors

A Node.js server and module that allows cross-domain page scraping of web documents with JSONP or POST.

Created

June 29, 2012

Updated

May 14, 2024

Github repo

Type

Module/library

Platform

Node.js

Primary language (based on GitHub data)

JavaScript

Issues

Homepage

noodle.dharmafly.com

openGraphScraper

Github stargazers

675

Github forks

107

Commits

1001

Code contributors

Node.js scraper service for Open Graph Info and More!

Created

Sept. 1, 2013

Updated

Sept. 18, 2024

License

MIT

Github repo

Type

Module/library

Platform

Browser

Primary language (based on GitHub data)

TypeScript

node-scraper

Github stargazers

519

Github forks

Commits

Code contributors

Easier web scraping using Node.js and jQuery

Created

Dec. 5, 2010

Updated

May 31, 2011

License

MIT

Github repo

Type

Module/library

Platform

Node.js

Primary language (based on GitHub data)

JavaScript

Issues

webster

Github stargazers

515

Github forks

Commits

119

Code contributors

A reliable high-level web crawling & scraping framework for Node.js.

Created

Nov. 4, 2017

Updated

Aug. 27, 2024

License

GPL-3.0

Github repo

Type

App

Primary language (based on GitHub data)

JavaScript

Issues

node-google

Github stargazers

454

Github forks

115

Commits

Code contributors

A Node.js module to search and scrape Google.

Created

July 10, 2012

Updated

Sept. 20, 2016

License

MIT

Github repo

Type

Module/library

Platform

Node.js

Primary language (based on GitHub data)

JavaScript

Issues

micro-open-graph

Archived (read-only repository, archived by owner)

Github stargazers

381

Github forks

Commits

Code contributors

A tiny Node.js microservice to scrape open graph data with joy.

Created

Feb. 21, 2017

Updated

Feb. 11, 2019

License

MIT

Github repo

Primary language (based on GitHub data)

JavaScript

Issues

node-readability

Github stargazers

343

Github forks

Commits

261

Code contributors

Scrape or crawl articles from any site automatically. Make any web page readable, whether it is in Chinese or English.

Created

May 10, 2014

Updated

Aug. 1, 2018

Github repo

Platform

Node.js

Primary language (based on GitHub data)

JavaScript

Issues

yakuza

Github stargazers

298

Github forks

Commits

383

Code contributors

Highly scalable Node.js scraping framework for mobsters

Created

Sept. 30, 2014

Updated

Sept. 30, 2015

Github repo

Type

Tool/utility

Platform

Node.js

Primary language (based on GitHub data)

JavaScript

Issues

synth-secrets

Github stargazers

290

Github forks

Commits

Code contributors

Screen-scraped articles on subtractive synthesis (using Node.js)

Created

Dec. 10, 2014

Updated

Dec. 10, 2014

Github repo

Platform

Node.js

Primary language (based on GitHub data)

JavaScript

Issues

Homepage

soundonsound.com

node-web-scraper

Github stargazers

272

Github forks

197

Commits

Code contributors

Code for the tutorial: Scraping the Web With Node.js by @kukicado

Created

March 13, 2014

Updated

Jan. 10, 2018

Github repo

Platform

Node.js

Primary language (based on GitHub data)

JavaScript

Issues

Homepage

scotch.io

link-preview-generator

Github stargazers

262

Github forks

Commits

Code contributors

Get preview data (title, description, image, domain name) from a URL. The library uses a Puppeteer-driven headless browser to scrape the website.

Created

Nov. 12, 2019

Updated

March 3, 2022

License

MIT

Github repo

Primary language (based on GitHub data)

JavaScript

Issues

Humanoid

Github stargazers

210

Github forks

Commits

Code contributors

Node.js package to bypass CloudFlare's anti-bot JavaScript challenges

Created

Oct. 20, 2018

Updated

May 21, 2023

License

MIT

Github repo

Type

Module/library

Platform

Node.js, Browser

Primary language (based on GitHub data)

JavaScript

Issues

nutella-scrape

Github stargazers

209

Github forks

Commits

Code contributors

🍫 Learn to scrape the web with Node.js -- it tastes like chocolate

Created

Aug. 14, 2015

Updated

Sept. 11, 2015

Github repo

Platform

Node.js, Browser

Primary language (based on GitHub data)

JavaScript

Issues

node-scrapy

Github stargazers

154

Github forks

Commits

185

Code contributors

Simple, lightweight and expressive web scraping with Node.js

Created

June 4, 2014

Updated

Aug. 24, 2020

License

MIT

Github repo

Platform

Node.js

Primary language (based on GitHub data)

JavaScript

Issues