Alternative scraping tools and utilities for Python
Updated: April 27, 2024
scrapy
GitHub stargazers: 50907
GitHub forks: 10334
Commits: 10209
Code contributors: 555
Scrapy, a fast high-level web crawling & scraping framework for Python.
Created: Feb. 22, 2010
Updated: April 19, 2024
License: BSD-3-Clause
Primary language (per GitHub data): Python
Issues: 648
twint
Status: Archived (read-only repository, archived by owner)
GitHub stargazers: 15549
GitHub forks: 2719
Commits: 845
Code contributors: 60
An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.
Created: June 10, 2017
Updated: March 2, 2021
License: MIT
Type: Module/library
Primary language (per GitHub data): Python
Issues: 589
pattern
GitHub stargazers: 8666
GitHub forks: 1574
Commits: 1434
Code contributors: 19
Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.
Created: May 3, 2011
Updated: April 25, 2020
License: BSD-3-Clause
Primary language (per GitHub data): Python
Issues: 174
autoscraper
GitHub stargazers: 5942
GitHub forks: 628
Commits: 137
Code contributors: 7
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
Created: Aug. 31, 2020
Updated: July 17, 2022
License: MIT
Primary language (per GitHub data): Python
Issues: 11
twitter-scraper
GitHub stargazers: 3822
GitHub forks: 597
Commits: 208
Code contributors: 28
Scrape the Twitter Frontend API without authentication.
Created: Feb. 22, 2018
Updated: Dec. 17, 2021
License: MIT
Type: Module/library
Primary language (per GitHub data): Python
Issues: 57
cloudflare-scrape
GitHub stargazers: 3293
GitHub forks: 450
Commits: 156
Code contributors: 23
A Python module to bypass Cloudflare's anti-bot page.
Created: Feb. 28, 2013
Updated: March 23, 2020
License: MIT
Type: Module/library
Primary language (per GitHub data): Python
Issues: 132
Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE
GitHub stargazers: 3055
GitHub forks: 535
Commits: 615
Code contributors: 24
Do you want to LEARN NEW STUFF for FREE? Don't worry, with the power of web-scraping and automation, this script will find the necessary Udemy coupons & enroll you for PAID UDEMY COURSES, ABSOLUTELY FREE!
Created: Sept. 12, 2020
Updated: Feb. 21, 2024
License: GPL-3.0
Primary language (per GitHub data): Python
Issues: 25
pagodo
GitHub stargazers: 2576
GitHub forks: 473
Commits: 136
Code contributors: 7
pagodo (Passive Google Dork) - Automate Google Hacking Database scraping and searching
Created: Aug. 19, 2016
Updated: April 11, 2024
License: GPL-3.0
Type: Module/library
Primary language (per GitHub data): Python
Issues: 2
GitDorker
GitHub stargazers: 2179
GitHub forks: 408
Commits: 113
Code contributors: 4
A Python program to scrape secrets from GitHub through usage of a large repository of dorks.
Created: July 13, 2020
Updated: May 7, 2021
Type: Tool/utility
Primary language (per GitHub data): Python
Issues: 19
lazynlp
GitHub stargazers: 2150
GitHub forks: 311
Commits: 14
Code contributors: 4
Library to scrape and clean web pages to create massive datasets.
Created: Feb. 27, 2019
Updated: Oct. 7, 2019
Type: Module/library
Primary language (per GitHub data): Python
Issues: 10
JobFunnel
GitHub stargazers: 1749
GitHub forks: 203
Commits: 413
Code contributors: 12
Scrape job websites into a single spreadsheet with no duplicates.
Created: Aug. 25, 2017
Updated: Nov. 25, 2021
License: MIT
Type: CLI
Primary language (per GitHub data): Python
Issues: 8
ruia
GitHub stargazers: 1732
GitHub forks: 183
Commits: 440
Code contributors: 14
Async Python 3.6+ web scraping micro-framework based on asyncio
Created: July 10, 2018
Updated: Aug. 21, 2022
License: Apache-2.0
Primary language (per GitHub data): Python
Issues: 8
recipe-scrapers
GitHub stargazers: 1509
GitHub forks: 487
Commits: 1257
Code contributors: 172
Python package for scraping recipe data
Created: Sept. 14, 2015
Updated: April 25, 2024
License: MIT
Type: Module/library
Primary language (per GitHub data): Python
Issues: 102
SoundScrape
GitHub stargazers: 1409
GitHub forks: 145
Commits: 250
Code contributors: 13
SoundCloud (and Bandcamp and Mixcloud) downloader in Python.
Created: Dec. 29, 2013
Updated: Nov. 22, 2020
License: MIT
Primary language (per GitHub data): Python
Issues: 68
search-script-scrape
GitHub stargazers: 1232
GitHub forks: 227
Commits: 56
Code contributors: 1
101 real world web scraping exercises in Python 3 for data journalists
Created: June 7, 2015
Updated: Oct. 5, 2015
Type: Script
Primary language (per GitHub data): Python
Issues: 2
requests-ip-rotator
GitHub stargazers: 1229
GitHub forks: 128
Commits: 44
Code contributors: 6
A Python library to utilize AWS API Gateway's large IP pool as a proxy to generate pseudo-infinite IPs for web scraping and brute forcing.
Created: July 17, 2021
Updated: Nov. 13, 2023
License: GPL-3.0
Primary language (per GitHub data): Python
Issues: 3
fansly
Status: Archived (read-only repository, archived by owner)
GitHub stargazers: 1204
GitHub forks: 62
Commits: 187
Code contributors: 3
Easy-to-use fansly.com content downloading tool. Written in Python, but also ships as a standalone executable app for Windows. Enjoy your Fansly content offline anytime, anywhere in the highest possible content resolution! Fully customizable to download in bulk or single: photos, videos & audio from timeline, messages, collection & specific posts 👍
Created: Oct. 8, 2021
Updated: Sept. 2, 2023
License: GPL-3.0
Primary language (per GitHub data): Python
Issues: 26
scrapy-cluster
GitHub stargazers: 1159
GitHub forks: 324
Commits: 747
Code contributors: 25
This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster.
Created: April 14, 2015
Updated: April 7, 2021
License: MIT
Primary language (per GitHub data): Python
Issues: 17
django-dynamic-scraper
GitHub stargazers: 1138
GitHub forks: 308
Commits: 552
Code contributors: 9
Creating Scrapy scrapers via the Django admin interface
Created: Dec. 16, 2011
Updated: June 25, 2021
License: BSD-3-Clause
Primary language (per GitHub data): Python
Issues: 40
OnionSearch
GitHub stargazers: 1110
GitHub forks: 149
Commits: 44
Code contributors: 4
OnionSearch is a script that scrapes URLs from different .onion search engines.
Created: March 18, 2020
Updated: Dec. 29, 2023
License: GPL-3.0
Type: Script
Primary language (per GitHub data): Python
Issues: 12