Top alternative scraping tools and utilities for Java
Updated: April 19, 2024
jsoup
Github stargazers
10614
Github forks
2139
Commits
1977
Code contributors
107
jsoup: the Java HTML parser, built for HTML editing, cleaning, scraping, and XSS safety.
Created
Dec. 19, 2009
Updated
Jan. 10, 2024
License
mit
Github repo
Type
Module/library
Primary Language, based on Github Data
Java
Issues
114
Homepage
newcrawler
Github stargazers
589
Github forks
115
Commits
577
Code contributors
1
Free Web Scraping Tool with Java
Created
Dec. 3, 2015
Updated
Nov. 25, 2023
Github repo
Type
Tool/utility
Primary Language, based on Github Data
JavaScript
Issues
25
instagram-java-scraper
Github stargazers
443
Github forks
189
Commits
213
Code contributors
18
Instagram Java Scraper. Get account information, photos, videos and comments.
Created
May 13, 2016
Updated
Oct. 31, 2022
Github repo
Type
Module/library
Primary Language, based on Github Data
Java
Issues
45
ScreenSlicer
Github stargazers
158
Github forks
13
Commits
284
Code contributors
1
Automatic, zero-config web scraping, written in Java. It has no dependency on Java EE or app servers, and the web scraper has a RESTful/JSON API. Currently unmaintained.
Created
Sept. 30, 2014
Updated
June 25, 2017
License
other
Github repo
Type
App
Primary Language, based on Github Data
Java
Issues
1
JLyrics
Github stargazers
60
Github forks
30
Commits
29
Code contributors
3
🎼 Expandable lyrics-scraping API for Java
Created
Dec. 5, 2018
Updated
Sept. 22, 2022
License
apache-2.0
Github repo
Primary Language, based on Github Data
Java
Issues
10
scraping-microservice-java-python-rabbitmq
Github stargazers
54
Github forks
31
Commits
10
Code contributors
1
A sample web scraping service demonstrating how to build a message-driven application using RabbitMQ
Created
Aug. 12, 2015
Updated
Feb. 7, 2017
License
apache-2.0
Github repo
Type
App
Primary Language, based on Github Data
Java
Issues
1
zeekEye
Github stargazers
37
Github forks
17
Commits
182
Code contributors
1
A Fast and Powerful Scraping and Web Crawling Framework.
Created
May 5, 2015
Updated
Feb. 25, 2019
Github repo
Primary Language, based on Github Data
Java
Homepage
spring-boot-web-scraper
Github stargazers
34
Github forks
23
Commits
18
Code contributors
1
Simple web scraping app made using Spring Boot + Thymeleaf + Jsoup + Java 8 Lambdas & Streams
Created
Oct. 29, 2017
Updated
March 26, 2018
License
apache-2.0
Github repo
Type
App
Primary Language, based on Github Data
Java
prometheus-scraper
Github stargazers
31
Github forks
19
Commits
4
Code contributors
1
A Java API that can be used to scrape Prometheus endpoints.
Created
Jan. 12, 2018
Updated
Nov. 10, 2022
License
mit
Github repo
Primary Language, based on Github Data
Java
Issues
4
webGrude
Github stargazers
28
Github forks
3
Commits
142
Code contributors
5
A Java annotation library for web scraping.
Created
Dec. 30, 2013
Updated
Jan. 31, 2018
License
apache-2.0
Github repo
Type
Module/library
Primary Language, based on Github Data
Java
Issues
6
TwitterScraper4J
Read-only repository, archived by owner
Github stargazers
22
Github forks
1
Commits
38
Code contributors
1
A Java library that scrapes Twitter to fetch publicly available info
Created
May 31, 2019
Updated
March 18, 2021
License
mit
Github repo
Type
Module/library
Primary Language, based on Github Data
Java
scoopi-scraper
Github stargazers
21
Github forks
8
Commits
153
Code contributors
2
Scoopi Web Scraper is a heavy-duty tool to extract data from HTML pages.
Created
July 25, 2018
Updated
Sept. 27, 2022
License
gpl-3.0
Github repo
Primary Language, based on Github Data
Java
Homepage
jarvest
Github stargazers
14
Github forks
4
Commits
98
Code contributors
4
Java web harvesting (scraping) library
Created
July 3, 2012
Updated
Dec. 16, 2015
License
lgpl-3.0
Github repo
Primary Language, based on Github Data
Java
web-scraping
Github stargazers
14
Github forks
7
Commits
11
Code contributors
2
Code samples of web scraping using Java.
Created
April 28, 2020
Updated
Sept. 29, 2022
License
mit
Github repo
Primary Language, based on Github Data
Java
WebScraping
Github stargazers
14
Github forks
7
Commits
11
Code contributors
2
Code samples of web scraping using Java.
Created
April 28, 2020
Updated
Sept. 29, 2022
License
mit
Github repo
Type
App
Primary Language, based on Github Data
Java
github-dependents-scraper
Github stargazers
13
Github forks
2
Commits
9
Code contributors
1
GitHub dependents web scraper using Picocli and Quarkus
Created
July 18, 2020
Updated
July 31, 2020
License
apache-2.0
Github repo
Primary Language, based on Github Data
Java
bobik_java_sdk
Github stargazers
12
Github forks
3
Commits
18
Code contributors
1
Web scraping in Java using remote bots
Created
June 18, 2012
Updated
July 5, 2012
Github repo
Type
App
Primary Language, based on Github Data
Java
Issues
1
imdb-scraper
Read-only repository, archived by owner
Github stargazers
11
Github forks
1
Commits
29
Code contributors
1
A Java class to get IMDb information about movies, series, etc.
Created
Feb. 15, 2014
Updated
March 22, 2017
License
apache-2.0
Github repo
Type
Module/library
Primary Language, based on Github Data
Java
Stock_Data_Scraper
Github stargazers
10
Github forks
1
Commits
25
Code contributors
2
Fast, multithreaded stock data scraper written in Java using HtmlUnit and minimal-json. Scrapes Finviz and Stocktwits for data and stores the information in a CSV file.
Created
Feb. 13, 2021
Updated
Aug. 3, 2021
Github repo
Primary Language, based on Github Data
Java
GathererScraper
Github stargazers
10
Github forks
4
Commits
414
Code contributors
2
A Java GUI application to scrape Magic cards from gatherer.wizards.com
Created
Oct. 13, 2014
Updated
March 17, 2021
Github repo
Type
App
Primary Language, based on Github Data
HTML