Alternative Big Data libraries for Python
Updated
April 23, 2024
SparkCourse
Github stargazers
19
Github forks
12
Commits
12
Code contributors Contributors
1
Taming Big Data with Apache Spark and Python - Hands On - Udemy
Created
June 17, 2016
Updated
July 11, 2016
Github repo
Primary Language, based on Github DataLanguage
Python
QuakeLabeler
Github stargazers
19
Github forks
4
Commits
73
Code contributors Contributors
2
QuakeLabeler is a Python package to create and manage your seismic training data, processes, and visualization in a single place — so you can focus on building the next big thing.
Created
May 28, 2021
Updated
Oct. 5, 2022
License
mit
Github repo
Primary Language, based on Github DataLanguage
Python
Issues
2
pdf2dataset
Github stargazers
17
Github forks
3
Commits
90
Code contributors Contributors
2
Converts a whole subdirectory with a big (or small) volume of PDF documents to a dataset (pandas DataFrame) with error tracking and choice of features
Created
June 30, 2020
Updated
Sept. 13, 2020
License
apache-2.0
Github repo
Primary Language, based on Github DataLanguage
Python
Issues
12
big-data-tools
Github stargazers
17
Github forks
3
Commits
9
Code contributors Contributors
1
Miscellaneous tools in bash, Python, and Perl for munging Big Data.
Created
Nov. 20, 2009
Updated
June 10, 2010
Github repo
Primary Language, based on Github DataLanguage
Shell
PythonDataScienceBootcampNYC
Github stargazers
17
Github forks
11
Commits
12
Code contributors Contributors
1
Python Data Science Bootcamp NYC: affordable, cost-effective weekend classes (Python, SQL 101, Big Data Science) with a tutor in NYC, New York. Developed various passive courses for bootcamps in Data Analytics, took classes in the USA (New York) and India. Training theme centered around projects, for example, your portfolio or even themes you are doing at wo
Created
July 26, 2018
Updated
Jan. 19, 2020
Github repo
Primary Language, based on Github DataLanguage
Jupyter
Issues
1
BigETL
Github stargazers
16
Github forks
7
Commits
2
Code contributors Contributors
1
This project is a unified ETL platform that supports various data processing technologies, including Spark, Hive, Hadoop, Python, and Linux shell scripts.
Created
Sept. 26, 2015
Updated
Oct. 16, 2015
Github repo
Primary Language, based on Github DataLanguage
CSS
Python-for-Finance-O-Reilly-
Github stargazers
16
Github forks
9
Commits
2
Code contributors Contributors
1
This repository provides all Python codes and Jupyter Notebooks of the book Python for Finance -- Analyze Big Financial Data by Yves Hilpisch.
Created
Nov. 16, 2017
Updated
Nov. 16, 2017
Github repo
Type
Resource
Primary Language, based on Github DataLanguage
HTML
Aspect-Based-Sentiment-Analysis-on-Yelp-Reviews
Github stargazers
16
Github forks
2
Commits
3
Code contributors Contributors
1
Performed aspect-based sentiment analysis using topic modeling (LDA), sentiment analysis, and regression analysis with Python and Spark on Yelp restaurant reviews. The objective of the project was to understand how to extract quantifiable information from reviews to understand the impact of the important aspects for different cuisines and their impact on o
Created
Feb. 6, 2018
Updated
Feb. 6, 2018
Github repo
Primary Language, based on Github DataLanguage
Jupyter
Big-Data-Engineering
Github stargazers
16
Github forks
7
Commits
66
Code contributors Contributors
2
--
Created
Oct. 30, 2020
Updated
April 9, 2023
Github repo
Primary Language, based on Github DataLanguage
PLpgSQL
Google_PubSub_BigQuery
Github stargazers
15
Github forks
4
Commits
7
Code contributors Contributors
1
Demonstrates Google PubSub, Google's message queue service, by streaming fake financial data through PubSub and querying it in BigQuery.
Created
Aug. 29, 2017
Updated
Aug. 29, 2017
License
apache-2.0
Github repo
Primary Language, based on Github DataLanguage
Python
panel-vegafusion
Github stargazers
15
Github forks
0
Commits
42
Code contributors Contributors
1
Build interactive big data apps with Altair and Vega easily using Panel + VegaFusion.
Created
Jan. 28, 2022
Updated
Jan. 31, 2022
License
agpl-3.0
Github repo
Primary Language, based on Github DataLanguage
Python
Issues
1
olssen
Github stargazers
14
Github forks
8
Commits
74
Code contributors Contributors
1
OnLine Spectral Search ENgine for Proteomics big data using Apache Spark, Python/Flask, and AngularJS
Created
Sept. 14, 2015
Updated
Sept. 14, 2015
Github repo
Type
Module/library
Primary Language, based on Github DataLanguage
ApacheConf
Issues
3
kdd-cup-2019
Github stargazers
14
Github forks
6
Commits
225
Code contributors Contributors
4
Repository for KDD-Cup 2019 with Baidu. Big Data Science practical course @ LMU
Created
July 21, 2019
Updated
Aug. 12, 2019
License
mit
Github repo
Primary Language, based on Github DataLanguage
Jupyter
Issues
2
Spark_ver_BigDataHW_Jiuzhang
Github stargazers
13
Github forks
5
Commits
1
Code contributors Contributors
1
Homework for the Big Data course at Jiuzhang, re-written in Python and Spark!
Created
April 24, 2017
Updated
April 24, 2017
Github repo
Primary Language, based on Github DataLanguage
Jupyter
pyhacks
Github stargazers
13
Github forks
4
Commits
66
Code contributors Contributors
2
Python module that eases writing scripts that go over large amounts of data to perform the same actions. Provides simple preconfigured thread and queue management and more hacking utilities.
Created
Oct. 23, 2019
Updated
May 13, 2020
License
mit
Github repo
Primary Language, based on Github DataLanguage
Python
Issues
1
big-data-solutions
Github stargazers
13
Github forks
1
Commits
5
Code contributors Contributors
1
This repository provides code examples written in Python and Spark-Scala, primarily using boto3 SDK API methods, along with AWS CLI examples, for the majority of the AWS Big Data services. There are also nicely written wiki articles on most of the common issues and challenges faced in the Big Data world.
Created
March 6, 2022
Updated
March 6, 2022
Github repo
Primary Language, based on Github DataLanguage
Python
BigData-Hands-on
Github stargazers
12
Github forks
8
Commits
196
Code contributors Contributors
1
Apache Spark programming exercises with Python
Created
June 6, 2016
Updated
April 18, 2021
Github repo
Primary Language, based on Github DataLanguage
Jupyter
Data.Intro
Github stargazers
12
Github forks
0
Commits
45
Code contributors Contributors
1
Introductory data science course of the Cyber Education Center at Campus IL, covering both the theoretical and the practical aspects of big data analysis in Python.
Created
Jan. 9, 2023
Updated
Jan. 18, 2023
Github repo
Primary Language, based on Github DataLanguage
Jupyter
weeBabyBigData
Github stargazers
12
Github forks
0
Commits
91
Code contributors Contributors
1
Python and R scripts for visualising and analysing baby sleep patterns.
Created
March 20, 2017
Updated
May 17, 2017
License
mit
Github repo
Primary Language, based on Github DataLanguage
Python
Issues
2
Douyu-danmu-spark
Github stargazers
12
Github forks
4
Commits
11
Code contributors Contributors
1
Scrapes hosts' danmu information from Douyu TV live shows and performs the corresponding statistical analysis with Spark and other Big Data technologies.
Created
Jan. 13, 2018
Updated
May 14, 2018
Github repo
Type
App
Primary Language, based on Github DataLanguage
Python
Issues
1