Alternative Big Data libraries for Python
Updated
Dec. 9, 2022
BigDL
Github stargazers
4111
Github forks
1
Commits
18828
Code contributors
170
Fast, distributed, secure AI for Big Data
Created
Aug. 29, 2016
Updated
Dec. 9, 2022
License
apache-2.0
Primary Language, based on Github DataLanguage
Jupyter
Issues
672
root
Github stargazers
1990
Github forks
1
Commits
76202
Code contributors
337
The official repository for ROOT: analyzing, storing and visualizing big data, scientifically
Created
June 27, 2013
Updated
Dec. 8, 2022
License
other
Primary Language, based on Github DataLanguage
C++
Issues
710
tuplex
Github stargazers
792
Github forks
43
Commits
63
Code contributors
8
Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code. Tuplex has similar Python APIs to Apache Spark or Dask, but rather than invoking the Python interpreter, Tuplex generates optimized LLVM bytecode for the given pipeline and input data set.
Created
June 30, 2021
Updated
Oct. 6, 2022
License
apache-2.0
Primary Language, based on Github DataLanguage
C++
Issues
31
data-structures-algorithms-python
Github stargazers
791
Github forks
1181
Commits
86
Code contributors
6
This tutorial playlist covers data structures and algorithms in Python. Every tutorial covers the theory behind a data structure or algorithm, Big O complexity analysis, and exercises that you can practice on.
Created
Sept. 29, 2020
Updated
Nov. 14, 2022
Primary Language, based on Github DataLanguage
Jupyter
Issues
46
oio-sds
Github stargazers
572
Github forks
92
Commits
6855
Code contributors
96
High-performance software-defined object storage for Big Data and AI that supports Amazon S3 and OpenStack Swift
Created
March 13, 2015
Updated
Oct. 17, 2022
License
other
Primary Language, based on Github DataLanguage
Python
Issues
29
eland
Github stargazers
458
Github forks
62
Commits
427
Code contributors
20
Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
Created
June 11, 2019
Updated
Nov. 2, 2022
License
apache-2.0
Type
Module/library
Primary Language, based on Github DataLanguage
Python
Issues
79
python-bigquery-pandas
Github stargazers
342
Github forks
107
Commits
333
Code contributors
41
Google BigQuery connector for pandas
Created
Feb. 8, 2017
Updated
Dec. 8, 2022
License
bsd-3-clause
Primary Language, based on Github DataLanguage
Python
Issues
50
hivemq-mqtt-tensorflow-kafka-realtime-iot-machine-learning-training-inference
Github stargazers
338
Github forks
126
Commits
236
Code contributors
5
Real Time Big Data / IoT Machine Learning (Model Training and Inference) with HiveMQ (MQTT), TensorFlow IO and Apache Kafka - no additional data store like S3, HDFS or Spark required
Created
May 8, 2019
Updated
Nov. 5, 2020
License
apache-2.0
Type
Module/library
Primary Language, based on Github DataLanguage
Jupyter
Issues
8
arvados
Github stargazers
332
Github forks
105
Commits
23647
Code contributors
39
An open source platform for managing and analyzing biomedical big data
Created
April 11, 2013
Updated
Dec. 8, 2022
License
other
Primary Language, based on Github DataLanguage
Go
Issues
10
lithops
Github stargazers
242
Github forks
76
Commits
3462
Code contributors
34
A multi-cloud framework for big data analytics and embarrassingly parallel jobs that provides a universal API for building parallel applications in the cloud ☁️🚀
Created
April 23, 2018
Updated
Dec. 5, 2022
License
apache-2.0
Primary Language, based on Github DataLanguage
Python
Issues
3
transbigdata
Github stargazers
239
Github forks
80
Commits
624
Code contributors
8
A Python package developed for transportation spatio-temporal big data processing, analysis and visualization.
Created
Oct. 21, 2021
Updated
Dec. 2, 2022
License
bsd-3-clause
Primary Language, based on Github DataLanguage
Python
gimel
Github stargazers
231
Github forks
92
Commits
167
Code contributors
23
Big Data Processing Framework - Unified Data API or SQL on Any Storage
Created
April 4, 2018
Updated
Nov. 8, 2022
License
apache-2.0
Primary Language, based on Github DataLanguage
Scala
Issues
9
PythonDataScienceFullThrottle
Github stargazers
209
Github forks
182
Commits
158
Code contributors
2
Downloads for my Safari Online Learning live training course Python Data Science Full Throttle: Introductory Artificial Intelligence (AI), Big Data and Cloud Case Studies
Created
July 18, 2019
Updated
Dec. 5, 2022
Type
Resource
Primary Language, based on Github DataLanguage
Jupyter
Issues
1
bigdata-playground
Github stargazers
200
Github forks
71
Commits
465
Code contributors
4
A complete example of a big data application using: Kubernetes (kops/AWS), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, Node.js, Angular, GraphQL
Created
Dec. 12, 2017
Updated
Feb. 1, 2019
License
apache-2.0
Type
App
Primary Language, based on Github DataLanguage
TypeScript
Issues
7
kubernetes-bigquery-python
Github stargazers
126
Github forks
88
Commits
41
Code contributors
7
Example Kubernetes app that shows how to build a 'pipeline' to stream data into BigQuery. Uses Redis or Google Cloud Pub/Sub.
Created
Dec. 17, 2014
Updated
Oct. 20, 2020
License
apache-2.0
Type
App
Primary Language, based on Github DataLanguage
Python
Issues
5
python-bigdata
Github stargazers
124
Github forks
166
Commits
30
Code contributors
2
Data science and Big Data with Python
Created
July 14, 2016
Updated
Oct. 6, 2020
Primary Language, based on Github DataLanguage
Jupyter
Issues
1
Frank-Kanes-Taming-Big-Data-with-Apache-Spark-and-Python
Github stargazers
105
Github forks
181
Commits
11
Code contributors
5
Frank Kane's Taming Big Data with Apache Spark and Python, published by Packt
Created
June 30, 2017
Updated
Oct. 24, 2022
License
mit
Type
Resource
Primary Language, based on Github DataLanguage
Python
Issues
1
Big-Data-Engineering-Coursera-Yandex
Github stargazers
91
Github forks
73
Commits
32
Code contributors
1
Big Data for Data Engineers Coursera Specialization from Yandex
Created
March 29, 2018
Updated
Nov. 20, 2018
License
mit
Primary Language, based on Github DataLanguage
Jupyter
Issues
4
spark-and-python-for-big-data-with-pyspark
Github stargazers
78
Github forks
110
Commits
2
Code contributors
1
Course on Udemy by Jose Portilla
Created
Jan. 17, 2018
Updated
Jan. 17, 2018
Primary Language, based on Github DataLanguage
Jupyter
python-bigquery-datatransfer
Github stargazers
69
Github forks
28
Commits
356
Code contributors
34
--
Created
Dec. 10, 2019
Updated
Dec. 8, 2022
License
apache-2.0
Type
Module/library
Primary Language, based on Github DataLanguage
Python
Issues
6