Alternative Big Data libraries for Python
Updated: May 25, 2022
BigDL
Github stargazers
3930
Github forks
998
Commits
17096
Code contributors
137
Building Large-Scale AI Applications for Distributed Big Data
Created
Aug. 29, 2016
Updated
May 25, 2022
License
apache-2.0
Github repo
Primary Language, based on Github DataLanguage
Jupyter
Issues
481
root
Github stargazers
1806
Github forks
1
Commits
74549
Code contributors
319
The official repository for ROOT: analyzing, storing and visualizing big data, scientifically
Created
June 27, 2013
Updated
May 24, 2022
License
other
Github repo
Primary Language, based on Github DataLanguage
C++
Issues
603
Homepage
tuplex
Github stargazers
758
Github forks
40
Commits
50
Code contributors
7
Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code. Tuplex has similar Python APIs to Apache Spark or Dask, but rather than invoking the Python interpreter, Tuplex generates optimized LLVM bytecode for the given pipeline and input data set.
Created
June 30, 2021
Updated
May 20, 2022
License
apache-2.0
Github repo
Primary Language, based on Github DataLanguage
C++
Issues
29
data-structures-algorithms-python
Github stargazers
545
Github forks
907
Commits
85
Code contributors
6
This tutorial playlist covers data structures and algorithms in Python. Each tutorial covers the theory behind a data structure or algorithm, Big O complexity analysis, and exercises that you can practice on.
Created
Sept. 29, 2020
Updated
July 8, 2021
Github repo
Primary Language, based on Github DataLanguage
Jupyter
Issues
41
oio-sds
Github stargazers
544
Github forks
93
Commits
6724
Code contributors
94
High Performance Software-Defined Object Storage for Big Data and AI that supports Amazon S3 and OpenStack Swift
Created
March 13, 2015
Updated
May 16, 2022
License
other
Github repo
Primary Language, based on Github DataLanguage
Python
Issues
29
Homepage
eland
Github stargazers
373
Github forks
52
Commits
408
Code contributors
19
Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
Created
June 11, 2019
Updated
May 18, 2022
License
apache-2.0
Github repo
Type
Module/library
Primary Language, based on Github DataLanguage
Python
Issues
66
arvados
Github stargazers
310
Github forks
100
Commits
23012
Code contributors
39
An open source platform for managing and analyzing biomedical big data
Created
April 11, 2013
Updated
May 19, 2022
License
other
Github repo
Primary Language, based on Github DataLanguage
Go
Issues
9
Homepage
python-bigquery-pandas
Github stargazers
304
Github forks
102
Commits
305
Code contributors
35
Google BigQuery connector for pandas
Created
Feb. 8, 2017
Updated
May 19, 2022
License
bsd-3-clause
Github repo
Primary Language, based on Github DataLanguage
Python
Issues
44
hivemq-mqtt-tensorflow-kafka-realtime-iot-machine-learning-training-inference
Github stargazers
298
Github forks
114
Commits
236
Code contributors
5
Real Time Big Data / IoT Machine Learning (Model Training and Inference) with HiveMQ (MQTT), TensorFlow IO and Apache Kafka - no additional data store like S3, HDFS or Spark required
Created
May 8, 2019
Updated
Nov. 5, 2020
License
apache-2.0
Github repo
Type
Module/library
Primary Language, based on Github DataLanguage
Jupyter
Issues
8
gimel
Github stargazers
225
Github forks
90
Commits
164
Code contributors
23
Big Data Processing Framework - Unified Data API or SQL on Any Storage
Created
April 4, 2018
Updated
Feb. 8, 2022
License
apache-2.0
Github repo
Primary Language, based on Github DataLanguage
Scala
Issues
8
Homepage
lithops
Github stargazers
215
Github forks
72
Commits
3261
Code contributors
32
An open source framework for big data analytics and embarrassingly parallel jobs that provides a universal API for building parallel applications in the cloud.
Created
April 23, 2018
Updated
May 24, 2022
License
apache-2.0
Github repo
Primary Language, based on Github DataLanguage
Python
Issues
6
bigdata-playground
Github stargazers
194
Github forks
72
Commits
465
Code contributors
4
A complete example of a big data application using: Kubernetes (kops/AWS), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, Node.js, Angular, GraphQL
Created
Dec. 12, 2017
Updated
Feb. 1, 2019
License
apache-2.0
Github repo
Type
App
Primary Language, based on Github DataLanguage
TypeScript
Issues
7
PythonDataScienceFullThrottle
Github stargazers
190
Github forks
168
Commits
142
Code contributors
2
Downloads for my Safari Online Learning live training course Python Data Science Full Throttle: Introductory Artificial Intelligence (AI), Big Data and Cloud Case Studies
Created
July 18, 2019
Updated
May 23, 2022
Github repo
Type
Resource
Primary Language, based on Github DataLanguage
Jupyter
transbigdata
Github stargazers
156
Github forks
45
Commits
516
Code contributors
8
A Python package developed for transportation spatio-temporal big data processing, analysis and visualization.
Created
Oct. 21, 2021
Updated
May 22, 2022
License
bsd-3-clause
Github repo
Primary Language, based on Github DataLanguage
Jupyter
Issues
1
python-bigdata
Github stargazers
123
Github forks
167
Commits
30
Code contributors
2
Data science and Big Data with Python
Created
July 14, 2016
Updated
Oct. 6, 2020
Github repo
Primary Language, based on Github DataLanguage
Jupyter
Issues
1
kubernetes-bigquery-python
Github stargazers
120
Github forks
85
Commits
41
Code contributors
7
Example Kubernetes app that shows how to build a 'pipeline' to stream data into BigQuery. Uses Redis or Google Cloud PubSub
Created
Dec. 17, 2014
Updated
Oct. 20, 2020
License
apache-2.0
Github repo
Type
App
Primary Language, based on Github DataLanguage
Python
Issues
5
Herbie
Github stargazers
118
Github forks
23
Commits
595
Code contributors
6
Python for downloading model data (HRRR, RAP, GFS, NBM, etc.) from NOMADS, NOAA's Big Data Program partners (Amazon, Google, Microsoft), ECMWF open data, and the University of Utah Pando Archive System.
Created
June 26, 2020
Updated
May 21, 2022
License
mit
Github repo
Primary Language, based on Github DataLanguage
Python
Issues
10
ai-flow
Github stargazers
115
Github forks
30
Commits
804
Code contributors
11
AI Flow is an open source framework that bridges big data and artificial intelligence.
Created
Oct. 14, 2021
Updated
May 23, 2022
License
apache-2.0
Github repo
Primary Language, based on Github DataLanguage
Python
Issues
22
WallStreetBets_BigDataAnalysis
Github stargazers
113
Github forks
34
Commits
68
Code contributors
2
Research project that aims to classify the best stock research posts from r/WallStreetBets for you. 😏
Created
March 15, 2021
Updated
May 16, 2021
Github repo
Primary Language, based on Github DataLanguage
Jupyter
Issues
1
Frank-Kanes-Taming-Big-Data-with-Apache-Spark-and-Python
Github stargazers
100
Github forks
177
Commits
8
Code contributors
4
Frank Kane's Taming Big Data with Apache Spark and Python, published by Packt
Created
June 30, 2017
Updated
Jan. 14, 2021
License
mit
Github repo
Type
Resource
Primary Language, based on Github DataLanguage
Python
Issues
1