Alternative Big Data libraries for Python
Updated
May 31, 2023
BigDL
Github stargazers
4218
Github forks
1
Commits
19878
Code contributors
174
Fast, distributed, secure AI for Big Data
Created
Aug. 29, 2016
Updated
May 31, 2023
License
apache-2.0
Github repo
Primary Language, based on Github DataLanguage
Jupyter
Issues
739
root
Github stargazers
2126
Github forks
1
Commits
78068
Code contributors
352
The official repository for ROOT: analyzing, storing and visualizing big data, scientifically
Created
June 27, 2013
Updated
May 31, 2023
License
other
Github repo
Primary Language, based on Github DataLanguage
C++
Issues
838
Homepage
data-structures-algorithms-python
Github stargazers
918
Github forks
1296
Commits
86
Code contributors
6
This tutorial playlist covers data structures and algorithms in Python. Every tutorial covers the theory behind a data structure or algorithm, Big O complexity analysis, and exercises that you can practice on.
Created
Sept. 29, 2020
Updated
Nov. 14, 2022
Github repo
Primary Language, based on Github DataLanguage
Jupyter
Issues
52
tuplex
Github stargazers
803
Github forks
44
Commits
63
Code contributors
8
Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code. Tuplex has similar Python APIs to Apache Spark or Dask, but rather than invoking the Python interpreter, Tuplex generates optimized LLVM bytecode for the given pipeline and input data set.
Created
June 30, 2021
Updated
Oct. 6, 2022
License
apache-2.0
Github repo
Primary Language, based on Github DataLanguage
C++
Issues
32
oio-sds
Github stargazers
597
Github forks
94
Commits
7082
Code contributors
97
High-performance software-defined object storage for Big Data and AI that supports Amazon S3 and OpenStack Swift
Created
March 13, 2015
Updated
May 22, 2023
License
other
Github repo
Primary Language, based on Github DataLanguage
Python
Issues
29
Homepage
eland
Github stargazers
522
Github forks
73
Commits
446
Code contributors
22
Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
Created
June 11, 2019
Updated
May 25, 2023
License
apache-2.0
Github repo
Type
Module/library
Primary Language, based on Github DataLanguage
Python
Issues
80
python-bigquery-pandas
Github stargazers
368
Github forks
119
Commits
349
Code contributors
43
Google BigQuery connector for pandas
Created
Feb. 8, 2017
Updated
May 25, 2023
License
bsd-3-clause
Github repo
Primary Language, based on Github DataLanguage
Python
Issues
48
hivemq-mqtt-tensorflow-kafka-realtime-iot-machine-learning-training-inference
Github stargazers
356
Github forks
134
Commits
236
Code contributors
5
Real Time Big Data / IoT Machine Learning (Model Training and Inference) with HiveMQ (MQTT), TensorFlow IO and Apache Kafka - no additional data store like S3, HDFS or Spark required
Created
May 8, 2019
Updated
Nov. 5, 2020
License
apache-2.0
Github repo
Type
Module/library
Primary Language, based on Github DataLanguage
Jupyter
Issues
8
arvados
Github stargazers
347
Github forks
108
Commits
24269
Code contributors
41
An open source platform for managing and analyzing biomedical big data
Created
April 11, 2013
Updated
May 29, 2023
License
other
Github repo
Primary Language, based on Github DataLanguage
Go
Issues
12
Homepage
transbigdata
Github stargazers
305
Github forks
92
Commits
650
Code contributors
8
A Python package developed for transportation spatio-temporal big data processing, analysis and visualization.
Created
Oct. 21, 2021
Updated
May 29, 2023
License
bsd-3-clause
Github repo
Primary Language, based on Github DataLanguage
Python
Issues
1
gimel
Github stargazers
234
Github forks
94
Commits
167
Code contributors
23
Big Data Processing Framework - Unified Data API or SQL on Any Storage
Created
April 4, 2018
Updated
Nov. 8, 2022
License
apache-2.0
Github repo
Primary Language, based on Github DataLanguage
Scala
Issues
9
Homepage
PythonDataScienceFullThrottle
Github stargazers
225
Github forks
194
Commits
172
Code contributors
2
Downloads for my Safari Online Learning live training course Python Data Science Full Throttle: Introductory Artificial Intelligence (AI), Big Data and Cloud Case Studies
Created
July 18, 2019
Updated
May 24, 2023
Github repo
Type
Resource
Primary Language, based on Github DataLanguage
Jupyter
Issues
3
Big-Data-Engineering-Coursera-Yandex
Github stargazers
95
Github forks
74
Commits
33
Code contributors
1
Big Data for Data Engineers Coursera Specialization from Yandex
Created
March 29, 2018
Updated
March 15, 2023
License
mit
Github repo
Primary Language, based on Github DataLanguage
Jupyter
Issues
4
Homepage
.config
Github stargazers
87
Github forks
66
Commits
1
Code contributors
1
# # Automatically generated file; DO NOT EDIT. # OpenWrt Configuration # CONFIG_MODULES=y CONFIG_HAVE_DOT_CONFIG=y # CONFIG_TARGET_sunxi is not set # CONFIG_TARGET_apm821xx is not set # CONFIG_TARGET_ath25 is not set CONFIG_TARGET_ar71xx=y # CONFIG_TARGET_ath79 is not set # CONFIG_TARGET_bcm27xx is not set # CONFIG_TARGET_bcm53xx is not set # CONFIG_TARGET_b
Created
June 23, 2020
Updated
June 23, 2020
License
mit
Github repo
Primary Language, based on Github DataLanguage
Shell
Issues
30
spark-and-python-for-big-data-with-pyspark
Github stargazers
83
Github forks
116
Commits
2
Code contributors
1
Course on Udemy by Jose Portilla
Created
Jan. 17, 2018
Updated
Jan. 17, 2018
Github repo
Primary Language, based on Github DataLanguage
Jupyter
python-bigquery-datatransfer
Github stargazers
80
Github forks
28
Commits
398
Code contributors
35
--
Created
Dec. 10, 2019
Updated
May 25, 2023
License
apache-2.0
Github repo
Type
Module/library
Primary Language, based on Github DataLanguage
Python
Issues
4
A-Deep-Learning-Based-Illegal-Insider-Trading-Detection-and-Prediction-Technique-in-Stock-Market
Github stargazers
70
Github forks
19
Commits
4
Code contributors
1
Illegal insider trading of stocks is based on releasing non-public information (e.g., new product launch, quarterly financial report, acquisition or merger plan) before the information is made public. Detecting illegal insider trading is difficult due to the complex, nonlinear, and non-stationary nature of the stock market. In this work, we present an approa
Created
Nov. 27, 2017
Updated
Jan. 8, 2019
Github repo
Primary Language, based on Github DataLanguage
Python
Homepage
Coursera-Bioinformatics
Github stargazers
62
Github forks
45
Commits
6
Code contributors
1
My solutions to the Bioinformatics Specialization (Finding Hidden Messages in DNA; Genome Sequencing; Comparing Genes, Proteins, and Genomes; Molecular Evolution; Genomic Data Science and Clustering; Finding Mutations in DNA and Proteins; Bioinformatics Capstone: Big Data in Biology)
Created
April 5, 2018
Updated
Nov. 1, 2018
License
gpl-3.0
Github repo
Primary Language, based on Github DataLanguage
Python
xcast
Github stargazers
59
Github forks
5
Commits
191
Code contributors
2
A High-Performance Data Science Toolkit for the Earth Sciences
Created
July 15, 2021
Updated
April 17, 2023
License
mit
Github repo
Primary Language, based on Github DataLanguage
Python
Issues
2
deltacat
Github stargazers
58
Github forks
10
Commits
217
Code contributors
7
A Pythonic Data Catalog powered by Ray that brings exabyte-level scalability and fast, ACID-compliant change data capture to your big data workloads.
Created
Aug. 11, 2021
Updated
April 28, 2023
License
apache-2.0
Github repo
Primary Language, based on Github DataLanguage
Python
Issues
25