Alternative Big Data libraries for Python
vaex
Github stargazers
8289
Github forks
589
Commits
3636
Code contributors
66
Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
Created
Sept. 27, 2014
Updated
Sept. 27, 2024
License
mit
Primary Language, based on Github DataLanguage
Python
Issues
533
BigDL
Github stargazers
6643
Github forks
1
Commits
21025
Code contributors
104
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc.
Created
Aug. 29, 2016
Updated
Sept. 29, 2024
License
apache-2.0
Primary Language, based on Github DataLanguage
Python
Issues
1274
root
Github stargazers
2686
Github forks
1
Commits
80008
Code contributors
414
The official repository for ROOT: analyzing, storing and visualizing big data, scientifically
Created
June 27, 2013
Updated
Sept. 27, 2024
License
other
Primary Language, based on Github DataLanguage
C++
Issues
811
data-structures-algorithms-python
Github stargazers
1221
Github forks
1506
Commits
86
Code contributors
6
This tutorial playlist covers data structures and algorithms in Python. Every tutorial covers the theory behind a data structure or algorithm, Big O complexity analysis, and exercises that you can practice on.
Created
Sept. 29, 2020
Updated
Nov. 14, 2022
Primary Language, based on Github DataLanguage
Jupyter
Issues
61
tuplex
Github stargazers
813
Github forks
45
Commits
72
Code contributors
7
Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code. Tuplex has similar Python APIs to Apache Spark or Dask, but rather than invoking the Python interpreter, Tuplex generates optimized LLVM bytecode for the given pipeline and input data set.
Created
June 30, 2021
Updated
Dec. 22, 2023
License
apache-2.0
Primary Language, based on Github DataLanguage
C++
Issues
28
oio-sds
Github stargazers
662
Github forks
93
Commits
7256
Code contributors Contributors
98
High Performance Software-Defined Object Storage for Big Data and AI that supports Amazon S3 and OpenStack Swift
Created
March 13, 2015
Updated
Sept. 27, 2024
License
other
Primary Language, based on Github DataLanguage
Python
Issues
29
eland
Github stargazers
644
Github forks
98
Commits
513
Code contributors Contributors
36
Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
Created
June 11, 2019
Updated
Sept. 27, 2024
License
apache-2.0
Type
Module/library
Primary Language, based on Github DataLanguage
Python
Issues
99
Herbie
Github stargazers
491
Github forks
74
Commits
993
Code contributors
20
Download numerical weather prediction datasets (HRRR, RAP, GFS, IFS, etc.) from NOMADS, NODD partners (Amazon, Google, Microsoft), ECMWF open data, and the University of Utah Pando Archive System.
Created
June 26, 2020
Updated
Aug. 30, 2024
License
mit
Primary Language, based on Github DataLanguage
Python
Issues
66
transbigdata
Github stargazers
464
Github forks
114
Commits
670
Code contributors
8
A Python package developed for transportation spatio-temporal big data processing, analysis and visualization.
Created
Oct. 21, 2021
Updated
Oct. 28, 2023
License
bsd-3-clause
Primary Language, based on Github DataLanguage
Python
Issues
13
python-bigquery-pandas
Github stargazers
447
Github forks
121
Commits
383
Code contributors
50
Google BigQuery connector for pandas
Created
Feb. 8, 2017
Updated
Sept. 23, 2024
License
bsd-3-clause
Primary Language, based on Github DataLanguage
Python
Issues
33
hivemq-mqtt-tensorflow-kafka-realtime-iot-machine-learning-training-inference
Github stargazers
406
Github forks
144
Commits
236
Code contributors
5
Real Time Big Data / IoT Machine Learning (Model Training and Inference) with HiveMQ (MQTT), TensorFlow IO and Apache Kafka - no additional data store like S3, HDFS or Spark required
Created
May 8, 2019
Updated
Nov. 5, 2020
License
apache-2.0
Type
Module/library
Primary Language, based on Github DataLanguage
Jupyter
Issues
7
arvados
Github stargazers
397
Github forks
116
Commits
30473
Code contributors
81
An open source platform for managing and analyzing biomedical big data
Created
April 11, 2013
Updated
Sept. 27, 2024
License
other
Primary Language, based on Github DataLanguage
Go
Issues
19
lithops
Github stargazers
317
Github forks
105
Commits
3968
Code contributors
44
A multi-cloud framework for big data analytics and embarrassingly parallel jobs that provides a universal API for building parallel applications in the cloud ☁️🚀
Created
April 23, 2018
Updated
Sept. 3, 2024
License
apache-2.0
Primary Language, based on Github DataLanguage
Python
Issues
8
PythonDataScienceFullThrottle
Github stargazers
258
Github forks
220
Commits
180
Code contributors
2
Downloads for my Safari Online Learning live training course Python Data Science Full Throttle: Introductory Artificial Intelligence (AI), Big Data and Cloud Case Studies
Created
July 18, 2019
Updated
Aug. 13, 2024
Type
Resource
Primary Language, based on Github DataLanguage
Jupyter
Issues
4
gimel
Github stargazers
245
Github forks
83
Commits
167
Code contributors
23
Big Data Processing Framework - Unified Data API or SQL on Any Storage
Created
April 4, 2018
Updated
Nov. 8, 2022
License
apache-2.0
Primary Language, based on Github DataLanguage
Scala
Issues
10
bigdata-playground
Github stargazers
208
Github forks
73
Commits
465
Code contributors
4
A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Created
Dec. 12, 2017
Updated
Feb. 1, 2019
License
apache-2.0
Type
App
Primary Language, based on Github DataLanguage
TypeScript
Issues
7
.config
Github stargazers
204
Github forks
69
Commits
1
Code contributors
1
# # Automatically generated file; DO NOT EDIT. # OpenWrt Configuration # CONFIG_MODULES=y CONFIG_HAVE_DOT_CONFIG=y # CONFIG_TARGET_sunxi is not set # CONFIG_TARGET_apm821xx is not set # CONFIG_TARGET_ath25 is not set CONFIG_TARGET_ar71xx=y # CONFIG_TARGET_ath79 is not set # CONFIG_TARGET_bcm27xx is not set # CONFIG_TARGET_bcm53xx is not set # CONFIG_TARGET_b
Created
June 23, 2020
Updated
June 23, 2020
License
mit
Primary Language, based on Github DataLanguage
Shell
Issues
34
WallStreetBets_BigDataAnalysis
Github stargazers
173
Github forks
42
Commits
68
Code contributors
2
A research project that aims to classify the best stock research posts from r/WallStreetBets for you. 😏
Created
March 15, 2021
Updated
May 16, 2021
Primary Language, based on Github DataLanguage
Jupyter
Issues
3
ai-flow
Github stargazers
170
Github forks
36
Commits
869
Code contributors
12
AI Flow is an open source framework that bridges big data and artificial intelligence.
Created
Oct. 14, 2021
Updated
Oct. 9, 2022
License
apache-2.0
Primary Language, based on Github DataLanguage
Python
Issues
12
deltacat
Github stargazers
148
Github forks
22
Commits
324
Code contributors
12
A portable Pythonic Data Catalog API powered by Ray that brings exabyte-level scalability and fast, ACID-compliant, change-data-capture to your big data workloads.
Created
Aug. 11, 2021
Updated
Sept. 23, 2024
License
apache-2.0
Primary Language, based on Github DataLanguage
Python
Issues
41