Alternative Big Data libraries for Python
Updated: April 23, 2024
python-bigdata
Github stargazers
129
Github forks
212
Commits
31
Code contributors
2
Data science and Big Data with Python
Created
July 14, 2016
Updated
Aug. 27, 2023
Github repo
Primary Language, based on Github DataLanguage
Jupyter
Issues
1
Python-big-data
Github stargazers
126
Github forks
67
Commits
2826
Code contributors
110
Python and Pandas are known to have scalability and efficiency issues. You will learn how to use libraries such as Modin, Dask, Ray, and Vaex to overcome the problems Pandas faces.
Created
Dec. 28, 2022
Updated
Feb. 20, 2024
Github repo
Primary Language, based on Github DataLanguage
Jupyter
bigflow
Github stargazers
115
Github forks
23
Commits
981
Code contributors
19
A Python framework for data processing on GCP.
Created
July 25, 2019
Updated
March 6, 2024
License
other
Github repo
Type
CLI
Primary Language, based on Github DataLanguage
Python
Issues
48
Frank-Kanes-Taming-Big-Data-with-Apache-Spark-and-Python
Github stargazers
115
Github forks
195
Commits
14
Code contributors
5
Frank Kane's Taming Big Data with Apache Spark and Python, published by Packt
Created
June 30, 2017
Updated
Jan. 30, 2023
License
mit
Github repo
Type
Resource
Primary Language, based on Github DataLanguage
Python
Issues
1
Big-Data-Engineering-Coursera-Yandex
Github stargazers
99
Github forks
75
Commits
33
Code contributors
1
Big Data for Data Engineers Coursera Specialization from Yandex
Created
March 29, 2018
Updated
March 15, 2023
License
mit
Github repo
Primary Language, based on Github DataLanguage
Jupyter
Issues
4
Homepage
spark-and-python-for-big-data-with-pyspark
Github stargazers
97
Github forks
118
Commits
2
Code contributors
1
Course on Udemy by Jose Portilla
Created
Jan. 17, 2018
Updated
Jan. 17, 2018
Github repo
Primary Language, based on Github DataLanguage
Jupyter
deltacat
Github stargazers
94
Github forks
17
Commits
324
Code contributors
10
A portable Pythonic Data Catalog API powered by Ray that brings exabyte-level scalability and fast, ACID-compliant, change-data-capture to your big data workloads.
Created
Aug. 11, 2021
Updated
April 9, 2024
License
apache-2.0
Github repo
Primary Language, based on Github DataLanguage
Python
Issues
39
python-bigquery-datatransfer
Read-only repository, archived by owner
Github stargazers
85
Github forks
30
Commits
427
Code contributors
36
This library has moved to https://github.com/googleapis/google-cloud-python/tree/main/packages/google-cloud-bigquery-datatransfer
Created
Dec. 10, 2019
Updated
Sept. 29, 2023
License
apache-2.0
Github repo
Type
Module/library
Primary Language, based on Github DataLanguage
Python
A-Deep-Learning-Based-Illegal-Insider-Trading-Detection-and-Prediction-Technique-in-Stock-Market
Github stargazers
78
Github forks
20
Commits
4
Code contributors
1
Illegal insider trading of stocks is based on releasing non-public information (e.g., new product launch, quarterly financial report, acquisition or merger plan) before the information is made public. Detecting illegal insider trading is difficult due to the complex, nonlinear, and non-stationary nature of the stock market. In this work, we present an approa
Created
Nov. 27, 2017
Updated
Jan. 8, 2019
Github repo
Primary Language, based on Github DataLanguage
Python
Homepage
Coursera-Bioinformatics
Github stargazers
76
Github forks
45
Commits
6
Code contributors
1
My solutions to the Bioinformatics Specialization (Finding Hidden Messages in DNA; Genome Sequencing; Comparing Genes, Proteins, and Genomes; Molecular Evolution; Genomic Data Science and Clustering; Finding Mutations in DNA and Proteins; Bioinformatics Capstone: Big Data in Biology)
Created
April 5, 2018
Updated
Nov. 1, 2018
License
gpl-3.0
Github repo
Primary Language, based on Github DataLanguage
Python
pypar
Github stargazers
69
Github forks
15
Commits
198
Code contributors
4
Efficient and scalable parallelism using the Message Passing Interface (MPI) to handle big data and computationally intensive problems.
Created
May 21, 2013
Updated
Nov. 11, 2016
License
gpl-3.0
Github repo
Type
Module/library
Primary Language, based on Github DataLanguage
Python
Issues
5
BigDataPython
Github stargazers
69
Github forks
127
Commits
6
Code contributors
3
Supporting material for the book BIG DATA CON PYTHON. Recolección, almacenamiento y procesamiento de datos (Big Data with Python: Data Collection, Storage, and Processing), by Enrique Martín Martín, Adrián Riesco, and Rafael Caballero, published by RC Libros
Created
July 3, 2018
Updated
March 2, 2024
Github repo
Primary Language, based on Github DataLanguage
Jupyter
Homepage
xcast
Github stargazers
64
Github forks
6
Commits
196
Code contributors
3
A High-Performance Data Science Toolkit for the Earth Sciences
Created
July 15, 2021
Updated
Nov. 22, 2023
License
mit
Github repo
Primary Language, based on Github DataLanguage
Jupyter
Issues
3
Spark-and-Kafka_IoT-Data-Processing-and-Analytics
Github stargazers
60
Github forks
29
Commits
3
Code contributors
1
Final project for the IoT: Big Data Processing and Analytics class. Analyzing U.S. nationwide temperature data from IoT sensors in real time
Created
Nov. 21, 2016
Updated
Nov. 21, 2016
Github repo
Primary Language, based on Github DataLanguage
Python
Data-Visualizations
Github stargazers
60
Github forks
29
Commits
10
Code contributors
1
Data visualization is emerging as one of the most essential skills in almost all IT and non-IT sectors and jobs. Data visualizations support wiser decisions, which can help a business earn bigger profits and understand the root causes and behavior of the people and customers associated with it. In this Repository I have dee
Created
April 9, 2020
Updated
April 9, 2020
License
gpl-3.0
Github repo
Type
Module/library
Primary Language, based on Github DataLanguage
Jupyter
Issues
1
Location-based-Restaurants-Recommendation-System
Github stargazers
56
Github forks
24
Commits
9
Code contributors
1
Big Data Management and Analysis Final Project
Created
July 21, 2017
Updated
March 21, 2018
Github repo
Primary Language, based on Github DataLanguage
Python
Python-Basic-programs
Github stargazers
52
Github forks
16
Commits
217
Code contributors
1
What is Python? Executive summary: Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Its high-level built-in data structures, combined with dynamic typing and dynamic binding, make it very attractive for Rapid Application Development, as well as for use as a scripting or glue language to connect existing compon
Created
Feb. 11, 2021
Updated
March 2, 2021
Github repo
Primary Language, based on Github DataLanguage
Python
Issues
2
pykylin
Github stargazers
51
Github forks
76
Commits
2
Code contributors
1
Python DB-API driver and SQLAlchemy dialect for Apache Kylin, the "Extreme OLAP Engine for Big Data"
Created
Nov. 16, 2015
Updated
Nov. 16, 2015
Github repo
Primary Language, based on Github DataLanguage
Python
Issues
11
big-data
Github stargazers
48
Github forks
30
Commits
116
Code contributors
1
Python tools for big data
Created
Sept. 9, 2019
Updated
Oct. 9, 2023
Github repo
Primary Language, based on Github DataLanguage
Jupyter
bdbag
Github stargazers
48
Github forks
23
Commits
373
Code contributors
13
Big Data Bag Utilities
Created
March 28, 2016
Updated
March 15, 2024
License
apache-2.0
Github repo
Type
Tool/utility
Primary Language, based on Github DataLanguage
Python
Issues
6