Alternative Big Data libraries for Python
Updated: April 19, 2024
Introduction-to-Big-Data-analysis-Machine-Learning-in-Python-with-PySpark
Github stargazers
12
Github forks
19
Commits
22
Contributors
1
--
Created
May 25, 2021
Updated
May 26, 2021
Github repo
big-data-ecosystem
Github stargazers
12
Github forks
0
Commits
2
Contributors
1
Project developed during the Cognizant Cloud Data Engineer Bootcamp on the Digital Innovation One platform. It extracts and counts the words in a book in plain text format and displays the most frequent word, using a Python algorithm.
Created
March 6, 2022
Updated
March 6, 2022
Github repo
Primary Language, based on Github DataLanguage
Python
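The word-count task described for big-data-ecosystem can be illustrated with a minimal sketch. This is not the repository's code: the function name `most_frequent_word`, the tokenization rule, and the sample text are all assumptions made for illustration, and the sketch runs on a single machine with only the standard library.

```python
import re
from collections import Counter


def most_frequent_word(text: str) -> tuple[str, int]:
    """Return the most common word in `text` and its count (case-insensitive).

    Hypothetical helper for illustration; not taken from the repository.
    """
    # Lowercase the text and treat runs of letters/apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        raise ValueError("text contains no words")
    counts = Counter(words)
    # most_common(1) returns a one-element list holding a (word, count) pair.
    return counts.most_common(1)[0]


# Sample text standing in for the contents of a plain-text book.
sample = "The cat and the dog and the bird"
word, count = most_frequent_word(sample)
print(word, count)  # the 3
```

For a real book, the same function could be applied to the result of reading the file, for example `Path("book.txt").read_text(encoding="utf-8")`, where the file name is a placeholder.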
Big-Data-Systems-Intelligence-Analytics-Labs-Summer-2022
Github stargazers
11
Github forks
1
Commits
31
Contributors
2
Labs for Big Data and Intelligent Analytics
Created
May 16, 2022
Updated
July 26, 2022
License
mit
Github repo
Primary Language, based on Github DataLanguage
Python
Blooddonorprediction
Github stargazers
11
Github forks
2
Commits
12
Contributors
1
Thanks to digitization, we often have access to large databases, consisting of various fields of information, ranging from numbers to texts and even boolean values. Such databases lend themselves especially well to machine learning, classification and big data analysis tasks. We are able to train classifiers, using already existing data and use them for pred
Created
Dec. 21, 2017
Updated
Aug. 4, 2018
License
mit
Github repo
Type
Module/library
Primary Language, based on Github DataLanguage
Python
bigtempo
Github stargazers
11
Github forks
3
Commits
214
Contributors
3
Temporal data processing package for Python, providing a scalable programming model created for data analysis, exploration and evaluation.
Created
Aug. 11, 2013
Updated
Jan. 20, 2015
License
mit
Github repo
Type
Module/library
Primary Language, based on Github DataLanguage
Python
Issues
1
pytest-bigquery-mock
Github stargazers
11
Github forks
6
Commits
11
Contributors
2
Pytest plugin for mocking BigQuery data from the Python BigQuery client.
Created
July 28, 2021
Updated
Dec. 28, 2022
License
mit
Github repo
Primary Language, based on Github DataLanguage
Python
Issues
1
AutoDataPilot
Github stargazers
11
Github forks
0
Commits
15
Contributors
1
This repository involves a series of operations, including data cleaning, normalization, scaling, feature selection, and dimensionality reduction, to transform the data into a clean and organized format that is suitable for big data analysis, using the pandas and NumPy libraries in Python.
Created
Feb. 25, 2023
Updated
March 10, 2023
License
mit
Github repo
Primary Language, based on Github DataLanguage
Jupyter
eco395m
Github stargazers
11
Github forks
22
Commits
33
Contributors
1
ECO 395m: Python, Databases, and Big Data
Created
Jan. 10, 2022
Updated
Nov. 16, 2023
Github repo
Primary Language, based on Github DataLanguage
Python
Issues
20
pybda
Github stargazers
11
Github forks
4
Commits
1015
Contributors
1
A command-line tool for analysis of big biological data sets on distributed HPC clusters.
Created
July 13, 2018
Updated
Oct. 23, 2019
License
gpl-3.0
Github repo
Type
Module/library
Primary Language, based on Github DataLanguage
Python
Issues
7
microsoft-big-data-scientist-and-ai
Github stargazers
11
Github forks
4
Commits
86
Contributors
1
Microsoft Big Data, Data Scientist, and AI
Created
Sept. 28, 2018
Updated
Jan. 28, 2021
Github repo
Type
Cli
massivedatans
Github stargazers
11
Github forks
4
Commits
85
Contributors
1
Big Data vs. complex physical models - a scalable nested sampling inference algorithm for many data sets
Created
July 10, 2017
Updated
Oct. 28, 2019
License
bsd-2-clause
Github repo
Primary Language, based on Github DataLanguage
Python
Issues
1
data-science-vm
Github stargazers
10
Github forks
6
Commits
5
Contributors
1
A Big Data Analytics VM for doing Data Science. It provides a huge kickstart to those working with the Big Data Analytics side of Data Science. Essentially, this project automates the creation of the Big Data Scientist's toolbox on a virtual machine (VM). In a few minutes one can begin working with a fully configured data science lab instead of performing
Created
June 8, 2014
Updated
May 28, 2015
License
gpl-2.0
Github repo
Type
Module/library
Primary Language, based on Github DataLanguage
Ruby
praxxis
Archived (read-only repository, archived by owner)
Github stargazers
10
Github forks
10
Commits
700
Contributors
4
A task interface for Jupyter notebooks built on machine learning and big data
Created
July 23, 2019
Updated
April 10, 2020
License
mit
Github repo
Type
Cli
Primary Language, based on Github DataLanguage
Python
Issues
24