Alternative Text utilities and packages for Python
Updated:
July 19, 2024
newspaper
Github stargazers
13915
Github forks
2105
Commits
651
Code contributors Contributors
96
newspaper3k is a library for news, full-text, and article metadata extraction in Python 3.
Created
Nov. 25, 2013
Updated
Sept. 2, 2020
License
mit
Github repo
Type
Module/library
Primary Language, based on Github DataLanguage
Python
Issues
501
Homepage
OCRmyPDF
Github stargazers
13060
Github forks
965
Commits
3737
Code contributors Contributors
94
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
Created
Dec. 20, 2013
Updated
July 18, 2024
License
mpl-2.0
Github repo
Primary Language, based on Github DataLanguage
Python
Issues
105
TextBlob
Github stargazers
9034
Github forks
1
Commits
563
Code contributors Contributors
29
Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
Created
June 30, 2013
Updated
July 17, 2024
License
mit
Github repo
Type
Module/library
Primary Language, based on Github DataLanguage
Python
Issues
103
pyWhat
Github stargazers
6473
Github forks
346
Commits
643
Code contributors Contributors
35
๐Ÿธ Identify anything. pyWhat easily lets you identify emails, IP addresses, and more. Feed it a .pcap file or some text and it'll tell you what it is! ๐Ÿง™โ€โ™€๏ธ
Created
March 19, 2021
Updated
May 16, 2023
License
mit
Github repo
Primary Language, based on Github DataLanguage
Python
Issues
25
snownlp
Github stargazers
6369
Github forks
1364
Commits
57
Code contributors Contributors
8
Python library for processing Chinese text
Created
Nov. 26, 2013
Updated
Jan. 19, 2020
License
mit
Github repo
Type
Module/library
Primary Language, based on Github DataLanguage
Python
Issues
44
textgenrnn
Github stargazers
4947
Github forks
754
Commits
174
Code contributors Contributors
16
Easily train your own text-generating neural network of any size and complexity on any text dataset with a few lines of code.
Created
Aug. 7, 2017
Updated
July 14, 2020
License
other
Github repo
Type
Module/library
Primary Language, based on Github DataLanguage
Python
Issues
144
snips-nlu
Github stargazers
3878
Github forks
515
Commits
2154
Code contributors Contributors
14
Snips Python library to extract meaning from text
Created
Feb. 8, 2017
Updated
May 3, 2021
License
apache-2.0
Github repo
Type
Module/library
Primary Language, based on Github DataLanguage
Python
Issues
67
python-ftfy
Github stargazers
3733
Github forks
119
Commits
614
Code contributors Contributors
13
Fixes mojibake and other glitches in Unicode text, after the fact.
Created
Aug. 24, 2012
Updated
March 15, 2024
License
other
Github repo
Type
Module/library
Primary Language, based on Github DataLanguage
Python
Issues
23
asciimatics
Github stargazers
3580
Github forks
239
Commits
1134
Code contributors Contributors
44
A cross-platform package to do curses-like operations, plus higher-level APIs and widgets to create text UIs and ASCII art animations
Created
April 15, 2015
Updated
June 22, 2024
License
apache-2.0
Github repo
Type
Module/library
Primary Language, based on Github DataLanguage
Python
Issues
22
sumy
Github stargazers
3462
Github forks
525
Commits
445
Code contributors Contributors
30
Module for automatic summarization of text documents and HTML pages.
Created
Feb. 20, 2013
Updated
May 16, 2024
License
apache-2.0
Github repo
Type
Module/library
Primary Language, based on Github DataLanguage
Python
Issues
22
presidio
Github stargazers
3417
Github forks
530
Commits
1148
Code contributors Contributors
107
Context-aware, pluggable, and customizable data protection and de-identification SDK for text and images
Created
May 4, 2018
Updated
July 18, 2024
License
mit
Github repo
Primary Language, based on Github DataLanguage
Python
Issues
72
gpt-2-simple
Github stargazers
3403
Github forks
677
Commits
149
Code contributors Contributors
21
Python package to easily retrain OpenAI's GPT-2 text-generating model on new texts
Created
April 13, 2019
Updated
May 22, 2022
License
other
Github repo
Type
Module/library
Primary Language, based on Github DataLanguage
Python
Issues
179
textdistance
Github stargazers
3333
Github forks
246
Commits
403
Code contributors Contributors
13
๐Ÿ“ Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage.
Created
May 5, 2017
Updated
July 16, 2024
License
mit
Github repo
Type
Module/library
Primary Language, based on Github DataLanguage
Python
Issues
9
TextRank4ZH
Github stargazers
3245
Github forks
844
Commits
30
Code contributors Contributors
1
🌳 Automatically extracts keywords and summaries from Chinese text
Created
Dec. 1, 2014
Updated
April 18, 2023
License
mit
Github repo
Primary Language, based on Github DataLanguage
Python
Issues
12
trafilatura
Github stargazers
3228
Github forks
239
Commits
1441
Code contributors Contributors
44
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
Created
April 8, 2019
Updated
July 19, 2024
License
apache-2.0
Github repo
Primary Language, based on Github DataLanguage
Python
Issues
68
TextAttack
Github stargazers
2832
Github forks
383
Commits
2661
Code contributors Contributors
54
TextAttack ๐Ÿ™ is a Python framework for adversarial attacks, data augmentation, and model training in NLP https://textattack.readthedocs.io/en/master/
Created
Oct. 15, 2019
Updated
March 31, 2024
License
mit
Github repo
Type
Module/library
Primary Language, based on Github DataLanguage
Python
Issues
59
aeneas
Github stargazers
2448
Github forks
218
Commits
280
Code contributors Contributors
6
aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
Created
May 11, 2015
Updated
May 13, 2020
License
agpl-3.0
Github repo
Type
Module/library
Primary Language, based on Github DataLanguage
Python
Issues
67
CudaText
Github stargazers
2412
Github forks
168
Commits
19870
Code contributors Contributors
28
Cross-platform text editor, written in Free Pascal
Created
Sept. 22, 2015
Updated
July 19, 2024
License
mpl-2.0
Github repo
Primary Language, based on Github DataLanguage
Python
Issues
39