Alternative Text utilities and packages for Python
Updated
Sept. 21, 2023
newspaper
Github stargazers
13119
Github forks
2072
Commits
651
Code contributors
96
News, full-text, and article metadata extraction in Python 3.
Created
Nov. 25, 2013
Updated
Sept. 2, 2020
License
mit
Github repo
Type
Module/library
Primary Language, based on Github DataLanguage
Python
Issues
498
Homepage
OCRmyPDF
Github stargazers
9819
Github forks
768
Commits
3467
Code contributors
79
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched.
Created
Dec. 20, 2013
Updated
Sept. 18, 2023
License
mpl-2.0
Github repo
Primary Language, based on Github DataLanguage
Python
Issues
102
TextBlob
Github stargazers
8687
Github forks
1127
Commits
563
Code contributors
27
Simple, Pythonic text processing: sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
Created
June 30, 2013
Updated
March 11, 2023
License
mit
Github repo
Type
Module/library
Primary Language, based on Github DataLanguage
Python
Issues
113
snownlp
Github stargazers
6170
Github forks
1379
Commits
57
Code contributors
8
Python library for processing Chinese text
Created
Nov. 26, 2013
Updated
Jan. 19, 2020
License
mit
Github repo
Type
Module/library
Primary Language, based on Github DataLanguage
Python
Issues
42
pyWhat
Github stargazers
6077
Github forks
326
Commits
643
Code contributors
35
Identify anything. pyWhat easily lets you identify emails, IP addresses, and more. Feed it a .pcap file or some text and it'll tell you what it is!
Created
March 19, 2021
Updated
May 16, 2023
License
mit
Github repo
Primary Language, based on Github DataLanguage
Python
Issues
25
textgenrnn
Github stargazers
4922
Github forks
758
Commits
174
Code contributors
16
Easily train your own text-generating neural network of any size and complexity on any text dataset with a few lines of code.
Created
Aug. 7, 2017
Updated
July 14, 2020
License
other
Github repo
Type
Module/library
Primary Language, based on Github DataLanguage
Python
Issues
144
snips-nlu
Github stargazers
3829
Github forks
528
Commits
2154
Code contributors
14
Snips Python library to extract meaning from text
Created
Feb. 8, 2017
Updated
May 3, 2021
License
apache-2.0
Github repo
Type
Module/library
Primary Language, based on Github DataLanguage
Python
Issues
68
python-ftfy
Github stargazers
3601
Github forks
140
Commits
609
Code contributors
13
Fixes mojibake and other glitches in Unicode text, after the fact.
Created
Aug. 24, 2012
Updated
Oct. 25, 2022
License
mit
Github repo
Primary Language, based on Github DataLanguage
Python
Issues
17
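To make "mojibake" concrete: it is what appears when UTF-8 bytes are decoded with the wrong encoding. The sketch below is not python-ftfy's API; it is a standard-library illustration of the simplest repair, and the helper name `repair_mojibake` and its `wrong_encoding` parameter are hypothetical. It assumes the text was mis-decoded exactly once, as Latin-1.

```python
def repair_mojibake(text, wrong_encoding="latin-1"):
    """Undo one round of mis-decoding (hypothetical helper, not part of ftfy).

    Re-encodes the text with the encoding it was wrongly decoded as,
    then decodes the resulting bytes as UTF-8.
    """
    try:
        return text.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text does not look like mis-decoded UTF-8; leave it untouched.
        return text


# "café" stored as UTF-8 but read as Latin-1 shows up as "cafÃ©".
broken = "caf\u00c3\u00a9"
print(repair_mojibake(broken))
```

A real fixer has to guess which encodings were mixed up and decide when not to touch the text, which is why a dedicated library is usually the better choice.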
asciimatics
Github stargazers
3394
Github forks
243
Commits
1106
Code contributors
43
A cross-platform package to do curses-like operations, plus higher-level APIs and widgets to create text UIs and ASCII art animations
Created
April 15, 2015
Updated
Sept. 17, 2023
License
apache-2.0
Github repo
Type
Module/library
Primary Language, based on Github DataLanguage
Python
Issues
23
gpt-2-simple
Github stargazers
3326
Github forks
678
Commits
149
Code contributors
21
Python package to easily retrain OpenAI's GPT-2 text-generating model on new texts
Created
April 13, 2019
Updated
May 22, 2022
License
other
Github repo
Type
Module/library
Primary Language, based on Github DataLanguage
Python
Issues
177
sumy
Github stargazers
3245
Github forks
533
Commits
437
Code contributors
26
Module for automatic summarization of text documents and HTML pages.
Created
Feb. 20, 2013
Updated
Aug. 11, 2023
License
apache-2.0
Github repo
Type
Module/library
Primary Language, based on Github DataLanguage
Python
Issues
18
textdistance
Github stargazers
3174
Github forks
246
Commits
376
Code contributors
10
Compute distance between sequences. 30+ algorithms, pure Python implementation, common interface, optional external libs usage.
Created
May 5, 2017
Updated
Sept. 18, 2022
License
mit
Github repo
Type
Module/library
Primary Language, based on Github DataLanguage
Python
Issues
9
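As an illustration of what "distance between sequences" means, here is a minimal pure-Python sketch of one classic measure, Levenshtein edit distance. It does not use textdistance and is not its implementation; the function name `levenshtein` is chosen here for illustration only.

```python
def levenshtein(a, b):
    """Minimum number of single-item insertions, deletions, or
    substitutions needed to turn sequence a into sequence b."""
    # Keep the shorter sequence in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between a[:i-1] and b[:j].
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # delete item_a
                current[j - 1] + 1,      # insert item_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

The same function works on any indexable sequences, for example lists of tokens instead of characters.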
TextRank4ZH
Github stargazers
3041
Github forks
833
Commits
30
Code contributors
1
Automatically extract keywords and summaries from Chinese text
Created
Dec. 1, 2014
Updated
April 18, 2023
License
mit
Github repo
Primary Language, based on Github DataLanguage
Python
Issues
10
TextAttack
Github stargazers
2489
Github forks
340
Commits
2639
Code contributors
49
TextAttack is a Python framework for adversarial attacks, data augmentation, and model training in NLP. https://textattack.readthedocs.io/en/master/
Created
Oct. 15, 2019
Updated
Sept. 11, 2023
License
mit
Github repo
Type
Module/library
Primary Language, based on Github DataLanguage
Python
Issues
42
texar
Github stargazers
2364
Github forks
383
Commits
1719
Code contributors
29
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/
Created
July 22, 2017
Updated
July 29, 2020
License
apache-2.0
Github repo
Type
Module/library
Primary Language, based on Github DataLanguage
Python
Issues
38
Homepage
presidio
Github stargazers
2349
Github forks
389
Commits
1114
Code contributors
73
Context-aware, pluggable, and customizable data protection and de-identification SDK for text and images
Created
May 4, 2018
Updated
Sept. 20, 2023
License
mit
Github repo
Primary Language, based on Github DataLanguage
Python
Issues
58
aeneas
Github stargazers
2267
Github forks
219
Commits
280
Code contributors
6
aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
Created
May 11, 2015
Updated
May 13, 2020
License
agpl-3.0
Github repo
Type
Module/library
Primary Language, based on Github DataLanguage
Python
Issues
62
textacy
Github stargazers
2104
Github forks
255
Commits
1816
Code contributors
30
NLP, before and after spaCy
Created
Feb. 3, 2016
Updated
April 3, 2023
License
other
Github repo
Type
Module/library
Primary Language, based on Github DataLanguage
Python
Issues
34
pytextrank
Github stargazers
2047
Github forks
364
Commits
468
Code contributors
17
Python implementation of TextRank algorithms ("textgraphs") for phrase extraction
Created
Oct. 2, 2016
Updated
Aug. 22, 2023
License
mit
Github repo
Type
Module/library
Primary Language, based on Github DataLanguage
Python
Issues
8
Homepage
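For readers unfamiliar with TextRank, the rough idea is to build a graph of words that occur near each other and score them with a PageRank-style iteration. The sketch below is a simplified single-word version written with the standard library only. It is not pytextrank's API or implementation, and the function name `textrank_keywords` and its parameters are assumptions made for illustration.

```python
from collections import defaultdict


def textrank_keywords(words, window=2, damping=0.85, iterations=50):
    """Rank words by a PageRank-style score over a co-occurrence graph.

    Two words are linked when they appear within `window` positions
    of each other in the input list.
    """
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        # Each word receives an equal share of every neighbor's score.
        scores = {
            word: (1 - damping) + damping * sum(
                scores[n] / len(neighbors[n]) for n in neighbors[word]
            )
            for word in neighbors
        }
    return sorted(scores, key=scores.get, reverse=True)


tokens = "a hub b hub c hub d".split()
print(textrank_keywords(tokens, window=1)[0])  # hub
```

Phrase extraction as described above additionally involves tokenizing, filtering by part of speech, and merging adjacent high-scoring words into phrases, none of which this sketch attempts.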