New Python HTML Libraries 2026
last commit 2 years ago html5lib/html5lib-python 1K -1
added 12 months ago
Standards-compliant library for parsing and serializing HTML documents and fragments in Python.
last commit 4 months ago alir3z4/html2text 2K +2
added 12 months ago
Convert HTML to Markdown-formatted text.
last commit 1 month ago mozilla/bleach 2K +6
added 12 months ago
Bleach is an allowed-list-based HTML sanitizing library that escapes or strips markup and attributes.
last commit 1 month ago buriy/python-readability 2K +2
added 12 months ago
Given an HTML document, extract and clean up the main body text and title.
last commit 1 month ago lxml/lxml 3K +6
added 12 months ago
lxml is the most feature-rich and easy-to-use library for processing XML and HTML in the Python language.
last commit 1 month ago scrapy/parsel 1K +3
added 1 year ago
Parsel lets you extract data from XML/HTML/JSON documents using XPath or CSS selectors.
last commit 2 years ago psf/requests-html 13K -3
added 1 year ago