MainContentExtractor


0.0.4 MainContentExtractor-0.0.4-py3-none-any.whl

Wheel Details

Project: MainContentExtractor
Version: 0.0.4
Filename: MainContentExtractor-0.0.4-py3-none-any.whl
Size: 5716 bytes
MD5: b4b41514a88112eb1b8be45980d83ed2
SHA256: 77684179436e28eb2e19be26657cb2bbd7c1f9213a2c3ee163a8f9dfbca64107
Uploaded: 2023-12-10 08:05:00 +0000
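
The size and SHA256 listed above can be used to check a downloaded copy of the wheel. The following is a minimal sketch, not part of the package itself: the helper names `file_sha256` and `matches_listing` are hypothetical, and the local path you pass in is whatever location you saved the file to.

```python
import hashlib
from pathlib import Path

# Values copied from the wheel details listed above.
EXPECTED_SIZE = 5716
EXPECTED_SHA256 = "77684179436e28eb2e19be26657cb2bbd7c1f9213a2c3ee163a8f9dfbca64107"


def file_sha256(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, expected_size=EXPECTED_SIZE, expected_sha256=EXPECTED_SHA256):
    """Hypothetical helper: True if the file has the expected size and hash."""
    path = Path(path)
    if path.stat().st_size != expected_size:
        return False
    return file_sha256(path) == expected_sha256.lower()
```

Calling `matches_listing("MainContentExtractor-0.0.4-py3-none-any.whl")` on a local download should return `True` only if the file is byte-for-byte the one described here. SHA256 is the stronger of the two checksums shown, so the sketch ignores the MD5 value.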

dist-info

METADATA

Metadata-Version: 2.1
Name: MainContentExtractor
Version: 0.0.4
Summary: A library to extract the main content from html. Developed for information on LLM and for feeding data into LangChain and LlamaIndex.
Author: HawkClaws
Home-Page: https://github.com/HawkClaws/main_content_extractor
Project-Url: Source Code, https://github.com/HawkClaws/main_content_extractor
License: MIT
Requires-Python: >=3.6
Requires-Dist: trafilatura (>=1.6.2)
Requires-Dist: html2text (>=2020.1.16)
Requires-Dist: beautifulsoup4 (>=4.12.2)
Description-Content-Type: text/markdown
[Description omitted; length: 1885 characters]

WHEEL

Wheel-Version: 1.0
Generator: bdist_wheel (0.42.0)
Root-Is-Purelib: true
Tag: py3-none-any

RECORD

Path Digest Size (bytes)
main_content_extractor/__init__.py sha256=gqRGeYpH9y6K5yMozTs89VI0vlDpqWbN2jwfodBtjWo 59
main_content_extractor/main_content_extractor.py sha256=SahzDwk2uzYQCFvO9kMAl703wMytrfl9pnkRtlG7MLc 7093
main_content_extractor/trafilatura_extends.py sha256=cKi1uZSVOr-Tmd4F1xJcS5q6VLHmPc5twtk1vsYAPzI 3681
MainContentExtractor-0.0.4.dist-info/METADATA sha256=De3-MbbJ9aFl_ZI4PFLqgME149HkQ5eHTwSO3m0EY7I 2499
MainContentExtractor-0.0.4.dist-info/WHEEL sha256=oiQVh_5PnQM0E3gPdiz09WCNmwiHDMaGer_elqB3coM 92
MainContentExtractor-0.0.4.dist-info/top_level.txt sha256=02_otKix4LKWyAmCPFL9Ty1qeD3I5SzsC1LmOdoqL0g 23
MainContentExtractor-0.0.4.dist-info/RECORD
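
Each digest in RECORD is the SHA-256 hash of the file, encoded as URL-safe base64 with the trailing `=` padding removed, as the wheel format specifies. The sketch below shows how such an entry could be recomputed and compared; `record_digest` and `check_record_entry` are hypothetical helper names, and the file bytes are assumed to have been read from the unpacked wheel by the caller.

```python
import base64
import hashlib


def record_digest(data: bytes) -> str:
    """Return a RECORD-style digest string for the given file contents."""
    raw = hashlib.sha256(data).digest()
    # URL-safe base64, with '=' padding stripped as in wheel RECORD files.
    encoded = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
    return "sha256=" + encoded


def check_record_entry(data: bytes, expected_digest: str, expected_size: int) -> bool:
    """Hypothetical helper: compare file contents against one RECORD row."""
    return len(data) == expected_size and record_digest(data) == expected_digest
```

For example, passing the bytes of `main_content_extractor/__init__.py` together with the digest and size from its row above would be expected to return `True` for an unmodified install. The RECORD file itself has no digest or size, since it cannot contain its own hash.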

top_level.txt

main_content_extractor