trafilatura

Wheel Details

Project: trafilatura
Version: 1.9.0
Filename: trafilatura-1.9.0-py3-none-any.whl
Size: 1025921 bytes
MD5: 3fe9e8c4ea427662761ef8c1d187e0a6
SHA256: 8ea09de226c4eae25685f65b2cf56de47260d5c858ed3a8a0bc9c3ddd0841344
Uploaded: 2024-05-02 10:16:46 +0000
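
The size and SHA256 values listed above can be checked against a downloaded copy of the wheel. The following is a minimal sketch using only the Python standard library. The helper name `verify_wheel` and the local file path in the usage note are assumptions made for illustration; they are not part of trafilatura or of any packaging tool.

```python
import hashlib
import os

# Values copied from the Wheel Details listing above.
EXPECTED_SIZE = 1025921
EXPECTED_SHA256 = "8ea09de226c4eae25685f65b2cf56de47260d5c858ed3a8a0bc9c3ddd0841344"


def verify_wheel(path, expected_size, expected_sha256):
    """Return True if the file at `path` has the expected size and SHA256 hex digest.

    Hypothetical helper for illustration only.
    """
    # Cheap check first: a size mismatch means the hash cannot match either.
    if os.path.getsize(path) != expected_size:
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256.lower()
```

Assuming the wheel was saved under its original filename in the current directory, the check would be `verify_wheel("trafilatura-1.9.0-py3-none-any.whl", EXPECTED_SIZE, EXPECTED_SHA256)`.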

dist-info

METADATA

Metadata-Version: 2.1
Name: trafilatura
Version: 1.9.0
Summary: Python package and command-line tool designed to gather text on the Web, includes all necessary discovery and text processing components to perform web crawling, downloads, scraping, and extraction of main texts, metadata and comments.
Author: Adrien Barbaresi
Author-Email: barbaresi[at]bbaw.de
Home-Page: https://trafilatura.readthedocs.io
Project-Url: Documentation, https://trafilatura.readthedocs.io
Project-Url: Source, https://github.com/adbar/trafilatura
Project-Url: Blog, https://adrien.barbaresi.eu/blog/tag/trafilatura.html
License: Apache-2.0
Keywords: corpus,html2text,news-crawler,natural-language-processing,scraper,tei-xml,text-extraction,webscraping,web-scraping
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: MacOS
Classifier: Operating System :: Microsoft
Classifier: Operating System :: POSIX
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Security
Classifier: Topic :: Text Editors :: Text Processing
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Topic :: Text Processing :: Markup :: HTML
Classifier: Topic :: Text Processing :: Markup :: Markdown
Classifier: Topic :: Text Processing :: Markup :: XML
Classifier: Topic :: Utilities
Requires-Python: >=3.6
Requires-Dist: certifi
Requires-Dist: courlan (>=1.1.0)
Requires-Dist: htmldate (>=1.8.1)
Requires-Dist: justext (>=3.0.0)
Requires-Dist: lxml (<5.2.0,>=4.9.4); platform_system != "Darwin" or python_version > "3.8"
Requires-Dist: lxml (==4.9.2); platform_system == "Darwin" and python_version <= "3.8"
Requires-Dist: charset-normalizer (>=3.0.1); python_version < "3.7"
Requires-Dist: urllib3 (<2,>=1.26); python_version < "3.7"
Requires-Dist: importlib-metadata; python_version < "3.8"
Requires-Dist: charset-normalizer (>=3.2.0); python_version >= "3.7"
Requires-Dist: urllib3 (<3,>=1.26); python_version >= "3.7"
Requires-Dist: brotli; extra == "all"
Requires-Dist: htmldate[speed] (>=1.8.1); extra == "all"
Requires-Dist: py3langid (>=0.2.2); extra == "all"
Requires-Dist: pycurl (>=7.45.3); extra == "all"
Requires-Dist: cchardet (>=2.1.7); python_version < "3.11" and extra == "all"
Requires-Dist: faust-cchardet (>=2.1.19); python_version >= "3.11" and extra == "all"
Requires-Dist: Gooey (>=1.0.1); extra == "gui"
Provides-Extra: all
Provides-Extra: gui
Description-Content-Type: text/markdown
License-File: LICENSE
[Description omitted; length: 10883 characters]
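
Each `Requires-Dist` value above combines a project name, optional extras in square brackets, an optional version specifier in parentheses, and an optional environment marker after a semicolon. As a rough sketch of how such a line breaks down, the hypothetical function `parse_requires_dist` below splits one value into those parts with the standard library only. It handles the shapes that appear in this listing and is not a full PEP 508 parser; it also does not evaluate markers. For real use, the `packaging` library's `Requirement` class is the more appropriate tool.

```python
import re

# Matches "name[extras] (specifier)" as formatted in the listing above.
# Illustrative pattern only, not a complete PEP 508 grammar.
_REQ = re.compile(
    r"^\s*(?P<name>[A-Za-z0-9][A-Za-z0-9._-]*)"
    r"\s*(?:\[(?P<extras>[^\]]*)\])?"
    r"\s*(?:\((?P<spec>[^)]*)\))?\s*$"
)


def parse_requires_dist(value):
    """Split a Requires-Dist value into name, extras, specifier, and marker text."""
    # Everything after the first ";" is the environment marker, kept as raw text.
    requirement, _, marker = value.partition(";")
    match = _REQ.match(requirement)
    if match is None:
        raise ValueError(f"unrecognized requirement: {value!r}")
    extras = match.group("extras")
    return {
        "name": match.group("name"),
        "extras": [e.strip() for e in extras.split(",")] if extras else [],
        "specifier": (match.group("spec") or "").strip(),
        "marker": marker.strip(),
    }
```

For example, passing the `lxml (==4.9.2); ...` line yields the name `lxml`, the specifier `==4.9.2`, and the marker text restricting it to Darwin on Python 3.8 or older.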

WHEEL

Wheel-Version: 1.0
Generator: bdist_wheel (0.43.0)
Root-Is-Purelib: true
Tag: py3-none-any

RECORD

Path Digest Size (bytes)
trafilatura/__init__.py sha256=vv_55Wwuxp0IxlUWQCwaA63hqBtZjB___xlQu1ZvAdU 629
trafilatura/baseline.py sha256=ls78ijQPD4oZZo5Iwd6syKCLhElG_9vpyOXOVCKnZKo 3114
trafilatura/cli.py sha256=iyOoqhxFAqFC6k67SLRTtn5oc9Oh5nfTZ4yqXqFZMsQ 13629
trafilatura/cli_utils.py sha256=8WemZL7GCPvJtPkdnUv9GgAfdGmhD9aE__ynN9lB3gY 16019
trafilatura/core.py sha256=lzrNdc5aU5FCCbf3noHvj76QSej9HkZnZ9AaGwsHIik 16261
trafilatura/downloads.py sha256=HP9u8ROv_cTMEvRbOtymYYskHTRxaaAGRuKzkJWLFhY 14735
trafilatura/external.py sha256=OCdKfYO8Tm9aQXHZBYLv1NuPDt4pRNeewaLGyLb1fNE 7913
trafilatura/feeds.py sha256=0ul8RCTf-bekPq4rWCzGn5b9CZzyOUTVz3wD6cxVboM 9649
trafilatura/filters.py sha256=ofhuiHSSj_I-SdlRZwfEPwz3OcRbZ5cQ9lvjm6xqwks 4672
trafilatura/gui.py sha256=CClJtffvEmV7lFKZaNM8BBdUZuP6pHOa4MTNdzYPMSo 1755
trafilatura/hashing.py sha256=ATSJ5uyEfq97GdRy8OApNf4uwTOsgbMvMitkjAirqdo 4876
trafilatura/htmlprocessing.py sha256=ILjnZ5gDBE5Ron0RDxvBD84JIxKVtg91fsaNF5KbfYs 13097
trafilatura/json_metadata.py sha256=wbgVVs6eeOjWBcwWttpC2cJi3NbSMUg3lDQLiyBZc2Y 9844
trafilatura/lru.py sha256=NmyVWOcK-YfZ5g2eBd7Ub7lc2E6TSTHD7D9H7JPpVFA 3702
trafilatura/main_extractor.py sha256=cj87JUt95D1zpoPvRHBr-jtPVLFHfEbuEHw4TGVFfX8 27179
trafilatura/meta.py sha256=xWL_twoF85nSzaDM7zRQzvDJ9Da3M2FFptILuQJ39rQ 951
trafilatura/metadata.py sha256=4FJwFS0Xw6zZiY4-zjrekF6-TYduWh4C_cYurrkkIUQ 21713
trafilatura/readability_lxml.py sha256=Ez44lNIsBcSIlmU0cKNunk1ZqSuG0fzSgrKvh6m2yHg 16792
trafilatura/settings.cfg sha256=IuClQnwBQffOXk2j31pVo6V-o5cdnjjZVKTEd9iyQaU 739
trafilatura/settings.py sha256=NQPEiQQOfnQcpLPNYjdmrHo6rWFlNmkvQeKdNtLWx5E 7048
trafilatura/sitemaps.py sha256=dRdMr4BV-SpDhkNvargy8XIp2q-LEXVvupfMHcqChFI 9766
trafilatura/spider.py sha256=DkZqGPqJ9aVS9L95zAzZYpJ2CZNzOe_tOvvIEshqcwc 9013
trafilatura/utils.py sha256=klPcWvBwpteSy0Z3qqPb7c_SnF3d3QsxFKtNQax6TVk 15888
trafilatura/xml.py sha256=ENnnAC6oydwFnPs9GZC-Ab14sAzHwlwJzt9F1fpseZk 22115
trafilatura/xpaths.py sha256=LEW63hXP8MLN7Vj8-foO8hjOAlcIs4ZvmVOla6Q30Tc 16127
trafilatura/data/jt-stopwords-pickle.lzma sha256=T2GLl-oHhSm01jqdiyZb5t--l6ssbHGRJqNyzfaIxQo 851312
trafilatura/data/tei-schema-pickle.lzma sha256=RO3XBQVunKz7IQYn1YQ5Lh6dp-EnMwkJ5GBrcNKoxXk 80692
trafilatura-1.9.0.dist-info/LICENSE sha256=psuoW8kuDP96RQsdhzwOqi6fyWv0ct8CR6Jr7He_P_k 10173
trafilatura-1.9.0.dist-info/METADATA sha256=LiZMPI-hjDEPC7tkHePUbSaadbJWY5-hBN_7XQGFRtw 14114
trafilatura-1.9.0.dist-info/WHEEL sha256=GJ7t_kWBFywbagK5eo9IoUwLW6oyOeTKmQ-9iHFVNxQ 92
trafilatura-1.9.0.dist-info/entry_points.txt sha256=G-TALznoHb9Ad0G2dVyBlvbbRSoMRjY3kNT3bzJeGiw 92
trafilatura-1.9.0.dist-info/top_level.txt sha256=FNlkTX9sAktQsHwwXze9RAexePfOXsqPY9cF86PNlnE 12
trafilatura-1.9.0.dist-info/RECORD
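
The digests in the RECORD listing above are not hex strings: in the wheel format they are the raw SHA256 hash encoded as URL-safe base64 with the trailing `=` padding removed. A minimal sketch of that encoding follows; the function names `record_digest` and `matches_record` are hypothetical and chosen for illustration.

```python
import base64
import hashlib


def record_digest(data):
    """Return a RECORD-style digest string ("sha256=...") for the given bytes."""
    raw = hashlib.sha256(data).digest()
    # URL-safe base64 alphabet, with "=" padding stripped as in RECORD files.
    encoded = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
    return "sha256=" + encoded


def matches_record(data, digest, size):
    """Check file bytes against the digest and size columns of a RECORD row."""
    return len(data) == size and record_digest(data) == digest
```

To check an installed file, one would read it in binary mode and compare it with its row, for instance `matches_record(content, "sha256=FNlkTX9sAktQsHwwXze9RAexePfOXsqPY9cF86PNlnE", 12)` for `top_level.txt`. The RECORD file itself carries no digest or size, which is why its row above lists only the path.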

top_level.txt

trafilatura

entry_points.txt

trafilatura = trafilatura.cli:main
trafilatura_gui = trafilatura.gui:main
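
Each entry point line maps a command name to a `module:attribute` target, so installing the wheel produces a `trafilatura` command that calls `main` in `trafilatura.cli`. The sketch below shows how such a line can be resolved to a callable with the standard library. The helper names `parse_entry_point` and `load_entry_point` are assumptions for illustration; installers and `importlib.metadata` have their own implementations.

```python
import importlib


def parse_entry_point(line):
    """Split "name = package.module:attr" into (name, module, attr)."""
    name, _, target = line.partition("=")
    module, _, attr = target.strip().partition(":")
    return name.strip(), module.strip(), attr.strip()


def load_entry_point(line):
    """Import the target module and return the referenced object."""
    _, module_name, attr = parse_entry_point(line)
    obj = importlib.import_module(module_name)
    # The attribute part may be dotted (e.g. "Class.method"); walk it step by step.
    for part in filter(None, attr.split(".")):
        obj = getattr(obj, part)
    return obj
```

With trafilatura installed, `load_entry_point("trafilatura = trafilatura.cli:main")` would return the function that the `trafilatura` command runs.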