CmonCrawl



Wheel Details

Project: CmonCrawl
Version: 1.1.8
Filename: CmonCrawl-1.1.8-py3-none-any.whl
Size: 54475 bytes
MD5: 62de87385fd4bc626258b299d7c52e08
SHA256: 099b69600a9f03e043b34a766702cbbb92a069c005b89ce5f6ac16b1957123c1
Uploaded: 2024-04-07 23:58:01 +0000
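
The size and SHA256 listed above can be used to check a downloaded copy of the wheel. The following is a minimal sketch using only the standard library; the helper names file_sha256 and matches_listing are made up for this example, and the path passed in is wherever you saved the file.

```python
import hashlib
from pathlib import Path

# Values copied from the Wheel Details listing above.
EXPECTED_SHA256 = "099b69600a9f03e043b34a766702cbbb92a069c005b89ce5f6ac16b1957123c1"
EXPECTED_SIZE = 54475  # bytes


def file_sha256(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, expected_sha256=EXPECTED_SHA256, expected_size=EXPECTED_SIZE):
    """Return True if both the file size and the SHA256 match the expected values."""
    p = Path(path)
    return p.stat().st_size == expected_size and file_sha256(p) == expected_sha256
```

Calling matches_listing("CmonCrawl-1.1.8-py3-none-any.whl") on a local download should return True if the file is intact. The MD5 value is listed as well, but SHA256 is the stronger of the two checks.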

dist-info

METADATA

Metadata-Version: 2.1
Name: CmonCrawl
Version: 1.1.8
Project-Url: Source, https://github.com/hynky1999/CmonCrawl
License: MIT License Copyright (c) [2023] [Hynek Kydlíček] Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Keywords: Common Crawl,Crawl,Extractor,Common Crawl Extractor,Web Crawler,Web Extractor,Web Scraper
Classifier: Development Status :: 3 - Alpha
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Requires-Dist: aiofiles (~=23.2.1)
Requires-Dist: aiohttp (~=3.9.3)
Requires-Dist: beautifulsoup4 (~=4.12.3)
Requires-Dist: pydantic (~=2.6.4)
Requires-Dist: stomp.py (~=8.1.0)
Requires-Dist: tqdm (~=4.66.1)
Requires-Dist: warcio (~=1.7.4)
Requires-Dist: aiocsv (~=1.3.1)
Requires-Dist: aioboto3 (~=12.3.0)
Requires-Dist: tenacity (~=8.2.3)
Requires-Dist: python-dotenv (==1.0.0)
Description-Content-Type: text/markdown
License-File: LICENSE
[Description omitted; length: 7227 characters]
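
Most of the Requires-Dist entries above use the compatible-release operator (~=). For a three-part version such as aiohttp (~=3.9.3) this means "at least 3.9.3, and still within the 3.9 series". The sketch below illustrates that rule for plain dotted numeric versions only; it ignores pre-releases, post-releases, and epochs, and the function names are invented for this example. For real dependency resolution the `packaging` library is the appropriate tool.

```python
def parse_release(version):
    """Split a plain dotted version like '3.9.3' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def compatible_release(candidate, spec):
    """Approximate `candidate ~= spec` for purely numeric versions.

    `~=X.Y.Z` is treated as `>= X.Y.Z` combined with `== X.Y.*`.
    """
    base = parse_release(spec)
    if len(base) < 2:
        raise ValueError("~= needs at least two release segments")
    cand = parse_release(candidate)
    prefix = base[:-1]  # everything except the last segment must match
    return cand >= base and cand[: len(prefix)] == prefix


# Checks against two of the pins listed above.
print(compatible_release("3.9.5", "3.9.3"))   # aiohttp 3.9.5 vs ~=3.9.3
print(compatible_release("3.10.0", "3.9.3"))  # aiohttp 3.10.0 vs ~=3.9.3
```

Note that python-dotenv is pinned with == rather than ~=, so only version 1.0.0 satisfies that entry.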

WHEEL

Wheel-Version: 1.0
Generator: bdist_wheel (0.43.0)
Root-Is-Purelib: true
Tag: py3-none-any
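
The tag py3-none-any is also encoded in the wheel filename, which follows the pattern name-version[-build]-python-abi-platform.whl. Here is a small sketch of splitting the filename into those parts; parse_wheel_filename is a hypothetical helper written for this example, not part of any library.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: %r" % filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        # The optional build tag sits between the version and the python tag.
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected number of components in %r" % filename)
    return {
        "name": name,
        "version": version,
        "build": build,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


info = parse_wheel_filename("CmonCrawl-1.1.8-py3-none-any.whl")
print(info["python"], info["abi"], info["platform"])
```

Here py3 means any Python 3 interpreter, none means no ABI requirement, and any means no platform restriction, which is consistent with Root-Is-Purelib: true.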

RECORD

Path Digest Size (bytes)
cmoncrawl/__init__.py sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU 0
cmoncrawl/config.py sha256=kJQZxDPgnmn2_CFB6aPc4agpn4CGrqlAv_mNlsCJIdc 507
cmoncrawl/aggregator/__init__.py sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU 0
cmoncrawl/aggregator/athena_query.py sha256=dAMWoOjbljnkF23_k6DtE3YKafaTAgu4ZvbiticH_BU 20970
cmoncrawl/aggregator/base.py sha256=v2LEuQXYqyAKjEv9SoePyloiHERJXElsI8I4AjMuVTY 245
cmoncrawl/aggregator/gateway_query.py sha256=dlU5aTj5H9DEyf7pGhHEuVYpRPHszzeF7MZoM-66bFI 13605
cmoncrawl/aggregator/.vscode/settings.json sha256=Lr48UaNGOORVRO2GnwzP1n-C16MUmQTuXbJSSoKrXi8 215
cmoncrawl/aggregator/utils/__init__.py sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU 0
cmoncrawl/aggregator/utils/athena_query_maker.py sha256=OIE-RHDyldHlfKpSn-rMv4SL9uwY-gjJioD1LWIn6DU 4593
cmoncrawl/aggregator/utils/constants.py sha256=V8q48K742JJW8O5344F9XW7KsmRJ_HDdW3Po1nf-CZw 66
cmoncrawl/aggregator/utils/helpers.py sha256=UGEfiYJJNgXJsTCQHeN8qT0HSKrAqUiN4xuJCo6XCMM 7867
cmoncrawl/aggregator/utils/ndjson.py sha256=RqQcf_Zkr-aSYokbzPDAZFMK40VIvfKwH__cHLgRXTA 209
cmoncrawl/common/__init__.py sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU 0
cmoncrawl/common/caching.py sha256=7kZc536m0yNV-T6aFan5fbz7Y5kTeoFOV6qiKxkA22Y 1920
cmoncrawl/common/loggers.py sha256=x_HFlK4__n89ImZEGq0GFjW90Dsau6_ubagFIKCJiWU 1311
cmoncrawl/common/throttling.py sha256=rkEB-tstoYHwm9ztGUyI9s1u_OfFiNtdzFo7LDorAfg 1504
cmoncrawl/common/types.py sha256=uBD72M1_Ts6OHvdHSTcAtvrPXrVNO9er4Vs-5X65c2c 3783
cmoncrawl/integrations/__init__.py sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU 0
cmoncrawl/integrations/commands.py sha256=x-V7xEfDhrRYt_CThAKRsMyw9DrZNFVG4PO5aRKU9As 1445
cmoncrawl/integrations/download.py sha256=60KgjP6KPBUfJH_zBWWh72oXpgOKQnx8pZcGacFEpxw 11076
cmoncrawl/integrations/extract.py sha256=oilVgx8AUFey9GeZqHs-pjdQ1svNvDQKfebpeMySWyM 8749
cmoncrawl/integrations/utils.py sha256=N3McdCHnJ9RxBIt6CoJxoKoGCcDY9iAmHaag0kbYzKU 526
cmoncrawl/middleware/__init__.py sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU 0
cmoncrawl/middleware/stompware.py sha256=UzHOrMPYDHL7GojOabRNJP7ghkUdEDlG2QycXO6o3bE 12205
cmoncrawl/middleware/synchronized.py sha256=yjSJvF_zG9EAcWRNhfjVaKeo0As63ZsdMmsgKTX6Sko 4442
cmoncrawl/processor/__init__.py sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU 0
cmoncrawl/processor/dao/__init__.py sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU 0
cmoncrawl/processor/dao/api.py sha256=RAYk5-7P5HISSWwgjfPahAh2bAIprbeCerY6qjq4tBw 2887
cmoncrawl/processor/dao/base.py sha256=DRtZyY8lFq9tgzoGekeVcR-uluK5giaLIEN6wwurZOA 816
cmoncrawl/processor/dao/s3.py sha256=97Ehn9fzwMV4TCmnEryWiqTRrPk_B_fxSvlA4fKgUTM 3570
cmoncrawl/processor/extraction/__init__.py sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU 0
cmoncrawl/processor/extraction/filters.py sha256=gt99i_neMEcoSs_yUttD2RfVu-B26Ey-NXs75pwLt18 1124
cmoncrawl/processor/extraction/utils.py sha256=3R4e8qAK7BiwRtc0huYRIoMBGp1BnliUTHES_dr7_is 8380
cmoncrawl/processor/pipeline/__init__.py sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU 0
cmoncrawl/processor/pipeline/downloader.py sha256=k4wmZJzgpw1ZC4Q2hUzZ4uWTsidnTkOE8GKJhMZVf0U 11927
cmoncrawl/processor/pipeline/extractor.py sha256=llGLwkyMmObZtEeFJd2hIdsu6xwFT19PZNMknW3Fed4 13281
cmoncrawl/processor/pipeline/pipeline.py sha256=jxhVkn2z3vJq7NyykE-bEI6gcfXekqUk1GHM0sQ4MPg 1948
cmoncrawl/processor/pipeline/router.py sha256=ANT_me4_Cd-BDfVs2zlP4wzioLz3_5IRoPxY3E-gX-w 5742
cmoncrawl/processor/pipeline/streamer.py sha256=H3WWHcbGu1qkbUbUrmb2cdKMqvJadvVIx23uKw-rJxs 7367
CmonCrawl-1.1.8.dist-info/LICENSE sha256=uWn0XPs_KQZd_8rGHvrV8uw1mOpRCdIscjomaSe8YqM 1077
CmonCrawl-1.1.8.dist-info/METADATA sha256=u394Rpj4IFQ26gfldPL5KMotTGczDnaiWOiug0EzcYY 9310
CmonCrawl-1.1.8.dist-info/WHEEL sha256=GJ7t_kWBFywbagK5eo9IoUwLW6oyOeTKmQ-9iHFVNxQ 92
CmonCrawl-1.1.8.dist-info/entry_points.txt sha256=Fb-xMxtDAmMm2ixKKZV39Klr6FAyQLa9ZxwmuhWfZqU 62
CmonCrawl-1.1.8.dist-info/top_level.txt sha256=0zFgyyB9ZoN88Kgpxg7cy2p79IeSqY-T_oyniQT3jDU 10
CmonCrawl-1.1.8.dist-info/RECORD
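
Each digest in the RECORD listing is a SHA256 hash encoded as URL-safe base64 with the trailing "=" padding removed. The sketch below shows how such a digest can be recomputed and how one row of the listing can be split into its three columns. It assumes the whitespace-separated layout shown on this page (the RECORD file inside the wheel itself is CSV), and both function names are made up for this example.

```python
import base64
import hashlib


def record_digest(data):
    """Return the RECORD-style digest string for the given bytes."""
    raw = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
    return "sha256=" + encoded


def parse_listing_row(row):
    """Split a 'path digest size' row; the RECORD entry itself has only a path."""
    parts = row.split()
    if len(parts) == 1:
        return parts[0], None, None
    path, digest, size = row.rsplit(maxsplit=2)
    return path, digest, int(size)


# An empty file hashes to the digest shown for every zero-byte __init__.py above.
row = "cmoncrawl/__init__.py sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU 0"
path, digest, size = parse_listing_row(row)
print(record_digest(b"") == digest)  # True
```

To check an extracted file, read it in binary mode and compare record_digest(contents) and len(contents) against the row for that path.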

top_level.txt

cmoncrawl

entry_points.txt

cmon = cmoncrawl.integrations.commands:main
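
The entry point line above maps the console command cmon to the function main in the module cmoncrawl.integrations.commands. As a rough sketch of how a line in this "name = module:attribute" form can be resolved to a callable, assuming the target module is importable (parse_entry_point and load_entry_point are illustrative names, not an existing API):

```python
import importlib


def parse_entry_point(line):
    """Split 'name = module:attr' into (name, module, attr)."""
    name, _, target = line.partition("=")
    module, _, attr = target.strip().partition(":")
    return name.strip(), module.strip(), attr.strip()


def load_entry_point(line):
    """Import the module named in the line and return the referenced object."""
    _, module_name, attr = parse_entry_point(line)
    obj = importlib.import_module(module_name)
    # The attribute part may be dotted, e.g. 'SomeClass.method'.
    for part in filter(None, attr.split(".")):
        obj = getattr(obj, part)
    return obj


print(parse_entry_point("cmon = cmoncrawl.integrations.commands:main"))
```

With CmonCrawl installed, load_entry_point("cmon = cmoncrawl.integrations.commands:main") would return the same function that the cmon command invokes.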