webcorpus



Wheel Details

Project: webcorpus
Version: 0.2
Filename: webcorpus-0.2-py3-none-any.whl
Size: 55136 bytes
MD5: bb6a8523b030d07d2a65a46e7c6b79af
SHA256: e21533ef788ed23f13b29947ee4c93a32e4f7d890b13c6d97ff49f9e0432b36c
Uploaded: 2021-03-18 21:48:21 +0000
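
The size and digests above let you check a downloaded copy of the wheel before installing it. The following is a minimal sketch using only the standard library. The function names are hypothetical, and the local filename is assumed to match the one listed. SHA256 is the digest to rely on; MD5 is compared only because the page publishes it.

```python
import hashlib
from pathlib import Path

# Values published above for webcorpus-0.2-py3-none-any.whl.
EXPECTED_SIZE = 55136
EXPECTED_MD5 = "bb6a8523b030d07d2a65a46e7c6b79af"
EXPECTED_SHA256 = "e21533ef788ed23f13b29947ee4c93a32e4f7d890b13c6d97ff49f9e0432b36c"


def file_digests(path, chunk_size=65536):
    """Return (md5_hex, sha256_hex) of a file, reading it in fixed-size chunks."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return md5.hexdigest(), sha256.hexdigest()


def verify_wheel(path, expected_size=EXPECTED_SIZE,
                 expected_md5=EXPECTED_MD5, expected_sha256=EXPECTED_SHA256):
    """True only if the size, MD5 and SHA256 all match the expected values."""
    if Path(path).stat().st_size != expected_size:
        return False
    md5_hex, sha256_hex = file_digests(path)
    return md5_hex == expected_md5.lower() and sha256_hex == expected_sha256.lower()
```

For example, `verify_wheel("webcorpus-0.2-py3-none-any.whl")` should return True for an intact download. pip can enforce the same SHA256 check through its hash-checking mode: put `webcorpus==0.2 --hash=sha256:e21533ef788ed23f13b29947ee4c93a32e4f7d890b13c6d97ff49f9e0432b36c` in a requirements file and install with `pip install --require-hashes -r requirements.txt`.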

dist-info

METADATA

Metadata-Version: 2.1
Name: webcorpus
Version: 0.2
Summary: Generate large textual corpora for almost any language by crawling the web
Author: Divyanshu Kakwani
Author-Email: divkakwani[at]gmail.com
Home-Page: https://github.com/divkakwani/webcorpus
Project-Url: Bug Reports, https://github.com/divkakwani/webcorpus/issues
Project-Url: Source, https://github.com/divkakwani/webcorpus
Keywords: dataset corpus
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Requires-Python: >=3.5
Requires-Dist: morfessor
Requires-Dist: boilerpipe3
Requires-Dist: tldextract
Requires-Dist: click
Requires-Dist: scrapy
Requires-Dist: tqdm
Requires-Dist: pandas
Requires-Dist: scrapyd
Requires-Dist: nltk
Requires-Dist: scrapyd-client
Requires-Dist: htmldate
Description-Content-Type: text/markdown
[Description omitted; length: 1812 characters]
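
METADATA uses RFC 822-style "Field: value" headers, and fields such as Requires-Dist and Classifier repeat. The standard library's email parser can therefore read it directly. The sketch below is illustrative: the helper name and the returned dictionary layout are assumptions, and the embedded text is only the subset of headers relevant here. Note that all eleven Requires-Dist entries are unpinned.

```python
from email.parser import HeaderParser

# Subset of the METADATA headers listed above; the long description is omitted.
METADATA = """\
Metadata-Version: 2.1
Name: webcorpus
Version: 0.2
Summary: Generate large textual corpora for almost any language by crawling the web
Requires-Python: >=3.5
Requires-Dist: morfessor
Requires-Dist: boilerpipe3
Requires-Dist: tldextract
Requires-Dist: click
Requires-Dist: scrapy
Requires-Dist: tqdm
Requires-Dist: pandas
Requires-Dist: scrapyd
Requires-Dist: nltk
Requires-Dist: scrapyd-client
Requires-Dist: htmldate
Description-Content-Type: text/markdown
"""


def parse_metadata(text):
    """Parse core-metadata headers; repeatable fields are returned as lists."""
    msg = HeaderParser().parsestr(text)  # header lookups are case-insensitive
    return {
        "name": msg["Name"],
        "version": msg["Version"],
        "requires_python": msg["Requires-Python"],
        "requires_dist": msg.get_all("Requires-Dist", []),
        "classifiers": msg.get_all("Classifier", []),
    }


info = parse_metadata(METADATA)
print(info["name"], info["version"], len(info["requires_dist"]))
```

For an installed distribution, `importlib.metadata.metadata("webcorpus")` (Python 3.8+) returns the same headers as an email-message-like object, so there is no need to unpack the wheel by hand.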

WHEEL

Wheel-Version: 1.0
Generator: bdist_wheel (0.33.1)
Root-Is-Purelib: true
Tag: py3-none-any
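
The tag `py3-none-any` also forms the tail of the filename. It splits into a Python tag (`py3`), an ABI tag (`none`) and a platform tag (`any`), meaning a pure-Python wheel for any Python 3 interpreter on any platform, which is consistent with `Root-Is-Purelib: true`. The sketch below splits a filename according to the wheel naming convention `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`. The `WheelName` type and function are hypothetical. The sketch skips the name normalization and validation that `packaging.utils.parse_wheel_filename` in the `packaging` library performs.

```python
from typing import NamedTuple, Optional, Tuple


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tags: Tuple[str, ...]
    abi_tags: Tuple[str, ...]
    platform_tags: Tuple[str, ...]


def parse_wheel_filename(filename):
    """Split '{dist}-{ver}(-{build})?-{py}-{abi}-{plat}.whl' into its parts."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: " + filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError("unexpected number of '-' separated parts: " + filename)
    # Each tag may be a '.'-separated compressed set, e.g. 'py2.py3'.
    return WheelName(dist, version, build,
                     tuple(py.split(".")), tuple(abi.split(".")),
                     tuple(plat.split(".")))
```

For this wheel, `parse_wheel_filename("webcorpus-0.2-py3-none-any.whl")` yields distribution `webcorpus`, version `0.2`, no build tag, and the single tags `py3`, `none` and `any`.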

RECORD

Path Digest Size
webcorpus/__init__.py sha256=Xe6moGpfqhsuxQccCjpvU7-GaBQkRCfKAbuyiGbf6JI 22
webcorpus/cli.py sha256=3F-5wvhfUpqZfMqTREpDntsP0tENPBQcDHCv-aMlNFE 2762
webcorpus/sources.py sha256=rMR8rxASzVbSV4a_3kJ4dnPBC0MnYovQ2GGqWoe6d0E 2438
webcorpus/utils.py sha256=w22ZIpZQyeN0s2g0oiSbsTNaqYRAyheVC7ki848RVOA 758
webcorpus/corpus/__init__.py sha256=TxLnoMqHQe5UfkeSBoGjOWXMyk37r5FAmdqBD6-CpxI 2941
webcorpus/crawlers/__init__.py sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU 0
webcorpus/crawlers/news.py sha256=xkLiqBb1e64JK1wS4El-gGg7FpZsthLk_-AqMMBquX4 5567
webcorpus/crawlers/settings.py sha256=kJwbIuPa8sBcl8dTN6In7D002j7LuWWnWwDhEWVODIg 682
webcorpus/crawlers/w3newspaper.py sha256=p12rkgcRukTS3VmhfaTu1flQjLe9WIxAFFT1cJXuIVg 1501
webcorpus/language/__init__.py sha256=Q2FJVtMuKZ08PIZDkUahIk9yIk7RiYU7Jm15_LIzxTo 1896
webcorpus/language/itrans_transliterator.py sha256=mQI1CzcVSUjl2DZoZbzx9gombN7gZyq6NJO_xIxgpxY 32281
webcorpus/language/langinfo.py sha256=SGHpscrGIEJqXnTvyMxstzS3bm9QWd8GUSW72rFcekk 5215
webcorpus/language/normalize.py sha256=60KbH5ZY5z2rv9i6LmP9-bhPefMcR3oyqhPP1eJbl40 26060
webcorpus/language/sentence_tokenize.py sha256=5TKjL8eW_Dzi0ghqFbBbSFq5kjxOwo9q2YjRMKaYy84 3946
webcorpus/language/sinhala_transliterator.py sha256=qY1rK2eFku1tsfTbA3kUdzps_UoH5p9Ud4bufhC_98U 4650
webcorpus/language/tokenize.py sha256=vl224f8Mh-vckxDPsNkZI0pLaahCMc56sP-3MFpFzZI 1989
webcorpus/language/unicode_transliterate.py sha256=4hqU00BEtmFZVaYHDPBnC9wJcFpoYd-v5hSTNI7PrYE 5010
webcorpus/processors/__init__.py sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU 0
webcorpus/processors/agcsent.py sha256=cWJzrjbyMxglVWKpsrDiF06Wl2mx6Ri9FtRLBJSamd8 2090
webcorpus/processors/annot_sent.py sha256=5LQS6ijI_t5LieD10hFr7Qpy7kc6mFRcZDQDYgpDEBM 2381
webcorpus/processors/arts.py sha256=RkyUUqngeOaR99a2A0gyXt7TWbm7hqDkqagYzNKrDxA 2256
webcorpus/processors/artsfile.py sha256=CL876wj8zyoKM96sCodkbaonhh9clo_U-v3f49IeNU0 2909
webcorpus/processors/datedarts.py sha256=_nQIXSW-VJ9prl1OaywNZuiFlo6N1oQxCEOtwLZuGvM 4063
webcorpus/processors/headline-pred.py sha256=EYPjl4Zrgbjrqq7Mr-fcO4e9Anb4ItsZ4QiNoeu773A 10231
webcorpus/processors/sent.py sha256=Mq0-B8wvpaMBHvAqRMyp2U4eEsjGU30R2vbEpb-CCPk 2770
webcorpus/processors/tokenize.py sha256=2gxGYt9uf61WbM3ssw2Xe8OVLA4RklhehFvRXE4dM-8 1468
webcorpus/processors/topic.py sha256=0zbi5p0LbfDaAIh1Qqj8_1DWxQunHFP1Ahsz5xU_zk4 1986
webcorpus-0.2.dist-info/LICENSE sha256=eflPtRYU6x2TECWBddOLeZ17Xdwx8RkpS2RCyjcUEvE 35234
webcorpus-0.2.dist-info/METADATA sha256=FLU7VtU-UkBwBkVVosPZAMJensia7QJiMbSSmnAIqps 2949
webcorpus-0.2.dist-info/WHEEL sha256=U88EhGIw8Sj2_phqajeu_EAi3RAo8-C6zV3REsWbWbs 92
webcorpus-0.2.dist-info/entry_points.txt sha256=DM61hqjF_reTV8fS8bzN6uUzuPpCx1yp7r0fi6twYzU 94
webcorpus-0.2.dist-info/top_level.txt sha256=xOSGTTLpfMqHRUwZ_Yr4wDBZNIIKSv7UXlAzvZOOknY 10
webcorpus-0.2.dist-info/RECORD
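
Each RECORD row is `path,hash,size`. The hash is written as `<algorithm>=<digest>`, with the digest encoded as URL-safe base64 and the trailing `=` padding removed. RECORD lists itself with both fields empty, which explains the bare final row. The two empty `__init__.py` files share the digest `47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU`, which is the SHA256 of zero bytes. The sketch below recomputes every entry in a wheel archive. The function names are hypothetical, and it reports mismatches instead of raising on the first one.

```python
import base64
import csv
import hashlib
import io
import zipfile


def record_hash(data, algorithm="sha256"):
    """Encode a digest the way RECORD does: '<algo>=' + urlsafe base64, no padding."""
    digest = hashlib.new(algorithm, data).digest()
    return algorithm + "=" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def verify_record(wheel):
    """Return paths whose content is missing or disagrees with RECORD.

    `wheel` may be a filesystem path or a binary file object.
    """
    bad = []
    with zipfile.ZipFile(wheel) as zf:
        record_name = next(
            (n for n in zf.namelist() if n.endswith(".dist-info/RECORD")), None)
        if record_name is None:
            raise ValueError("no .dist-info/RECORD in archive")
        with zf.open(record_name) as raw:
            rows = list(csv.reader(io.TextIOWrapper(raw, encoding="utf-8", newline="")))
        for row in rows:
            if not row:
                continue
            path, digest, size = row
            if not digest:  # RECORD lists itself with empty hash and size
                continue
            try:
                data = zf.read(path)
            except KeyError:  # listed in RECORD but absent from the archive
                bad.append(path)
                continue
            algorithm = digest.split("=", 1)[0]
            if record_hash(data, algorithm) != digest or (size and int(size) != len(data)):
                bad.append(path)
    return bad
```

Run against an intact copy, `verify_record("webcorpus-0.2-py3-none-any.whl")` should return an empty list.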

top_level.txt

webcorpus

entry_points.txt

[console_scripts]
webcorpus = webcorpus:cli
[scrapy]
settings = webcorpus.crawlers.settings
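
entry_points.txt is an INI-style file whose `name = module:attr` lines are grouped under section headers. The sketch below assumes that the `webcorpus = webcorpus:cli` line belongs to the standard `console_scripts` group, which is what lets an installer create a `webcorpus` command that calls the `cli` object in the `webcorpus` package. The `[scrapy]` entry points at a module rather than a callable, so it has no `:attr` part. The helper names are hypothetical, and the optional `[extras]` suffix is not handled. For installed packages, `importlib.metadata.entry_points()` (Python 3.8+) performs this lookup without any hand parsing.

```python
import configparser
import importlib

# entry_points.txt as listed above; the [console_scripts] header is an assumption.
ENTRY_POINTS = """\
[console_scripts]
webcorpus = webcorpus:cli

[scrapy]
settings = webcorpus.crawlers.settings
"""


def parse_entry_points(text):
    """Map group -> {name: (module, attr or None)}."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep entry-point names case-sensitive
    parser.read_string(text)
    groups = {}
    for group in parser.sections():
        groups[group] = {}
        for name, ref in parser.items(group):
            module, _, attr = ref.partition(":")
            groups[group][name] = (module.strip(), attr.strip() or None)
    return groups


def load(module, attr=None):
    """Import `module` and follow the dotted `attr` path, if any."""
    obj = importlib.import_module(module)
    for part in (attr.split(".") if attr else []):
        obj = getattr(obj, part)
    return obj
```

With webcorpus installed, `load(*parse_entry_points(ENTRY_POINTS)["console_scripts"]["webcorpus"])` would return the object that the `webcorpus` command invokes.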