dom-tokenizers

View on PyPI · Reverse Dependencies (0)

0.0.14 dom_tokenizers-0.0.14-py3-none-any.whl

Wheel Details

Project: dom-tokenizers
Version: 0.0.14
Filename: dom_tokenizers-0.0.14-py3-none-any.whl
Download: [link]
Size: 24217 bytes
MD5: 611c8b3c04c6b1346dbc4524c48a326e
SHA256: 58d90d965b4a82830f8725538d214b3d7f52e39513d5d66d8dbb1c6ce46aaa7b
Uploaded: 2024-05-30 20:54:33 +0000
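To check a downloaded copy of the wheel against the MD5 and SHA256 values above, a small `hashlib` helper can compute both digests in one pass (the function name here is illustrative, not part of the package):

```python
import hashlib
from typing import BinaryIO


def wheel_digests(stream: BinaryIO) -> tuple[str, str]:
    """Return (md5_hex, sha256_hex) for a binary stream, read in
    chunks so large wheels are not loaded into memory at once."""
    md5, sha256 = hashlib.md5(), hashlib.sha256()
    for chunk in iter(lambda: stream.read(65536), b""):
        md5.update(chunk)
        sha256.update(chunk)
    return md5.hexdigest(), sha256.hexdigest()
```

Opening `dom_tokenizers-0.0.14-py3-none-any.whl` in binary mode and passing the file object to `wheel_digests` should reproduce the MD5 and SHA256 hex strings listed above if the download is intact.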

dist-info

METADATA

Metadata-Version: 2.1
Name: dom-tokenizers
Version: 0.0.14
Summary: DOM-aware tokenization for 🤗 Hugging Face language models
Author-Email: Gary Benson <gary[at]gbenson.net>
Project-Url: Homepage, https://github.com/gbenson/dom-tokenizers
Project-Url: Source, https://github.com/gbenson/dom-tokenizers
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Text Processing :: Markup :: HTML
Requires-Python: >=3.10
Requires-Dist: python-magic
Requires-Dist: tokenizers
Requires-Dist: unidecode
Requires-Dist: build; extra == "dev"
Requires-Dist: datasets; extra == "dev"
Requires-Dist: flake8; extra == "dev"
Requires-Dist: flake8-custom-import-rules; extra == "dev"
Requires-Dist: flake8-quotes; extra == "dev"
Requires-Dist: pillow; extra == "dev"
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: transformers; extra == "dev"
Requires-Dist: datasets; extra == "train"
Requires-Dist: pillow; extra == "train"
Requires-Dist: transformers; extra == "train"
Provides-Extra: dev
Provides-Extra: train
Description-Content-Type: text/markdown
License-File: LICENSE
[Description omitted; length: 1143 characters]
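The `Requires-Python: >=3.10` field above gates the interpreter version even though the `py3-none-any` tag marks the wheel as pure Python. A minimal sketch of the check an installer performs (simplified to a plain tuple comparison; real installers parse the full specifier grammar):

```python
import sys

# Requires-Python: >=3.10, per the METADATA above.
REQUIRED = (3, 10)


def interpreter_ok(required: tuple[int, int] = REQUIRED) -> bool:
    """True if the running interpreter satisfies a simple
    '>=major.minor' Requires-Python constraint."""
    return sys.version_info[:2] >= required
```

The `dev` and `train` extras declared above are selected at install time in the usual way, e.g. `pip install "dom-tokenizers[train]"`.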

WHEEL

Wheel-Version: 1.0
Generator: bdist_wheel (0.43.0)
Root-Is-Purelib: true
Tag: py3-none-any

RECORD

Path Digest Size
dom_tokenizers/__init__.py sha256=5hpYkYozXjJH6aCLLCvogRskaUt4KnM4Z8WMnsFR5nk 52
dom_tokenizers/diff.py sha256=3Nlt1dKAY_VoVETac5wCBxqh0qHsRVrbITGZ7viDcpQ 3614
dom_tokenizers/dump.py sha256=JiOOxlc0sfabMBczyl6mNUWZc3Wh34HgGJkGR410QBI 1667
dom_tokenizers/profile.py sha256=JdtMPM26J-JhIQBYfl9q7dVYsJB93Z651cRtzbc6l4k 3654
dom_tokenizers/train.py sha256=XffPrlPIPr36bOgWtHk1P19fMvU6PlokYq46T0UuK5w 6147
dom_tokenizers/internal/__init__.py sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU 0
dom_tokenizers/internal/json.py sha256=WbsasHTkl7KE_qBgbFRXNaaSiEt9W8ZZwE_0J1ANpvA 429
dom_tokenizers/internal/transformers.py sha256=Z7lkwJV55Afjyz8mVNdP-AzXKU2H1kSg3hGrLhYl8hM 589
dom_tokenizers/pre_tokenizers/__init__.py sha256=o2HCII8oWjL0s0C81abj8wG61nrCwgOe8hKnYSr1swM 50
dom_tokenizers/pre_tokenizers/compat_itertools.py sha256=-V0hUQARHfdW0R0FvOrk_-BYDviuE_TZyvKad0Ec-eQ 296
dom_tokenizers/pre_tokenizers/dom_snapshot.py sha256=Wfr1ck2ggFUXXs6zlh3_J6f4WNsmzmGe893mq_xdLUk 4828
dom_tokenizers/pre_tokenizers/html.py sha256=3xxZIfAjd6Ue4jCTbeVOQ6AE4bGFPKuTwTx4Wd1Kp94 322
dom_tokenizers/pre_tokenizers/pre_tokenizer.py sha256=6mUQ504FgACM1I6Vy0yAuLww__KJheDENph4vdM0EAQ 3011
dom_tokenizers/pre_tokenizers/splitter.py sha256=NjrgYHZ_Dq9vOrLJURdALxN6c6FAIRCzDGvv7O-PZMo 19604
dom_tokenizers/pre_tokenizers/token_buffer.py sha256=4JM8_bPhgGJ1Vy5KAq-8T7AB9E1FADuknlgsl75CsMM 541
dom_tokenizers-0.0.14.dist-info/LICENSE sha256=xx0jnfkXJvxRnG63LTGOxlggYnIysveWIZ6H3PNdCrQ 11357
dom_tokenizers-0.0.14.dist-info/METADATA sha256=VkpZpBl2V1YaYvlsBt4yLU7gkHphTQuU366JUCMKakw 2961
dom_tokenizers-0.0.14.dist-info/WHEEL sha256=GJ7t_kWBFywbagK5eo9IoUwLW6oyOeTKmQ-9iHFVNxQ 92
dom_tokenizers-0.0.14.dist-info/entry_points.txt sha256=NB-NIJOJO5F79vjaUI8wgefqt_Y7Jb1fIGMUjVmhx6w 240
dom_tokenizers-0.0.14.dist-info/top_level.txt sha256=X-UAu4PqdfJbKTG9mxU_38NYLpVn4BTBmEVpDsKvI64 15
dom_tokenizers-0.0.14.dist-info/RECORD
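The digests in the RECORD table are SHA-256 hashes in URL-safe base64 with the trailing `=` padding stripped, not hex. A sketch of that encoding, which reproduces the digest shown above for the empty (size 0) `dom_tokenizers/internal/__init__.py`:

```python
import base64
import hashlib


def record_digest(data: bytes) -> str:
    """SHA-256 digest in RECORD format: urlsafe-base64 encoded
    with '=' padding removed, as in the table above."""
    raw = hashlib.sha256(data).digest()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
```

For example, `record_digest(b"")` yields `47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU`, matching the zero-byte `internal/__init__.py` entry. The RECORD file itself is listed without a digest or size, since it cannot contain its own hash.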

top_level.txt

dom_tokenizers

entry_points.txt

diff-tokenizer = dom_tokenizers.diff:main
dump-tokenizations = dom_tokenizers.dump:main
profile-tokenizer = dom_tokenizers.profile:main
tokenizer-diff = dom_tokenizers.diff:main
train-tokenizer = dom_tokenizers.train:main
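`entry_points.txt` is INI-formatted; the dump above omits the section header, which for commands like these is presumably `[console_scripts]`. Each line maps a command name to a `module:attribute` target, which can be parsed with the standard library's `configparser`:

```python
from configparser import ConfigParser

# Reconstructed from the dump above; the [console_scripts] header
# is an assumption, as the section name is not shown in the listing.
ENTRY_POINTS = """\
[console_scripts]
diff-tokenizer = dom_tokenizers.diff:main
dump-tokenizations = dom_tokenizers.dump:main
profile-tokenizer = dom_tokenizers.profile:main
tokenizer-diff = dom_tokenizers.diff:main
train-tokenizer = dom_tokenizers.train:main
"""


def console_scripts(text: str) -> dict[str, tuple[str, str]]:
    """Map each command name to its (module, attribute) target."""
    parser = ConfigParser()
    parser.read_string(text)
    scripts = {}
    for name, target in parser.items("console_scripts"):
        module, _, attr = target.partition(":")
        scripts[name] = (module, attr)
    return scripts
```

Note that `diff-tokenizer` and `tokenizer-diff` are aliases for the same `dom_tokenizers.diff:main` target, so the five commands resolve to four distinct entry points.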