pdf2dataset


0.5.3 pdf2dataset-0.5.3-py3-none-any.whl

Wheel Details

Project: pdf2dataset
Version: 0.5.3
Filename: pdf2dataset-0.5.3-py3-none-any.whl
Size: 22037 bytes
MD5: 07d9c828d0ab5b588ad27018e77138ee
SHA256: e621254be6193c34e41081b5762c6d0ceda34d26636f69e20196698e3930ecbe
Uploaded: 2020-09-13 04:35:50 +0000
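
The SHA256 value above can be used to check a downloaded copy of the wheel. The following is a minimal sketch, not part of pdf2dataset; the function names `sha256_of_file` and `matches_expected` are hypothetical, and the path passed in is assumed to point at a local copy of the wheel.

```python
import hashlib

# SHA256 as listed in the wheel details above.
EXPECTED_SHA256 = "e621254be6193c34e41081b5762c6d0ceda34d26636f69e20196698e3930ecbe"


def sha256_of_file(path, chunk_size=1 << 16):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Compare a file's digest with the expected hex string (hypothetical helper)."""
    return sha256_of_file(path) == expected.lower()
```

Calling `matches_expected("pdf2dataset-0.5.3-py3-none-any.whl")` on an intact download should return True; a mismatch suggests a corrupted or different file.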

dist-info

METADATA

Metadata-Version: 2.1
Name: pdf2dataset
Version: 0.5.3
Summary: Easily convert a subdirectory with big volume of PDF documents into a dataset, supports extracting text and images
Author: Ícaro Pires
Author-Email: icaropsa[at]gmail.com
Home-Page: https://github.com/icaropires/pdf2dataset
Project-Url: Repository, https://github.com/icaropires/pdf2dataset
License: Apache-2.0
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Requires-Python: >=3.6,<4.0
Requires-Dist: dask[dataframe] (==2.23.0)
Requires-Dist: more-itertools (<9.0.0,>=8.4.0)
Requires-Dist: opencv-python (==4.4.0.42)
Requires-Dist: packaging (<21.0,>=20.4)
Requires-Dist: pandas (<0.26.0,>=0.25.0)
Requires-Dist: pdf2image (<2.0.0,>=1.13.1)
Requires-Dist: pdftotext (==2.1.5)
Requires-Dist: pyarrow (==1.0.0)
Requires-Dist: pytesseract (==0.3.5)
Requires-Dist: ray (==0.8.7)
Requires-Dist: tqdm (<5.0.0,>=4.41.0)
Description-Content-Type: text/markdown
[Description omitted; length: 14890 characters]

WHEEL

Wheel-Version: 1.0
Generator: poetry 1.0.10
Root-Is-Purelib: true
Tag: py3-none-any
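
The tag `py3-none-any` is the same triple that ends the wheel filename: Python tag, ABI tag, platform tag. As a rough illustration, the sketch below splits a wheel filename into its components. The helper `parse_wheel_filename` is hypothetical and handles only the simple case of dash-separated fields with an optional build tag.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its dash-separated components.

    Expected layout: name-version[-build]-python-abi-platform.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: %r" % filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected number of fields in %r" % filename)
    return {
        "name": name,
        "version": version,
        "build": build,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


info = parse_wheel_filename("pdf2dataset-0.5.3-py3-none-any.whl")
print(info["python"], info["abi"], info["platform"])
```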

RECORD

Path Digest Size
pdf2dataset/__init__.py sha256=rPycNmqNi2kjnI2shMhjz685mZrLOSWjzg8QS0JB6gw 309
pdf2dataset/__main__.py sha256=EsfAceJ7oIDku888jiAU8YnB7f3h_WILIg4SKDc4_nk 3112
pdf2dataset/extract_task.py sha256=kgVLkXVKXUY48qTkwI3VqO4_bbyvZi1Kgo2A9-1DdxE 6808
pdf2dataset/extraction.py sha256=tTqYUgWMAekwrIuW8ltkwZKVQcxnYuFSrYopgN-QllM 10848
pdf2dataset/extraction_memory.py sha256=Yh-QFAStjpkmOEDIj7siDLiCiOA89E_MpWax_F5S4YM 2186
pdf2dataset/pdf_extract_task.py sha256=I7qmGza5KtalkyGfiRhynp3orMrbFkHDD3IEUrG1Dxs 4385
pdf2dataset/results.py sha256=bQx0AZFEmxZiodYlBMlkNCN-tFzX2V2nGQOCzcY12EE 3141
pdf2dataset/utils.py sha256=gZzl7O9ycVV0wxwmee3LEj5kSGLgna7O1FIizQGoymo 2422
pdf2dataset-0.5.3.dist-info/entry_points.txt sha256=8niWsw0K2NzDkiW68jSBXEpCsQarnTiNOvS-lez1svQ 57
pdf2dataset-0.5.3.dist-info/LICENSE sha256=xx0jnfkXJvxRnG63LTGOxlggYnIysveWIZ6H3PNdCrQ 11357
pdf2dataset-0.5.3.dist-info/WHEEL sha256=Q99itqWYDhV793oHzqzi24q7L7Kdiz6cb55YDfTXphE 84
pdf2dataset-0.5.3.dist-info/METADATA sha256=xDp67-4pmTeOSRG8wSf-PBXNTqUyaMUYiHprts5Btb0 16020
pdf2dataset-0.5.3.dist-info/RECORD
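
Each digest in the listing above is a SHA256 hash encoded as URL-safe base64 with the trailing `=` padding removed, and the RECORD file itself carries no digest or size. The sketch below shows how such a digest string could be computed for comparison. The function names are hypothetical, and the listing is assumed to be space-separated exactly as displayed here (inside the wheel, RECORD is a CSV file).

```python
import base64
import hashlib


def record_digest(data):
    """Return a RECORD-style digest string for the given bytes."""
    raw = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(raw).rstrip(b"=")
    return "sha256=" + encoded.decode("ascii")


def parse_listing_line(line):
    """Split one 'path digest size' line as displayed above (assumed layout)."""
    fields = line.split()
    if len(fields) == 1:
        # The RECORD entry lists only its own path.
        return fields[0], None, None
    path, digest, size = fields
    return path, digest, int(size)


path, digest, size = parse_listing_line(
    "pdf2dataset/__init__.py sha256=rPycNmqNi2kjnI2shMhjz685mZrLOSWjzg8QS0JB6gw 309"
)
print(path, size)
```

To verify an installed file, read its bytes and compare `record_digest(data)` and `len(data)` with the parsed digest and size.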

entry_points.txt

pdf2dataset = pdf2dataset.__main__:main
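
An entry of this form maps a name to a `module:attribute` target. Assuming it is registered as a console script, running `pdf2dataset` would import `pdf2dataset.__main__` and call its `main` function. The sketch below illustrates that resolution with two hypothetical helpers; it is not how installers implement it, and the demonstration loads a standard-library target so that it runs without pdf2dataset installed.

```python
import importlib


def parse_entry_point(line):
    """Split 'name = module:attr' into (name, module, attr)."""
    name, _, target = line.partition("=")
    module, _, attr = target.strip().partition(":")
    return name.strip(), module.strip(), attr.strip()


def load_entry_point(module, attr):
    """Import the module and walk the dotted attribute path."""
    obj = importlib.import_module(module)
    for part in attr.split("."):
        obj = getattr(obj, part)
    return obj


name, module, attr = parse_entry_point("pdf2dataset = pdf2dataset.__main__:main")
print(name, module, attr)

# Demonstration with a standard-library target instead of pdf2dataset itself.
dumps = load_entry_point("json", "dumps")
print(dumps({"ok": True}))
```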