bisheng-unstructured


0.0.3.post4 bisheng_unstructured-0.0.3.post4-py3-none-any.whl

Wheel Details

Project: bisheng-unstructured
Version: 0.0.3.post4
Filename: bisheng_unstructured-0.0.3.post4-py3-none-any.whl
Size: 1376060 bytes
MD5: d9ab98617d098aa84e4321209b29e937
SHA256: d22a41a12a01b833730afc4f1af625dc4e5afd35d6891e477f7f51147b08a3fd
Uploaded: 2024-03-21 09:33:31 +0000
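
The size and SHA256 values above can be used to check a downloaded copy of the wheel. The following is a minimal sketch using only the Python standard library; the helper names `file_sha256` and `matches_listing` are illustrative and not part of bisheng-unstructured, and the path passed in is whatever local file you downloaded.

```python
import hashlib
from pathlib import Path

# Values copied from the wheel details listed above.
EXPECTED_SHA256 = "d22a41a12a01b833730afc4f1af625dc4e5afd35d6891e477f7f51147b08a3fd"
EXPECTED_SIZE = 1376060  # bytes


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path):
    """True only if both the size and the SHA256 digest match the listing."""
    p = Path(path)
    return p.stat().st_size == EXPECTED_SIZE and file_sha256(p) == EXPECTED_SHA256
```

Calling `matches_listing("bisheng_unstructured-0.0.3.post4-py3-none-any.whl")` on a local download should return `True` for an intact file. MD5 is listed as well, but SHA256 is the stronger check.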

dist-info

METADATA

Metadata-Version: 2.1
Name: bisheng-unstructured
Version: 0.0.3.post4
Summary: ETLs for LLMs
Author: DataElem
Author-Email: contact[at]dataelem.com
Home-Page: https://github.com/dataelement/bisheng-unstructured
License: Apache 2.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Requires-Dist: chardet (==5.1.0)
Requires-Dist: filetype (==1.2.0)
Requires-Dist: python-magic (==0.4.27)
Requires-Dist: nltk (==3.8.1)
Requires-Dist: tabulate (==0.9.0)
Requires-Dist: requests (==2.31.0)
Requires-Dist: urllib3 (==1.26.16)
Requires-Dist: beautifulsoup4 (==4.12.2)
Requires-Dist: emoji (==2.8.0)
Requires-Dist: lxml (==4.9.3)
Requires-Dist: python-docx (==0.8.11)
Requires-Dist: numpy (==1.24.4)
Requires-Dist: pandas (==2.0.3)
Requires-Dist: python-dateutil (==2.8.2)
Requires-Dist: pytz (==2023.3)
Requires-Dist: six (==1.16.0)
Requires-Dist: tzdata (==2023.3)
Requires-Dist: ebooklib (==0.18)
Requires-Dist: importlib-metadata (==6.8.0)
Requires-Dist: markdown (==3.4.4)
Requires-Dist: zipp (==3.16.2)
Requires-Dist: msg-parser (==1.2.0)
Requires-Dist: olefile (==0.46)
Requires-Dist: pypandoc (==1.11)
Requires-Dist: pdf2image (==1.16.3)
Requires-Dist: pdfminer-six (==20221105)
Requires-Dist: pdfplumber (==0.10.2)
Requires-Dist: wheel (==0.41.0)
Requires-Dist: pypdfium2 (==4.23.1)
Requires-Dist: PyMuPDF (==1.23.2)
Requires-Dist: opencv-python (==4.8.0.76)
Requires-Dist: certifi (==2023.7.22)
Requires-Dist: cffi (==1.15.1)
Requires-Dist: charset-normalizer (==3.2.0)
Requires-Dist: contourpy (==1.1.0)
Requires-Dist: cryptography (==41.0.3)
Requires-Dist: cycler (==0.11.0)
Requires-Dist: fonttools (==4.42.1)
Requires-Dist: idna (==3.4)
Requires-Dist: scipy (==1.10.1)
Requires-Dist: shapely (==2.0.1)
Requires-Dist: pydantic (==1.10.12)
Requires-Dist: pillow (==10.0.0)
Requires-Dist: python-pptx (==0.6.21)
Requires-Dist: xlsxwriter (==3.1.2)
Requires-Dist: et-xmlfile (==1.1.0)
Requires-Dist: openpyxl (==3.1.2)
Requires-Dist: xlrd (==2.0.1)
Requires-Dist: uvicorn
Requires-Dist: fastapi
Requires-Dist: orjson
Description-Content-Type: text/markdown
[Description omitted; length: 1972 characters]
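
METADATA uses an email-style header format, so the dependency list above can be read with the standard library. The sketch below is one possible approach, not the tooling that generated this page: `parse_requirements` is a hypothetical helper, `SAMPLE` is a shortened excerpt of the fields above, and the regular expression handles only the simple `name (specifier)` form that appears in this listing.

```python
import re
from email.parser import Parser

# Shortened excerpt of the METADATA fields listed above.
SAMPLE = """\
Metadata-Version: 2.1
Name: bisheng-unstructured
Version: 0.0.3.post4
Requires-Python: >=3.8
Requires-Dist: chardet (==5.1.0)
Requires-Dist: lxml (==4.9.3)
Requires-Dist: uvicorn
"""

# Matches "name" optionally followed by "(specifier)".
_REQ = re.compile(r"\s*([A-Za-z0-9._-]+)\s*(?:\(([^)]*)\))?")


def parse_requirements(metadata_text):
    """Map each Requires-Dist name to its version specifier (None if unpinned)."""
    msg = Parser().parsestr(metadata_text)
    requirements = {}
    for entry in msg.get_all("Requires-Dist", []):
        match = _REQ.match(entry)
        if match:
            requirements[match.group(1)] = match.group(2)
    return requirements


def unpinned(requirements):
    """Return the names that carry no version specifier."""
    return sorted(name for name, spec in requirements.items() if spec is None)
```

Applied to the full list above, `unpinned` would report `fastapi`, `orjson`, and `uvicorn`, the only three entries without an exact `==` pin.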

WHEEL

Wheel-Version: 1.0
Generator: bdist_wheel (0.43.0)
Root-Is-Purelib: true
Tag: py3-none-any
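
The `py3-none-any` tag also appears at the end of the wheel filename, which follows the pattern `{distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl`. As a rough illustration, the filename can be split into those parts as follows; `parse_wheel_filename` is a hypothetical helper written for this example and does not expand compressed tag sets such as `py2.py3`.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its dash-separated components."""
    suffix = ".whl"
    if not filename.endswith(suffix):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(suffix)].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        # The optional build tag sits between the version and the Python tag.
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected number of components in {filename!r}")
    return {
        "distribution": name,
        "version": version,
        "build": build,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }
```

For this wheel the result has `python="py3"`, `abi="none"`, and `platform="any"`, which is consistent with `Root-Is-Purelib: true`: the package ships no compiled extensions.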

RECORD

Path Digest Size (bytes)
bisheng_unstructured/__init__.py sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU 0
bisheng_unstructured/__version__.py sha256=SI019rW6paHw93e6fOWFzF9TruLom8o9HrgZsjGZvaE 42
bisheng_unstructured/logger.py sha256=m1lBAhW03cOPiht2JC_v_N1-zYbzmkh0THBSbAvrehg 480
bisheng_unstructured/utils.py sha256=cuvZApszg7zXqMtJuCz2xePmtK1n8pWFFeed_QWzRU4 2311
bisheng_unstructured/api/__init__.py sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU 0
bisheng_unstructured/api/any2pdf.py sha256=SitpE1RTqw0gFdGotMIY7kuVNVaBTqd860VmgbaDNOY 1180
bisheng_unstructured/api/main.py sha256=xYrrPrEJznAh7yG9NrWAc35bGBTbzXyhmEpY8O4X95I 4077
bisheng_unstructured/api/pipeline.py sha256=bxDcO83-gATdi8DOAU8JWBy5LlBrNZENubd8pBx6Zx0 5072
bisheng_unstructured/api/types.py sha256=uUW9qcK7yazrOk9YMNfsJeFXnP_8IkgE746A42ZTe84 736
bisheng_unstructured/cleaners/__init__.py sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU 0
bisheng_unstructured/cleaners/core.py sha256=g7LFNzfseQpE5a8our6seOllc-eemjYZdgz_JgbtS7w 12157
bisheng_unstructured/cleaners/extract.py sha256=EF9lOW-_us9dM246iRnggf1baU87JMtXCYHCCqBS5A0 4217
bisheng_unstructured/cleaners/translate.py sha256=vnNnnADG9OeDICEtjt0EuG6YbeodIZm1DKyJcEIpDks 3266
bisheng_unstructured/common/__init__.py sha256=a6TM--akZOnMX3UGOlPAZkfHNRwEbqTOAlLeYaikfrY 91
bisheng_unstructured/common/logger.py sha256=tB4dII9H0jjoDSZGxxBk3Lvg8d180m_yqP-vgxNHuLc 2397
bisheng_unstructured/common/timer.py sha256=aPosR8NshT4dgVGEEgs_D-PzKXglGcKGoab1YsPND_s 504
bisheng_unstructured/documents/__init__.py sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU 0
bisheng_unstructured/documents/base.py sha256=_jQATYM6slnIZKPJsAWGBGZtiFVTy8tqVuMa81rKWYw 3339
bisheng_unstructured/documents/coordinates.py sha256=bHHs_PrExXCTxJaB24-m0XgXj6GuPLruCoBWw1w-Hjs 3331
bisheng_unstructured/documents/elements.py sha256=1DoWkb0WE9svqsO_VfDU3kyDgNtOCSOYdzGo-hRQQbc 16647
bisheng_unstructured/documents/email_elements.py sha256=y6__1gMYP0GRSyyulDCxFhuEKeiD7D0apPsez0qZ74Y 3462
bisheng_unstructured/documents/html.py sha256=_xNi0vtjWlqc7L2lusRlCeLdtOruijWcE7gxrRmEhq0 25660
bisheng_unstructured/documents/html_utils.py sha256=N9QDbhP4cc9EchL_PPnSCSD3FwnrE6EQgAyb5LCez_w 2696
bisheng_unstructured/documents/layout.py sha256=_l200wS143lgQH0-NIROOKIgN7RKntvM_IlxWzZrv50 4830
bisheng_unstructured/documents/markdown.py sha256=NszYNowZUU2zwboEMQc4vQKL6TQNr2BdcJiK8aafkBA 4522
bisheng_unstructured/documents/xml.py sha256=F7zMSG3DJ2tu4eC6XMTsUnBjSMLkZuzLkDlHrdrDxuc 5320
bisheng_unstructured/documents/pdf_parser/__init__.py sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU 0
bisheng_unstructured/documents/pdf_parser/blob.py sha256=gpn25dfnzCzTJDQatErznMGQfYa_FXVO3F-ghZ2upeQ 5804
bisheng_unstructured/documents/pdf_parser/image.py sha256=Zq6OtVhx4pSNx-tOW7jH4rzpCpgAXI65vHwJDahJZ-w 2466
bisheng_unstructured/documents/pdf_parser/pdf.py sha256=n6B1I5vE2f7OevkrINO0sk9PHhAs4m4FE3twwkAPOEc 44205
bisheng_unstructured/documents/pdf_parser/pdf_creator.py sha256=XB8nNGHbIrppacenouZ9I1Uz3Pixoq3-HfHAuQvekbM 10690
bisheng_unstructured/documents/pdf_parser/test_pdf.py sha256=W-U934-pOgBH9bywiVq_TPkKE4Rql0pMNIiAgDiWgl8 10184
bisheng_unstructured/file_utils/__init__.py sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU 0
bisheng_unstructured/file_utils/encoding.py sha256=3h0c8_1Zb6XX37km_k4y6371ro-uwJUdktFPJ-bELbM 4421
bisheng_unstructured/file_utils/exploration.py sha256=Ke3ulNuoFeVi3V5dEx6fw3SSOFlQHseZ9JWUV2XFvgs 2406
bisheng_unstructured/file_utils/file_conversion.py sha256=DS-mvlEX6dS0bmPkBb81NjWZjc9qBb73pMULabMGzuQ 1741
bisheng_unstructured/file_utils/filetype.py sha256=g6acfjHJJF5OZ69ufPxyxwoltpZOyxzqpsvLHSF45ps 20903
bisheng_unstructured/file_utils/google_filetype.py sha256=YVspEkiiBrRUSGVeVbsavvLvTmizdy2e6TsjigXTSRU 468
bisheng_unstructured/file_utils/metadata.py sha256=ImZEGMz--bzKiUOh4zGBbMgRT5gl2QFiyBR1mIPeh8E 5420
bisheng_unstructured/models/__init__.py sha256=i-276lke3VISA83Shb6Z6LIZK1WYa2QJVjUw9b1b_RA 247
bisheng_unstructured/models/common.py sha256=ylfJ2dBeIkm7ATOFBMEiXmw5pX69AKtxXtKWF-QB1Pk 11224
bisheng_unstructured/models/formula_agent.py sha256=2qAFrIY165z9cj_Gx-Qn90ZKCrgKDfqJ5G-0k1G1ALQ 7741
bisheng_unstructured/models/layout_agent.py sha256=IC5RSTpR2kwPdQJxEDvv1jecTU3NMznW8P8WMapeyHA 828
bisheng_unstructured/models/ocr_agent.py sha256=9XRcXmH95m0TDjMZ29SvZQOnM_uNYjKG_6I0ZE1DXOk 7360
bisheng_unstructured/models/table_agent.py sha256=dIcY6i1olUOfqBjFW_Wv8vo4pCEH8kLfllGdNsfAs3o 1814
bisheng_unstructured/nlp/__init__.py sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU 0
bisheng_unstructured/nlp/english-words.txt sha256=8fpk2f3iMm87qMppZMFAt1eWJcqOMALtt1YHE-fm7bY 4472047
bisheng_unstructured/nlp/english_words.py sha256=Ng2ozKrwF0Pw-qblYtBxxFOW9hT0eVL5uLqEgf0BHsw 701
bisheng_unstructured/nlp/partition.py sha256=3IVR2x1hJ-ZSu-Q4E83quuC7OQsWOXBq_1WS7j8MufA 210
bisheng_unstructured/nlp/patterns.py sha256=5Jy9rJjOxo_sdgzsw5Ynx0tPdupiESEacoJj2tko74s 6060
bisheng_unstructured/nlp/tokenize.py sha256=sg6yvkSHpZAIjzlKOdXiPh4qbxvjPmelfJFEANxMg20 2090
bisheng_unstructured/partition/__init__.py sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU 0
bisheng_unstructured/partition/api.py sha256=d-E5dOsE--r2UH2z4V5EOfm9p2yP8W6ZMUNms3-h8iY 6626
bisheng_unstructured/partition/auto.py sha256=nwks8qe1LHyv4-i9pW-M3LHDKkqTEN1HFDKgoRffIYo 12711
bisheng_unstructured/partition/common.py sha256=DvXIvfz_A-Mua9k_WwtF1QJqumGa76rEFwy92JqqKns 12239
bisheng_unstructured/partition/csv.py sha256=xdZr2o7yHGyLqrFGNtu3fV5fWt-1CZ1wPkb6nFYLTC0 2297
bisheng_unstructured/partition/doc.py sha256=_PeVepRM_8Ryi3nNneVuHLh9xZ9thvexkcPL6J7Ounc 3078
bisheng_unstructured/partition/docx.py sha256=dUClTq27NkG-ohx51lamzLHi4k-o_qtFkTRWQXiSIjA 16608
bisheng_unstructured/partition/email.py sha256=sFDAj_51oQWO1SuA9bDPSZsJYtXm5W0JyA8GKDRxdhE 15679
bisheng_unstructured/partition/epub.py sha256=ROmTLpVdrrvSZdPppXm8X8hSoHfNnQVYwVIFIb-sgTE 3363
bisheng_unstructured/partition/html.py sha256=0Xndc_zHiCPZy4PMqy2QSbRgnwELFyLcA7-e6XkXMR8 6626
bisheng_unstructured/partition/image.py sha256=DT0Zg4eX3HfzUbpDiujN66LogLsSsEkzFJOHp2VhFYc 2394
bisheng_unstructured/partition/json.py sha256=R1hfHhGTAmv6hUsKhrur-2f2st4zL3BuzEBMXr7tWko 2559
bisheng_unstructured/partition/md.py sha256=6ywatKrR9fYX2CerYl_RhsxxlWiKGZwNXKtqgtmcVXs 3214
bisheng_unstructured/partition/msg.py sha256=8AjOjc4GnNi9uxT39cc9nyJx053opStABsuMqWMtjfc 6247
bisheng_unstructured/partition/odt.py sha256=FcgauUvKat_yH63_c_4pIp9mP6HRUlcIFqfzLvcVDe4 1516
bisheng_unstructured/partition/org.py sha256=A79O3bvWhXSDT3Crxv12NCilBl8J0KiW5NRFYphLWWw 1344
bisheng_unstructured/partition/pdf.py sha256=gP5hSBUqdIJ3EjJBeeXIdLyMughUNt05uiLyO_1h5D0 18165
bisheng_unstructured/partition/ppt.py sha256=UjYnFegTUz9US130KRDDLeRrZ2bikCO0T4YjxW-HTuc 2827
bisheng_unstructured/partition/pptx.py sha256=LkMb3ypDGpu_7d9e_00-LF08Db7hr8Zrm4skunq-CLk 7241
bisheng_unstructured/partition/rst.py sha256=p6PW0XdUiX5MJnuLyBjpLM7lxW-lpUZTkpSHQ65JTQM 1397
bisheng_unstructured/partition/rtf.py sha256=61iJPVGWOCg4lWi7uCvtsVvevAE6lA6HoEH8nfMvvPI 1396
bisheng_unstructured/partition/strategies.py sha256=HnomzZbXOe7kKcxMTG8-PnKr_yY3yEMCXB5q2tKbj9s 4797
bisheng_unstructured/partition/text.py sha256=XuO-VgsAvcEJDRe0taVXOruxEs6QdpB6qnduvZuQH6M 9377
bisheng_unstructured/partition/text_type.py sha256=5LsTBWusz6TTuN6ollLN96mKkpguSXPib7FCKXMwieM 12037
bisheng_unstructured/partition/tsv.py sha256=hAsLuGRwUbGLsTfE-G4fMIj6HePv_bI-Fkr-vNUZMF8 2238
bisheng_unstructured/partition/xls.py sha256=ozRmibxtAkl5m1UKHKG9Qi68nUazagbVRJ5bpCbQSq0 3891
bisheng_unstructured/partition/xlsx.py sha256=9fYpt7iLu-ivlYkzr8tWe5ku39oYq7O9F7f7fNh0RYc 2895
bisheng_unstructured/partition/xml.py sha256=s0yoxZ14Vicq8nNcx-DpQsvlOX3ALkgoo3adddy64YY 4203
bisheng_unstructured/staging/__init__.py sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU 0
bisheng_unstructured/staging/argilla.py sha256=QrurufbHV7lNwHr-GIL_OC8xnx2LL2ZSsO7UcTjt4Y4 2304
bisheng_unstructured/staging/base.py sha256=TRxD7WdJcRq20f5ocJMOQ-7VYDl-BMQBiizlhGeUZw4 6982
bisheng_unstructured/staging/baseplate.py sha256=U0EtBWlj88tRlrHpHGke4ZzlyYEskygyHes7mQ_Wzf0 1771
bisheng_unstructured/staging/datasaur.py sha256=P2r9X8Q9jQyRyRkAuDUCnwWKQXuZEGEbxnWksI_Y-h0 1425
bisheng_unstructured/staging/huggingface.py sha256=xo0uZCr0qpnV56v2c_r0etA1Z3k_scoIMDfSPAoTlAY 3846
bisheng_unstructured/staging/label_box.py sha256=U-4syAcaAKG35VOnVlze9nqWBhS0b6MBfBhUEnQRl1E 3924
bisheng_unstructured/staging/label_studio.py sha256=zyW6TUxwEsn-IK9KVnIDUsuepOT5mLfzeZsPfeO1Kr4 4912
bisheng_unstructured/staging/prodigy.py sha256=mPrB1WVcyQTRyy-cHIWNqeQLD4stteXyH716pxDD8AA 3196
bisheng_unstructured/staging/weaviate.py sha256=05ZFzIWIBGT40DurHW1GjoPA3t2f7QNJ-blTZka0tLw 2542
bisheng_unstructured/topdf/__init__.py sha256=TCxvUghuGIQnfzWucrfoP4ctwemyojKc5IX_M0PRB6c 221
bisheng_unstructured/topdf/docx2pdf.py sha256=M-uvDtAL05NBfvxwa3SUGFhyP9qKZ27x2vpBD1QZhwE 3116
bisheng_unstructured/topdf/excel2pdf.py sha256=wrFFcOdaU3RLzAnnIQRvfITce5FdrW41xum-c2WfeHE 3098
bisheng_unstructured/topdf/pptx2pdf.py sha256=TC0dlnS2hSlurB3sgycNt3to7s6hcd-hQtwGWg0Mv3I 1431
bisheng_unstructured/topdf/text2pdf.py sha256=hXCWpNHCQn2Jr_c8DhaEzfg7rX5k9T5SKGMhjsK88jo 4939
bisheng_unstructured-0.0.3.post4.dist-info/LICENSE sha256=QwcOLU5TJoTeUhuIXzhdCEEDDvorGiC6-3YTOl4TecE 11356
bisheng_unstructured-0.0.3.post4.dist-info/METADATA sha256=6ASwx75YINYk0dcjgCVZp0PD_O-1zYgXB3SXw_fx_X8 4183
bisheng_unstructured-0.0.3.post4.dist-info/WHEEL sha256=GJ7t_kWBFywbagK5eo9IoUwLW6oyOeTKmQ-9iHFVNxQ 92
bisheng_unstructured-0.0.3.post4.dist-info/top_level.txt sha256=vmOwl4nCD_vKBW37L4h45pzaOtFWn435-Cq09smJngE 21
bisheng_unstructured-0.0.3.post4.dist-info/RECORD
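
Each digest in RECORD is the SHA256 hash of the file, encoded as URL-safe base64 with the trailing `=` padding removed. The sketch below shows how such a value can be recomputed and how one line of this listing can be split; `record_digest` and `parse_listing_line` are illustrative names, and the parser assumes the space-separated layout shown on this page (the RECORD file inside the wheel itself is comma-separated).

```python
import base64
import hashlib


def record_digest(data: bytes) -> str:
    """Return the RECORD-style digest string for the given file contents."""
    raw = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
    return "sha256=" + encoded


def parse_listing_line(line):
    """Split 'path digest size' into a tuple; RECORD's own entry has no digest or size."""
    parts = line.split()
    if len(parts) == 1:
        return parts[0], None, None
    path, digest, size = parts
    return path, digest, int(size)
```

The empty `__init__.py` files above all share the digest `sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU`, which is what `record_digest(b"")` produces for zero bytes of input.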

top_level.txt

bisheng_unstructured