Wheelodex — tika — Reverse Dependencies

Reverse Dependencies of tika

The following projects have a declared dependency on tika:

azure-ai-generative — Microsoft Azure Machine Learning Client Library for Python
azureml-rag — Contains Retrieval Augmented Generation related utilities for Azure Machine Learning and OSS interoperability.
banrep — Text analytics at the Banco de la República
bizextract — Simple script for extracting business data from PDFs.
camai-utils — Python utils for the Camai CHC COVID Datasystem.
cbc-nlp — Simplify NLP pre-processing.
cdptools — Tools to interact with and deploy CouncilDataProject instances
coursebox — A course management system currently used at DTU
data-alchemy — Package to process documents of any format
doc-extractor — no summary
doc2map — Beautiful and interactive visualisations for NLP Topics
doctext — no summary
doms_databasen — Scraper and PDF text processor for domsdatabasen.dk
download-aptnotes — Download and (optionally) parse APTNotes quickly and easily
extractify — no summary
faker-file — Generate files with fake data.
farm-haystack — LLM framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data.
genutility — A collection of various Python utilities
geoso — A Python package for spatio-temporal analysis of social media contents
harmonydata — Harmony Tool for Retrospective Data Harmonisation
hiphopscrap — fetch, munge, and parse résumés and job postings
invenio-files-processor — Invenio module for processing and/or transforming files.
JacksonQuery — Automated interaction and data extraction tool for Jackson National Life Insurance website
langsearch — Easily create semantic search based LLM applications on your own data
nlm-ingestor — Parsers and ingestors for different file types and formats
noba-mauve — Unit test your writing
OCRUSREX — OCRUSREX takes a PDF (either by path or as a file-like object) and makes it searchable using Tesseract 4. It has an enterprise-friendly license.
pvqa — Question Answering System for Plants
pyHooke — Open source plagiarism checker
querent — The Asynchronous Data Dynamo and Graph Neural Network Catalyst
reader-toolbox — A command-line interface for creating and interacting with Distant Reader data sets (a.k.a. study carrels)
resume-classification — A simple package for training and classification of resumes.
resume-parser — A resume parser used for extracting information from resumes
resume-parser-upd — A resume parser used for extracting information from resumes
simple-pdf2text — A small package to extract text from pdf
simpledms — no summary
symbolicai — A Neuro-Symbolic Framework for Python
tikatree — Directory tree metadata parser using Apache Tika
tocPDF — A bookmark generator for pdf
tum-gdpr-folder-scanner — Script to check local folders for GDPR-relevant information in the TUM context.
twisp — no summary
txtai — All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
utilsovs-pkg — Utils derived from the O-GlcNAc Database source code