Reverse Dependencies of tika
The following projects have a declared dependency on tika:
- azure-ai-generative — Microsoft Azure Machine Learning Client Library for Python
- azureml-rag — Contains Retrieval Augmented Generation related utilities for Azure Machine Learning and OSS interoperability.
- banrep — Text analytics at the Banco de la República
- bizextract — Simple script for extracting business data from PDFs.
- camai-utils — Python utils for the Camai CHC COVID Datasystem.
- cbc-nlp — Simplify NLP pre-processing.
- cdptools — Tools to interact with and deploy CouncilDataProject instances
- coursebox — A course management system currently used at DTU
- data-alchemy — Package to process documents of any format
- doc-extractor — no summary
- doc2map — Beautiful and interactive visualisations for NLP Topics
- doctext — no summary
- doms_databasen — Scraper and PDF text processor for domsdatabasen.dk
- download-aptnotes — Download and (optionally) parse APTNotes quickly and easily
- extractify — no summary
- faker-file — Generate files with fake data.
- farm-haystack — LLM framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data.
- genutility — A collection of various Python utilities
- geoso — A Python package for spatio-temporal analysis of social media contents
- harmonydata — Harmony Tool for Retrospective Data Harmonisation
- hiphopscrap — fetch, munge, and parse résumés and job postings
- invenio-files-processor — Invenio module for processing and/or transforming files.
- JacksonQuery — Automated interaction and data extraction tool for Jackson National Life Insurance website
- langsearch — Easily create semantic search based LLM applications on your own data
- nlm-ingestor — Parsers and ingestors for different file types and formats
- noba-mauve — Unit test your writing
- OCRUSREX — OCRUSREX takes a PDF (either by path or as a file-like object) and makes it searchable using Tesseract 4. It has an enterprise-friendly license.
- pvqa — Question Answering System for Plants
- pyHooke — Open source plagiarism checker
- querent — The Asynchronous Data Dynamo and Graph Neural Network Catalyst
- reader-toolbox — A command-line interface for creating and interacting with Distant Reader data sets (a.k.a. study carrels)
- resume-classification — A simple package for training and classification of resumes.
- resume-parser — A resume parser used for extracting information from resumes
- resume-parser-upd — A resume parser used for extracting information from resumes
- simple-pdf2text — A small package to extract text from pdf
- simpledms — no summary
- symbolicai — A Neuro-Symbolic Framework for Python
- tikatree — Directory tree metadata parser using Apache Tika
- tocPDF — A bookmark generator for pdf
- tum-gdpr-folder-scanner — Script to check local folders for GDPR-relevant information in the TUM context.
- twisp — no summary
- txtai — All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
- utilsovs-pkg — Utils derived from the O-GlcNAc Database source code