Reverse Dependencies of datasketch
The following projects have a declared dependency on datasketch:
- cocoon-data — Cocoon is an open-source project that aims to free analysts from tedious data transformations with LLMs.
- datamart-isi — USC ISI implementation of D3M Datamart API
- findopendata — A search engine for Open Data.
- galactic-ai — Curate, annotate, and clean massive unstructured text datasets for machine learning and AI systems.
- guanciale — Grab information needed by Carbonara
- hf-clean-benchmarks — This repository contains code for cleaning your training data of benchmark data to help combat data snooping.
- HogProf — Phylogenetic Profiling with OMA and minhashing
- Lotte — Lotte is a tool for quotation detection in texts and can deal with common properties of quotations, for example, ellipses or inaccurate quotations.
- nlp-dedup — Remove duplicates and near-duplicates from text corpora, no matter the scale.
- PolyDeDupe — No summary provided.
- pyoma — Library to interact with and build OMA HDF5 files.
- qbindiff — QBindiff binary diffing tool based on a network alignment problem.
- Quid — Quid is a tool for quotation detection in texts and can deal with common properties of quotations, for example, ellipses or inaccurate quotations.
- scandi-reddit — Construction of a Scandinavian Reddit dataset.
- scikit-fingerprints — Library for effective molecular fingerprint calculation.
- Sketch — Compute, store and operate on data sketches
- squeakily — A library for squeakily cleaning and filtering language datasets.
- tiny-elephant — In-memory collaborative filtering.