Reverse Dependencies of sentencepiece
The following projects have a declared dependency on sentencepiece:
- catbird — Paraphrase generation Toolbox and Benchmark
- cc-net — Tools to download and clean Common Crawl
- CEFR-Classifier-French — A French text classification package based on CEFR levels.
- CEFT-Classifier-French — A French text classification package.
- chat-with-mlx — A Retrieval-augmented Generation (RAG) chat interface with support for multiple open-source models, designed to run natively on macOS and Apple Silicon with MLX.
- chatdesk-grouphug — GroupHug is a library with extensions to 🤗 transformers for multitask language modelling.
- ChatGLM6Bpkg — ChatGLM6Bpkg is a package for ChatGLM-6B (https://github.com/THUDM/ChatGLM-6B/tree/main).
- chatmof — chatmof
- classy-core — A powerful tool to train and use your classification models.
- clipbit — Generate concise, meaningful summaries of YouTube videos
- cm3 — Description of the cm3 package
- cmtt — A library for processing Code Mixed Text. Still in development!
- code-context — no summary
- codegraph-agent — no summary
- cody-adapter-transformers — A friendly fork of HuggingFace's Transformers, adding Adapters to PyTorch language models
- cog-hf-template — Cog template for Hugging Face.
- cogdl — An Extensive Research Toolkit for Deep Learning on Graphs
- collie-lm — CoLLiE: Collaborative Training of Large Language Models in an Efficient Way
- colossalai-nightly — An integrated large-scale model training system with efficient parallelization techniques
- compassionai-garland — CompassionAI Project Garland - machine translation for classical Tibetan
- compassionai-manas — CompassionAI Project Manas - a bidirectional Tibetan transformer
- competitions — Hugging Face Competitions
- composer — Composer is a PyTorch library that enables you to train neural networks faster, at lower cost, and to higher accuracy.
- compromise-marian — Marian model but with two decoders
- confirms — Comprehension of trade term sheets and confirmations
- continuous-eval — Open-Source Evaluation for GenAI Application Pipelines.
- contract-reviewer — Using NLP to tag contracts across 12 different fields
- convince — Better instruction following for large language models
- cpkil — CPR Python Package
- cream-python — Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models
- crfm-helm — Benchmark for language models
- daaja — NLP data augmentation tool
- DadmaTools — DadmaTools is a Persian NLP toolkit
- danoliterate — Benchmark of Generative Large Language Models in Danish
- DashAI — DashAI: a graphical toolbox for training, evaluating and deploying state-of-the-art AI models.
- data-modori — LMOps Tool for Korean
- dataquality — no summary
- datasets — HuggingFace community-driven open-source library of datasets
- datawords — A library to work with text data
- dbgpt — DB-GPT is an experimental open-source project that uses localized GPT large models to interact with your data and environment. With this solution, you can be assured that there is no risk of data leakage, and your data is 100% private and secure.
- dbgpt-hub — DB-GPT-Hub: Text-to-SQL parsing with LLMs
- DeBERTa — Decoding enhanced BERT with Disentangled Attention
- deep-nlp — Deep NLP library
- deep-training — An easy training architecture
- deepfloyd-if — DeepFloyd-IF (Imagen Free)
- deepgnn-torch — DeepGNN algorithms for PyTorch.
- deephub — no summary
- deepoffense — Multilingual Offensive Language Identification with Transformers
- deepprot — machine learning for protein engineering
- delphai-ml-utils — A Python package to manage delphai machine learning operations.
- delta-nlp — DELTA is a deep learning based natural language and speech processing platform.
- dendron — A library for working with LLMs and behavior trees.
- detoxify — A Python library for detecting toxic comments
- dgenerate — Batch image generation and manipulation tool supporting Stable Diffusion and related techniques / algorithms, with support for video and animated image processing.
- diffusers — State-of-the-art diffusion in PyTorch and JAX.
- diffusers-unchained — Diffusers
- diffusersv — State-of-the-art diffusion in PyTorch and JAX.
- Djaizz — Artificial Intelligence (AI) in Django Applications
- django-mathtext — Natural Language Understanding (text processing) for math symbols, digits, and words with a Gradio user interface and REST API.
- dl-translate — A deep learning-based translation library built on Huggingface transformers
- docile-benchmark — Tools to work with the DocILE dataset and benchmark
- docketanalyzer — no summary
- docquery — DocQuery: An easy way to extract information from documents
- docquery-test — DocQuery: An easy way to extract information from documents
- document-tools — 🔧 Tools to automate your document understanding tasks.
- docusense — A tool to extract logic from documents
- donut-python — OCR-free Document Understanding Transformer
- dotagent — no summary
- dotagent-dev — no summary
- dotams — no summary
- dotnext — no summary
- dpu-utils — Python utilities used by Deep Procedural Intelligence
- drop-backend — API and Command line tools for building drop
- e-models — Tools to help build extraction models with Scrapy spiders.
- e2eAIOK-denas — Intel® End-to-End AI Optimization Kit
- eagle-llm — Accelerating LLMs by 3x with No Quality Loss
- easyasr — PAI EasyASR Toolkit
- easydistill — PAI EasyDistill Toolkit
- easyjailbreak — Easy Jailbreak toolkit
- easylaser — An easy to use interface to LASER
- easyrl — PAI EasyRL Toolkit
- easytransfer — PAI EasyTransfer Toolkit
- eduardo-gces-poetry — no summary
- eir-dl — no summary
- embedding-as-service — embedding-as-service: one-stop solution to encode sentence to vectors using various embedding methods
- embedding4bert — A package for extracting word representations from BERT/XLNet
- engawa — no summary
- epochraft — Supercharge Your LLM Training with Checkpointable Data Loading
- ersatz — Simple sentence segmentation toolkit for segmenting and scoring
- escape-unk — Escape unknown symbols in SentencePiece vocabularies
- espnet — ESPnet: end-to-end speech processing toolkit
- espnet-onnx — ONNX Wrapper for ESPnet
- evadb — EvaDB AI-Relational Database System
- evaluate — HuggingFace community-driven open-source library of evaluation
- exciton — Natural Language Processing by the Exciton Research
- exl2conv — no summary
- exllama — no summary
- exllamav2 — no summary
- ExpoSeq — A package which provides various ways to analyze NGS data from phage display campaigns
- facilyst — Make data analysis and machine learning tools more easily accessible.