Reverse Dependencies of tokenizers
The following projects have a declared dependency on tokenizers:
- streamlit-chromadb-connection — A simple adapter connection for any Streamlit LLM-powered app to use ChromaDB vector database.
- stripedhyena — Model and inference code for beyond Transformer architectures
- ststransformers — An easy-to-use wrapper library for using Transformers in Semantic Textual Similarity Tasks.
- stuned — Utility code from STAI (https://scalabletrustworthyai.github.io/)
- styletts2-fork — Fork of the StyleTTS 2 Python package. StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models. Original authors: Yinghao Aaron Li, Cong Han, Vinay S. Raghavan, Gavin Mischler, Nima Mesgarani, Sidharth Rajaram.
- stylometry-utils — Collection of functions and utilities to run stylometry experiments
- SudachiPy — Python version of Sudachi, the Japanese Morphological Analyzer
- sumformer2 — Summarisation Transformer 2
- suno-bark — Bark text to audio model
- swiftrank — Compact, ultra-fast SoTA reranker enhancing retrieval pipelines and terminal applications.
- syntaxi — Make your tokenizer more syntax-friendly.
- t2v-metrics — Evaluating Text-to-Visual Generation with Image-to-Text Generation.
- taker — Tools for Transformer Activations Knowledge ExtRaction
- test-petals — Easy way to efficiently run 100B+ language models without high-end GPUs
- testgailbot002 — GailBot API
- testgailbotapi — GailBot Test API
- testgailbotapi001 — GailBot Test API
- testpydebiaseddta — Python library to improve generalizability of the drug-target prediction models via DebiasedDTA
- text-embeddings — zero-vocab or low-vocab embeddings
- text-sim — Chinese text similarity calculation package for TensorFlow/PyTorch
- text2tac — text2tac converts text to actions
- textflint — Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing
- textwiz — An even simpler way to use open-source LLMs.
- tf-shb-gabriel-0302 — State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch
- thaixtransformers — ThaiXtransformers: Use pretrained RoBERTa-based Thai language models from the VISTEC-depa AI Research Institute of Thailand.
- the-grid — Easy way to efficiently run 100B+ language models without high-end GPUs
- thirdai — A faster CPU machine learning library
- timething — Aligning text transcripts with their audio recordings.
- tinytensor — tinytensor
- TLAF — TLA is built with PyTorch, Transformers, and other state-of-the-art machine learning techniques. It aims to expedite and structure the cumbersome process of collecting, labeling, and analyzing Twitter data across a corpus of languages, while providing detailed labeled datasets for all of them.
- tokenizer-adapter — A simple tool to adapt a pretrained language model to a new vocabulary
- tokenizers — no summary
- tokenizers-gt — no summary
- topicgpt — A package for integrating LLMs like GPT-3.5 and GPT-4 into topic modelling
- topicmodels — A package for topic modelling in python.
- torchblocks — A PyTorch-based toolkit for natural language processing
- torchblocks-chen — A PyTorch-based toolkit for natural language processing
- totokenizers — Text tokenizers.
- trankit — Trankit: A Light-Weight Transformer-based Toolkit for Multilingual Natural Language Processing
- transformers — State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
- transformers-cfg — Extension of Transformers library for Context-Free Grammar Constrained Decoding with EBNF grammars
- transformers-domain-adaptation — Adapt Transformer-based language models to new text domains
- transformers-keras — Transformer-based models implemented in TensorFlow 2.x (Keras)
- transquest — Transformer-based translation quality estimation
- trustplutusmr — Entity Market Research
- turkish-lm-tuner — Implementation of the Turkish LM Tuner
- tweet-se-competition — a machine learning project for kaggle tweet sentiment extraction competition
- uform — Pocket-Sized Multimodal AI for Content Understanding and Generation
- UnicodeTokenizer — UnicodeTokenizer: tokenize all Unicode text
- url-text-module — Text Module of REACT
- vall-e-x — An open source implementation of Microsoft's VALL-E X zero-shot TTS
- vec2text — convert embedding vectors back to text
- vina2vi — no summary
- vlite — A simple and blazing fast vector database
- vllm-haystack — A simple adapter to use vLLM in your Haystack pipelines.
- vltk — The Vision-Language Toolkit (VLTK)
- vtorch — NLP research library, built on PyTorch.
- weak-annotators — Weak annotators for information extraction (NER)
- webull-options — no summary
- whisper-s2t — An Optimized Speech-to-Text Pipeline for the Whisper Model.
- xmnlp — A Lightweight Chinese Natural Language Processing Toolkit
- yolo-world-open — YOLO-World: Real-time Open Vocabulary Object Detection
- ytchat — An open platform for training, serving, and evaluating large language model based chatbots.
- yuezhlib — Library for preprocessing Cantonese and Written Chinese
- zarth-utils — Package used for my personal development on ML projects.
- zeldarose — Train transformer-based models
- zh-rasa — Chinese NLP tool for RASA
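For reference, a project appears in this list by declaring tokenizers in its packaging metadata. A minimal sketch of such a declaration in `pyproject.toml` (the project name and version pin below are hypothetical, not taken from any listed package):

```toml
[project]
name = "my-nlp-app"          # hypothetical example project
version = "0.1.0"
dependencies = [
    "tokenizers>=0.15",      # illustrative pin; each project chooses its own constraint
]
```

Older projects may declare the same dependency via `install_requires` in `setup.py` or `setup.cfg` instead.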