Reverse Dependencies of tokenizers
The following projects have a declared dependency on tokenizers:
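A "declared dependency" here means the project lists tokenizers in its package metadata (e.g. in pyproject.toml or setup.py). A similar listing can be produced for your own environment with nothing but the standard library; the sketch below is illustrative only, assumes Python 3.9+, and makes no claim about how this page itself was generated.

```python
# Minimal sketch: list locally installed packages that declare a dependency
# on tokenizers, using only the standard library (Python 3.9+ assumed).
from importlib.metadata import distributions

def reverse_dependencies(target: str = "tokenizers") -> list[str]:
    dependents = set()
    for dist in distributions():
        for req in dist.requires or []:  # requires is None when no deps are declared
            # Requirement strings look like "tokenizers>=0.13,<0.15; extra == 'x'"
            # or "tokenizers (>=0.13)"; keep only the project name.
            name = req.split(";")[0].split(" ")[0]
            for sep in ("<", ">", "=", "!", "~", "(", "["):
                name = name.split(sep)[0]
            if name.lower() == target:
                dependents.add(dist.metadata["Name"])
                break
    return sorted(dependents, key=str.lower)

if __name__ == "__main__":
    for project in reverse_dependencies():
        print(project)
```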
- adapter-transformers — A friendly fork of HuggingFace's Transformers, adding Adapters to PyTorch language models
- ai-dataproc — no summary
- AI-ML-Formulas-Recognizer-Extraction — A package for AI recognition tasks developed by Minh Nguyen and Liam.
- ai-python — Microsoft AI Python Package
- ai2-olmo — Open Language Model (OLMo)
- ai21-tokenizer — no summary
- aider-chat — Aider is GPT-powered coding in your terminal
- aidevkit — Assorted utility modules used in AI development
- aihandler — AI Handler: An engine which wraps certain huggingface models
- aihandlerwindows — AI Handler: An engine which wraps certain huggingface models
- akasha-terminal — Document QA package using langchain and chromadb
- aleph-alpha-client — Python client to interact with Aleph Alpha API endpoints
- algorin-cli — Access to GPT-3 and document processing from the command line.
- annolid — An annotation and instance segmentation-based multiple animal tracking and behavior analysis package.
- anthropic — The official Python library for the Anthropic API
- anthropic-bedrock — The official Python library for the anthropic-bedrock API
- api2openai — Create a Python package.
- archai — Platform for Neural Architecture Search
- arcusapi — Arcus Data Platform Client SDK.
- ares-ai — ARES is an advanced evaluation framework for Retrieval-Augmented Generation (RAG) systems.
- arize — A helper library to interact with Arize AI APIs
- attention-sinks — Extend LLMs to infinite length without sacrificing efficiency and performance, without retraining
- audiossl — no summary
- AudioSummariser — Summarises the text generated from audio files for quicker resolution. The audio files are currently customer-support recordings, but the use case can be extended further. Sentiment is analysed and depicted visually.
- auto-learn-gpt — AutoML for training and inference of deep learning models
- autoawq — AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
- autogluon-contrib-nlp — MXNet GluonNLP Toolkit (DeepNumpy Version)
- autopeptideml — AutoML system for building trustworthy peptide bioactivity predictors
- awesome-align — An awesome word alignment tool
- awessome — awessome
- bark — Bark text-to-audio model
- bedrock-anthropic — Client library for the anthropic API with the AWS Bedrock endpoint.
- bert-deid — Remove identifiers from data using BERT
- bert-embeddings — Create positional embeddings based on TinyBERT or similar BERT models
- bertnlp — BERT toolkit is a Python package that performs various NLP tasks using Bidirectional Encoder Representations from Transformers (BERT) related models.
- bigdl-llm — Large Language Model Development Toolkit
- bisheng-pybackend-libs — libraries for bisheng rt pybackend
- blade2blade — Adversarial Training and SFT for Bot Safety Models
- botiverse — botiverse is a chatbot library that offers a high-level API to access a diverse set of chatbot models
- bpeasy — Fast bare-bones BPE for modern tokenizer training
- caikit-nlp — Caikit NLP
- canopy-sdk — Retrieval Augmented Generation (RAG) framework and context engine powered by Pinecone
- cclm — NLP framework for composing models together modularly
- ChatLLM — Create a Python package.
- ChatSQL — Create a Python package.
- chromadb — Chroma.
- chromadb-pysqlite3 — Chroma.
- citoplasm — CITOplasm is a Python library for writing LLM code in a declarative way.
- clarinpl-embeddings — no summary
- cliqs — Provides an implementation of a multilingual crisis social media summarization model.
- closeai — Create a Python package.
- codegeex — CodeGeeX: An Open Multilingual Code Generation Model.
- cody-adapter-transformers — A friendly fork of HuggingFace's Transformers, adding Adapters to PyTorch language models
- cohere — no summary
- complwetion — Small helper library to build chat applications
- compromise-marian — Marian model but with two decoders
- ConsistencyBench — Tools and Techniques for Consistency Benchmarking
- constituent-treelib — A lightweight Python library for constructing, processing, and visualizing constituent trees.
- cpkil — CPR Python Package
- CPM-Bee — Create a Python package.
- cpm-live — Create a Python package.
- crfm-helm — Benchmark for language models
- curated-transformers — A PyTorch library of transformer models and components
- cursivepy — no summary
- DadmaTools — DadmaTools is a Persian NLP toolkit
- dalle-pytorch — DALL-E - Pytorch
- datatrove — HuggingFace library to process and filter large amounts of webdata
- datumaro — Dataset Management Framework (Datumaro)
- dbgpt — DB-GPT is an experimental open-source project that uses local large GPT models to interact with your data and environment. With this solution, there is no risk of data leakage, and your data stays 100% private and secure.
- ddochi — no summary
- deepfloyd-if — DeepFloyd-IF (Imagen Free)
- deepoffense — Multilingual Offensive Language Identification with Transformers
- deepse — **DeepSE**: **Sentence Embeddings** based on Deep Neural Networks, designed for **PRODUCTION** environments!
- dgenerate — Batch image generation and manipulation tool supporting Stable Diffusion and related techniques / algorithms, with support for video and animated image processing.
- dillagent — Agentic LLM library
- dimweb-persona-bot — A dialogue bot with a personality
- disformers — Hugging Face transformers for Discord.
- distill-trainer — Knowledge distillation toolkit
- dlk — dlk: Deep Learning Kit
- Documents-Classifier — A tool to classify images
- dolma — Data filters
- dooly — A library that handles everything with 🤗 and supports batching to models in PORORO
- easy-transformers — Utils for dealing with transformers
- easyeditor — easyeditor - Editing Large Language Models
- edu-segmentation — Improves EDU segmentation performance using Segbot. Since Segbot has an encoder-decoder architecture, its bidirectional GRU encoder can be replaced with generative pretrained models such as BART and T5. The new model is evaluated on the RST dataset in few-shot settings (e.g. 100 training examples) rather than with the full dataset.
- eir-dl — no summary
- emb3d — emb3d.co command-line interface to work with embeddings.
- EMO-AI — Library for the AI competition; currently private
- engawa — no summary
- Expanda — Integrated Corpus-Building Environment
- explabox-demo-drugreview — Explabox demo for the UCI drug reviews dataset
- fast-bert-no-plot — AI Library using BERT
- fast-bert-xrendan — AI Library using BERT
- fastembed — Fast, light, accurate library built for retrieval embedding generation
- faster-translate — A simple translation utility using Hugging Face models.
- faster-whisper — Faster Whisper transcription with CTranslate2
- Few-Shot-Learning-NLP — This library provides tools and utilities for Few Shot Learning in Natural Language Processing (NLP).
- finer — no summary
- FlashRank — Ultra-lite and super-fast SoTA cross-encoder-based re-ranking for your search & retrieval pipelines.
- flex-model — FlexModel - A Framework for Interpretability of Distributed Large Language Models
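For context on what all of the projects above share, here is a minimal sketch of typical tokenizers usage, loosely following the library's quicktour: training a small BPE tokenizer and encoding a string. The training file corpus.txt is a hypothetical placeholder, not anything referenced in this list.

```python
# Minimal sketch of the shared dependency itself: train a small BPE tokenizer
# with Hugging Face tokenizers, then encode a string.
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

# Build a BPE tokenizer with a simple whitespace pre-tokenizer.
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

# Train on a local text file ("corpus.txt" is a hypothetical placeholder).
trainer = BpeTrainer(special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"])
tokenizer.train(files=["corpus.txt"], trainer=trainer)

encoding = tokenizer.encode("Hello, tokenizers!")
print(encoding.tokens)  # subword pieces
print(encoding.ids)     # vocabulary indices
```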