Reverse Dependencies of tokenizers
The following projects have a declared dependency on tokenizers:
- flex-model — FlexModel - A Framework for Interpretability of Distributed Large Language Models
- fluidml — FluidML is a lightweight framework for developing machine learning pipelines. Focus only on your tasks and not the boilerplate!
- fms-hf-tuning — FMS HF Tuning
- friday-agent — A self-improving embodied conversational agent seamlessly integrated into the operating system to automate our daily tasks.
- fudstop — no summary
- gcgc — GCGC is a preprocessing library for biological sequence model development.
- general-text-classifier — General Text Classification Library
- geniusrise-huggingface — Hugging Face bolts for geniusrise
- geniusrise-openai — OpenAI bolts for geniusrise
- gft — GFT (general fine-tuning) A Little Language for Deepnets: 1-line programs for fine-tuning, inference and more
- gft-cpu — GFT (general fine-tuning) A Little Language for Deepnets: 1-line programs for fine-tuning, inference and more
- gpt-command-line — Command-line interface for ChatGPT, Claude and Bard
- gpt-readme-reader — A utility to extract setup commands from a GitHub repository
- gpt3discord — A ChatGPT Discord bot
- gptfast — Accelerate transformer inference by 6-8.5x. Native to Huggingface and PyTorch.
- h2ogpt — no summary
- halludetector — Hallucination detection package
- hammadml-gpu — Hammad Python ~ Machine Learning
- happytransformer — Happy Transformer makes it easy to fine-tune NLP Transformer models and use them for inference.
- hebspacy — SpaCy pipeline and models for Hebrew text
- hezar — Hezar: The all-in-one AI library for Persian, supporting a wide variety of tasks and modalities!
- hf-doc-builder — Doc building utility
- hf-trim — A tool to reduce the size of Hugging Face models via vocabulary trimming.
- homegrid — A minimal home gridworld environment to test how agents use language hints.
- IBITTokenizer — Tokenizer for Persian texts based on hazm
- if-dsl-gui-ai — For generating and playing IF games
- igfold — no summary
- imat — Interactive Music Analysis Tool (I-MaT)
- indic-punct — Punctuation and inverse text normalization for Indic languages and English
- inflecteur — Python inflector for the French language: control gender, tense, and number
- instruction-ner — Unofficial implementation of InstructionNER
- instructlab — CLI for interacting with InstructLab
- insyt — Innovative Network Security Technologies
- internet-ml — Internet-ML: Allowing ML to connect to the internet
- internet-nlp — Allowing NLPs to connect to the internet
- ipex-llm — Large Language Model Develop Toolkit
- iqradre — no summary
- irisml-tasks-llava — Irisml adapter tasks for LLAVA models
- japre — Custom pretokenizers for Japanese language models
- jarvis-akul2010 — A library built to make it extremely easy to build a simple voice assistant.
- jiant — State-of-the-art Natural Language Processing toolkit for multi-task and transfer learning built on PyTorch.
- jshbtf0302 — State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch
- kanao — Kanao is a project designed to train a GPT (Generative Pre-trained Transformer) model on custom datasets. It provides the capability to train the model using various data sources, including PDFs, Word documents, plain text files, and URLs.
- kbve — ATLAS
- KD-Lib — A PyTorch model compression library containing easy-to-use methods for knowledge distillation, pruning, and quantization
- keywords-en — Keyword extraction
- kgc — Cold Start Construction of Knowledge Graph.
- kogpt2-transformers — Transformers library for KoGPT2
- lairgpt — A PyTorch-based package by LightOn AI Research for performing inference with PAGnol models.
- langchain-mistralai — An integration package connecting Mistral and LangChain
- langcheck — Simple, Pythonic building blocks to evaluate LLM-based applications
- langml — A Keras-based language model toolkit with a TensorFlow backend.
- langport — A large language model serving platform.
- langs-vall — vall-e-x package for a language translation project
- languagemodels — Simple inference for large language models
- langumo — The unified corpus building environment for Language Models.
- latentscope — Quickly embed, project, cluster and explore a dataset.
- leya — A coding assistant to help with repository management and code queries.
- litellm — Library to easily interface with LLM API providers
- litGPT — Hackable implementation of state-of-the-art open-source LLMs
- livestt — Simple and easy to use realtime speech to text
- llama-llm — Build on large language models faster
- llamada — Build on large language models faster
- llava-torch — Towards a GPT-4-like large language and visual assistant.
- llm-docstring-generator — Code to generate docstrings for Python code using GPT-4 etc.
- llm-falcon-model — Microlib for the Falcon LLM
- llm2openai — Create a Python package.
- llmlite — A library that helps you chat with all kinds of LLMs consistently.
- llmopenai — Create a Python package.
- LLMSmith — Lightweight Python library designed for developing functionalities powered by Large Language Models (LLMs)
- llmware — An enterprise-grade LLM-based development framework, tools, and fine-tuned models
- lm-detect — Zero-Shot Machine-Generated Text Detection
- logai — LogAI is a unified framework for AI-based log analytics
- longchat — LongChat and LongEval
- lp-Aicloud — This is an aicloud
- manifest-ml — Manifest for Prompting Foundation Models.
- methylbert — A Transformer-based model for read-level DNA methylation pattern identification and tumour deconvolution
- miditok — MIDI / symbolic music tokenizers for Deep Learning models.
- miditok-for-musiclang — A convenient MIDI tokenizer for Deep Learning networks, with multiple encoding strategies
- mindformers — mindformers platform: linux, cpu: x86_64
- mindnlp — An open source natural language processing research tool box. Git version: [sha1]:18acd45, [branch]: (HEAD -> master, ms/master)
- minimagen — Minimal Imagen text-to-image model implementation.
- MiSeCom — Detect whether English text is missing sentence components such as subject, verb, or object
- mlm-task-for-contextual-embedding — A machine learning project for the MLM task for contextual embedding
- mlx-transformers — MLX transformers is a machine learning framework with an interface similar to Hugging Face Transformers.
- mmda — MMDA - multimodal document analysis
- modelscope — ModelScope: bring the notion of Model-as-a-Service to life.
- molfeat — molfeat - the hub for all your molecular featurizers
- MovieChat — Long video understanding
- mudes — Toxic Spans Prediction
- musiclang-predict — A python package for music notation and generation
- mw-adapter-transformers — A friendly fork of HuggingFace's Transformers, adding Adapters to PyTorch language models
- mysfire — Fast (and opinionated) data loading for PyTorch
- naivenlp-datasets — Data pipelines for TensorFlow and PyTorch.
- name-entity-extraction-for-contextual-embedding — A machine learning project for the MLM task for contextual embedding
- needlehaystack — Doing simple retrieval from LLM models at various context lengths to measure accuracy.
- nepalitokenizers — Pre-trained Tokenizers for the Nepali language with an interface to HuggingFace's tokenizers library for customizability.
- neumai — Package containing connectors for Neum AI.
- neureca — A framework for building conversational recommender systems
- neurox — Toolkit for Neuron Analysis in Deep NLP Models