Reverse Dependencies of tokenizers
The following projects have a declared dependency on tokenizers:
- nhelper — 🧪 Behavioral tests for NLP models 🧪
- nlpipes — Text Classification with Transformers
- nnsight — Package for interpreting and manipulating the internals of deep learning models.
- novelai-api — Python API for the NovelAI REST API
- novelai-python — NovelAI Python Binding With Pydantic
- NovelAILLMWrapper — no summary
- npc-engine — Deep learning inference and NLP toolkit for game development.
- nynoflow — NynoFlow
- omdenalore — AI for Good library
- one-api-tool — Use only one line of code to call multiple model APIs similar to ChatGPT. Currently supported: Azure OpenAI Resource endpoint API, OpenAI Official API, and Anthropic Claude series model API.
- open-retrievals — Text Embeddings for Retrieval and RAG based on transformers
- OpenBMB — Create a Python package.
- opencompass — A comprehensive toolkit for large model evaluation
- OpenELM — Evolution Through Large Models
- openicl — An open source framework for in-context learning.
- OpenNIR-XPM — OpenNIR: A Complete Neural Ad-Hoc Ranking Pipeline (Experimaestro version)
- openparse — Streamlines the process of preparing documents for LLMs.
- optimum-graphcore — Optimum Library is an extension of the Hugging Face Transformers library, providing a framework to integrate third-party libraries from Hardware Partners and interface with their specific functionality.
- optimum-transformers — Accelerated nlp pipelines using Transformers, Optimum and ONNX Runtime
- os-copilot — A self-improving embodied conversational agent seamlessly integrated into the operating system to automate our daily tasks.
- osc-llm — Tools for training, inference, and deployment of large models
- own-knowledge-gpt — Custom Knowledge GPT
- pai-easynlp — PAI EasyNLP Toolkit
- PaoDing — An NLP-oriented PyTorch wrapper that makes your life easier.
- papermage — Papermage. Casting magic over scientific PDFs.
- parlai — Unified platform for dialogue research.
- peelml — Peel away the pain of ml deployment
- perceiver-io — Perceiver IO
- petals — Easy way to efficiently run 100B+ language models without high-end GPUs
- pix2tex — pix2tex: Using a ViT to convert images of equations into LaTeX code.
- platform-gen-ai — This is pipeline code for accelerating solution accelerators
- promptbench — PromptBench is a powerful tool designed to scrutinize and analyze the interaction of large language models with various prompts. It provides a convenient infrastructure to simulate **black-box** adversarial **prompt attacks** on the models and evaluate their performances.
- PromptEHR — Sequence patient electronic healthcare record generation with large language models (LLMs) as the neural database.
- promptflow-gui — Create flowcharts to control LLMs
- promptz — A Python package for interactive prompts
- pubmad — Useful tools to work with biology
- punctfix — Punctuation restoration library
- pureml-llm — no summary
- py-vcon-server — server for vCon conversational data container manipulation package
- pydata-wrangler — Wrangle messy data into pandas DataFrames, with a special focus on text data and natural language processing
- pydebiaseddta — Python library to improve generalizability of the drug-target prediction models via DebiasedDTA
- pydta — A Python package for drug-target affinity prediction using biomolecular language processing
- pygaggle — A gaggle of rerankers for CovidQA and CORD-19
- pyllmsearch — LLM Powered Advanced RAG Application
- pyrit — The Python Risk Identification Tool for LLMs (PyRIT) is a library used to assess the robustness of LLMs
- pytorch-clip-interrogator — Prompt engineering tool using BLIP 1/2 + CLIP Interrogate approach.
- qbchemchef — LLM-based tools for information retrieval
- rag4p — A project I use a lot for workshops; it contains some utils for splitters, tokenizers, and a weaviate client that I reuse a lot
- rannet — Recurrent Attention Networks
- rapid-latex-ocr — Tool for converting images of equations into LaTeX code.
- rapidnlp-datasets — Data pipelines for TensorFlow and PyTorch.
- referral-augment — Official implementation of "Referral Augmentation for Zero-Shot Information Retrieval"
- reinforcer — Reinforcement learning
- retri-evals — Open-source tool for building and evaluating retrieval pipelines.
- retvec — Resilient and Efficient Text Vectorizer
- rewardbench — Tools for evaluating reward models
- ReWord — Reorder words in an English sentence to follow correct grammar
- robocat — Robo CAT- Pytorch
- rt2 — rt-2 - PyTorch
- ruth-text-to-speech — A Python CLI for Ruth NLP
- ruth-tts-converter — A Python CLI for Ruth NLP
- ruth-tts-converter-python — A Python CLI for Ruth NLP
- rwkv — The RWKV Language Model
- rwkv-beta — The RWKV Language Model
- rwkv-paddle — The RWKV Language Model on PaddlePaddle
- rwkvstic — A package for loading rwkv on a larger range of devices
- safe-mol — Implementation of the 'Gotta be SAFE: a new framework for molecular design' paper
- sagemode — Deploy, scale, and monitor your ML models all with one click. Native to AWS.
- samosila-core — no summary
- scikit-embeddings — Tools for training word and document embeddings in scikit-learn.
- sconce — Model Compression Made Easy
- seaqube — Semantic Quality Benchmark for Word Embeddings, i.e. Natural Language Models in Python. The shortname is `SeaQuBe` or `seaqube`. Simply call it '| ˈsi: kjuːb |'
- searchdatamodels — no summary
- semantic-search-faiss — Semantic search to query covid related papers
- semantic-text-splitter — Split text into semantic chunks, up to a desired chunk size. Supports calculating length by characters and tokens, and is callable from Rust and Python.
- semiolog — Tools for the semiological analysis of corpora
- sentivi — A simple tool for Vietnamese Sentiment Analysis
- separability — LLM Tools for looking at separability of LLM Capabilities
- sgnlp — Machine learning models from Singapore's NLP research community
- shbtf0302 — State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch
- short-poetry — no summary
- shtec-rlhf — shtec-rlhf: Safe Reinforcement Learning from Human Feedback
- simple-generation — A python package to run inference with HuggingFace checkpoints wrapping many convenient features.
- simple-latex-ocr — A simple LaTeX OCR package
- simpletransformers — An easy-to-use wrapper library for the Transformers library.
- simpletransformers-fork-trialandsuccess — An easy-to-use wrapper library for the Transformers library. FORK: This fork adds T5TokenizerFast and umT5 support.
- simpletransformers-le — An easy-to-use wrapper library for the Transformers library.
- skorch — scikit-learn compatible neural network library for pytorch
- smart-chromadb — Chroma.
- smile-datasets — La**S**t **mile** datasets: Use `tf.data` to solve the last mile data loading problem for tensorflow.
- snowflake-ml-python — The machine learning client library that is used for interacting with Snowflake to build machine learning solutions.
- soco-tokenizer — Fast tokenizer
- sparse_autoencoder — Sparse Autoencoder for Mechanistic Interpretability
- speechless — LLM based agents with proactive interactions, long-term memory, external tool integration, and local deployment capabilities.
- sphinx-summaries — no summary
- SPLADERunner — Ultralight and fast wrapper for the independent implementation of SPLADE++ models for your search & retrieval pipelines. Models and library created by Prithivi Da. For PRs and collaboration, check out the README.
- spokestack — Spokestack Library for Python
- stf-test1 — stf
- stonkgs — Sophisticated Transformers for Biomedical Text and Knowledge Graph Data
- stos — Converting the American sign language into speech or text, and vice versa.