Embedding Models

HOT

Projects and tools related to embedding techniques in AI and ML.

Active projects 100
New this week +411
Total star growth +580
Cross-source 1
Total Stars 473.6k
Total Forks 59.0k
Multi-Source Repos 1
Stars This Period +580

Top Projects (100)

supabase/supabase

The Postgres development platform. Supabase gives you a dedicated Postgres database to build your web, mobile, and AI applications.

Trend 20
ai alternative auth database deno embeddings example firebase nextjs oauth2 pgvector postgis postgres postgresql postgrest realtime supabase vectors websockets
100.5k 12.0k +60/wk
GitHub PyPI 2-source

nubskr/satoriDB

High performance embedded vector database

Trend 4
🔥 Heating Up +11.0%
ai database document-retrieval embeddings fuzzy-search llm nearest-neighbor-search rag rust vector-database vector-search vectors
232 17 +2/wk
GitHub

yoloshii/ClawMem

On-device context engine and memory for AI agents. Claude Code, Hermes and OpenClaw. Hooks + MCP server + hybrid RAG search.

Trend 4
ai-agent-memory ai-agents bun claude-code context-engine embeddings hermes-agent hybrid-search llama-cpp local-first mcp-server mcp-tools model-context-protocol on-device-ai openclaw rag retrieval-augmented-generation sqlite typescript vector-search
86 11 +3/wk
GitHub

plastic-labs/honcho

Memory library for building stateful agents

Trend 4
agent-memory ai ai-agents ai-memory anthropic context-engineering continual-learning embeddings fastapi langchain llm long-term-memory memory openai personalization python rag state-management typescript vector-database
1.8k 226 +66/wk
GitHub

lemon07r/Vera

Local code search combining BM25, vector similarity, and cross-encoder reranking. Parses 60+ languages with tree-sitter, runs entirely offline, and returns structured results with file paths, line ranges, and symbol metadata. Built in Rust.

Trend 4
bm25 cli code-search code-search-engine cross-encoder embeddings local mcp mcp-server onnx rag reranking retrieval rust semantic-search semantic-search-engine skills tree-sitter vector-search
59 6 +2/wk
GitHub

datawhalechina/base-llm

A full-stack algorithm tutorial from NLP to LLM. Read online: https://datawhalechina.github.io/base-llm/

Trend 3
bert deeplearning docker fine-tuning linux llama llm lora nlp python pytorch qwen rnn tensorrt transformer tutorial
587 55 +23/wk
GitHub

Muvon/octocode

Semantic code searcher and codebase utility

Trend 3
ai ai-tools cli cli-app code-search developer-tool developer-tools doc-search embeddings graphrag knowledge-graph lancedb mcp mcp-server mcp-servers rag rust semantic-search semantic-search-ai tree-sitter
310 32 +2/wk
GitHub

zilliztech/memsearch

A Markdown-first memory system, a standalone library for any AI agent. Inspired by OpenClaw.

Trend 3
agent agent-memory ai-agents claude-code claude-code-plugin clawdbot embeddings harness hybrid-search long-term-memory memory milvus openclaw opencode progressive-disclosure rag reranker semantic-search skills
1.1k 108 +4/wk
GitHub

your-papa/obsidian-Smart2Brain

An Obsidian plugin for interacting with your privacy-focused AI assistant, making your second brain even smarter!

Trend 3
ai chatgpt embeddings obsidian-md obsidian-plugin ollama rag
1.0k 80 +20/wk
GitHub

sbhjt-gr/InferrLM

On-device AI for iOS & Android

Trend 3
anthropic document-processing edge-ai embeddings gemini gguf http-server llama-cpp llamacpp local-inference local-llm multimodal-ai on-device-ai openai rag
81 19 +0/wk
GitHub

thedotmack/claude-mem

A Claude Code plugin that automatically captures everything Claude does during your coding sessions, compresses it with AI (using Claude's agent-sdk), and injects relevant context back into future sessions.

Trend 3
ai ai-agents ai-memory anthropic artificial-intelligence chromadb claude claude-agent-sdk claude-agents claude-code claude-code-plugin claude-skills embeddings long-term-memory mem0 memory-engine openmemory rag sqlite supermemory
46.3k 3.6k +226/wk
GitHub

InsForge/InsForge

Give agents everything they need to ship fullstack apps. The backend built for agentic development.

Trend 3
ai ai-agents coding deno embeddings insforge nextjs oauth2 pgvector postgresql realtime vectors websockets
7.4k 582 +39/wk
GitHub

giancarloerra/SocratiCode

Enterprise-grade (40m+ lines) codebase intelligence in a zero-setup, private and local Claude Plugin or MCP: managed indexing, hybrid semantic search, polyglot code dependency graphs, and DB/API/infra knowledge. Benchmark: 61% fewer tokens, 84% fewer calls, 37x faster than standard AI grep.

Trend 3
ai ai-assistant ast claude code-graph codebase-analysis codebase-intelligence docker embeddings gemini mcp ollama openai qdrant semantic semantic-search vector-database vector-embeddings vector-search vectorization
784 106 +4/wk
GitHub

vectordbz/vectordbz

A modern desktop application for exploring, managing, and analyzing vector databases

Trend 3
ai aitools chroma database-management embeddings milvus pgvector pinecone qdrant rag semantic-search vector-data-management vector-database vector-database-embedding vector-search vector-store vectordatabase vectordb vectorspace weaviate
205 13 +1/wk
GitHub

gmickel/gno

Local AI-powered document search and editing with first-in-class hybrid retrieval, LLM answers, WebUI, REST API and MCP support for AI clients.

Trend 3
ai-assistant bun cli code-search document-search embeddings knowledge-base llm local-first mcp offline pkm rag second-brain semantic-search typescript vector-search
65 7 -1/wk
GitHub

langchain4j/langchain4j

LangChain4j is an open-source Java library that simplifies the integration of LLMs into Java applications through a unified API, providing access to popular LLMs and vector databases. It makes implementing RAG, tool calling (including support for MCP), and agents easy. LangChain4j integrates seamlessly with various enterprise Java frameworks.

Trend 3
anthropic chatgpt chroma embeddings gemini gpt huggingface java langchain llama llm llms milvus ollama onnx openai openai-api pgvector pinecone vector-database
11.5k 2.1k +22/wk
GitHub

CaviraOSS/OpenMemory

Local persistent memory store for LLM applications including Claude Desktop, GitHub Copilot, Codex, Antigravity, etc.

Trend 3
ai ai-agents ai-infrastructure ai-memory artificial-intelligence cognitive-architecture embeddings gemini llm long-term-memory memory memory-engine memory-retrieval ollama one-line openai openmemory rag supermemory vector-database
3.9k 442 +12/wk
GitHub

yoanbernabeu/grepai

Semantic Search & Call Graphs for AI Agents (100% Local)

Trend 3
ai claude-code cli code-search cursor developer-tools embeddings golang mcp privacy-first semantic-search vector-search
1.6k 127 +6/wk
GitHub

Ryandonofrio3/osgrep

Open Source Semantic Search for your AI Agent

Trend 3
colbert embeddings grep grep-search
1.1k 69 +1/wk
GitHub

Anush008/fastembed-rs

Rust library for vector embeddings and reranking.

Trend 3
embeddings fastembed rag reranker reranking retrieval retrieval-augmented-generation vector-search
841 117 +1/wk
GitHub

Blaizzy/mlx-embeddings

MLX-Embeddings is the best package for running Vision and Language Embedding models locally on your Mac using MLX.

Trend 3
chatbot embeddings llms rag retrieval-augmented-generation
346 38 +0/wk
GitHub

babycommando/entity-db

EntityDB is an in-browser vector database wrapping indexedDB and Transformers.js over WebAssembly

Trend 3
db dbbrowser embeddings idb indexed-db indexeddb indexeddb-wrapper transformers vector-database wasm webassembly
287 26 +1/wk
GitHub

amscotti/local-LLM-with-RAG

Running local Large Language Models (LLMs) to perform Retrieval-Augmented Generation (RAG)

Trend 3
agentic agentic-ai agentic-rag chatbot embeddings langchain llm mistral ollama pydantic-ai python rag retrieval-augmented-generation streamlit
275 54 +1/wk
GitHub

TimeSurgeLabs/athenadb

🦉⚡️Serverless, distributed vector database as an API

Trend 3
cloudflare cloudflare-ai cloudflare-d1 cloudflare-vectorize cloudflare-workers cloudflare-workers-ai embeddings vector-database
272 8 +1/wk
GitHub

huggingface/optimum-habana

Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU)

Trend 3
bert fine-tuning habana hpu transformers
209 271 +0/wk
GitHub

illiterate/BertClassifier

BERT Chinese text classification model implemented in PyTorch

Trend 3
bert pytorch transformer transformers
206 25 +0/wk
GitHub

amikos-tech/chroma-go

The Go client for Chroma vector database

Trend 3
chromadb client embeddings vector-database
202 35 +0/wk
GitHub

tryAGI/Ollama

Ollama SDK for .NET

Trend 3
ai autosdk chat-completion csharp dotnet embeddings llm local ollama openapi sdk
191 15 +0/wk
GitHub

MinishLab/model2vec-rs

Official Rust Implementation of Model2Vec

Trend 3
embeddings model2vec nlp rust word-embeddings
167 17 +0/wk
GitHub

THUNLP-AIPoet/BERT-CCPoem

BERT-CCPoem is a BERT-based pre-trained model designed specifically for Chinese classical poetry

Trend 3
bert poetry pretrain
162 19 +1/wk
GitHub

jkrukowski/swift-embeddings

Run embedding models locally in Swift using MLTensor.

Trend 3
coreml embeddings mltensor swift
144 17 +0/wk
GitHub

Voine/Bert-VITS2-MNN

TTS System Bert-VITS2 Android Ver, powered by alibaba-MNN engine.

Trend 3
android android-app bert bert-vits2 cppjieba mnn tokenizer tts tts-android tts-engines vits
133 15 +0/wk
GitHub

marqo-ai/marqo-FashionCLIP

State-of-the-art CLIP/SigLIP embedding models finetuned for the fashion domain. +57% increase in evaluation metrics vs FashionCLIP 2.0.

Trend 3
clip embeddings fashion-classifier fashionclip informationretrieval multimodal recomendations search transformers vectorsearch vision-transformer
127 14 +0/wk
GitHub

datawhalechina/leedl-tutorial

Hung-yi Lee's Deep Learning Tutorial (recommended by Professor Hung-yi Lee 👍, also known as the "Apple Book" 🍎). PDF download: https://github.com/datawhalechina/leedl-tutorial/releases

Trend 3
bert chatgpt cnn deep-learning diffusion gan leedl-tutorial machine-learning network-compression pruning reinforcement-learning rnn self-attention transfer-learning transformer tutorial
16.5k 3.1k -3/wk
GitHub

Unstructured-IO/unstructured

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking and embedding.

Trend 3
data-pipelines deep-learning document-image-analysis document-image-processing document-parser document-parsing docx donut information-retrieval langchain llm machine-learning ml natural-language-processing nlp ocr pdf pdf-to-json pdf-to-text preprocessing
14.4k 1.2k +7/wk
GitHub

Tencent/WeKnora

LLM-powered framework for deep document understanding, semantic retrieval, and context-aware answers using RAG paradigm.

Trend 3
agent agentic ai chatbot chatbots embeddings evaluation generative-ai golang knowledge-base llm multi-tenant multimodel ollama openai question-answering rag reranking semantic-search vector-search
13.8k 1.6k +12/wk
GitHub

PaddlePaddle/PaddleNLP

Easy-to-use and powerful LLM and SLM library with awesome model zoo.

Trend 3
bert compression distributed-training document-intelligence embedding ernie information-extraction llama llm neural-search nlp paddlenlp pretrained-models question-answering search-engine semantic-analysis sentiment-analysis transformers uie
12.9k 3.1k +0/wk
GitHub

jina-ai/clip-as-service

🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP

Trend 3
bert bert-as-service clip-as-service clip-model cross-modal-retrieval cross-modality deep-learning image2vec multi-modality neural-search onnx openai pytorch sentence-encoding sentence2vec
12.8k 2.1k +2/wk
GitHub

neuml/txtai

💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows

Trend 3
agents ai ai-agents embeddings information-retrieval language-model large-language-models llm nlp python rag retrieval-augmented-generation search search-engine semantic-search sentence-embeddings transformers txtai vector-database vector-search
12.4k 800 +2/wk
GitHub

NielsRogge/Transformers-Tutorials

This repository contains demos I made with the Transformers library by HuggingFace.

Trend 3
bert gpt-2 layoutlm pytorch transformers vision-transformer
11.6k 1.7k +0/wk
GitHub

FlagOpen/FlagEmbedding

Retrieval and Retrieval-augmented LLMs

Trend 3
embeddings information-retrieval llm retrieval-augmented-generation sentence-embeddings text-semantic-similarity
11.5k 851 +5/wk
GitHub

huggingface/tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Trend 3
bert gpt language-model natural-language-processing natural-language-understanding nlp transformers
10.6k 1.1k +3/wk
GitHub

ymcui/Chinese-BERT-wwm

Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm series models)

Trend 3
bert bert-wwm bert-wwm-ext chinese-bert nlp pytorch rbt roberta roberta-wwm tensorflow
10.2k 1.4k +2/wk
GitHub

brightmart/nlp_chinese_corpus

Large Scale Chinese Corpus for NLP

Trend 3
bert chinese chinese-corpus chinese-dataset chinese-nlp corpus dataset language-model news nlp pretrain question-answering text-classification wiki word2vec
9.9k 1.6k +2/wk
GitHub

apache/seatunnel

SeaTunnel is a multimodal, high-performance, distributed, massive data integration tool.

Trend 3
apache batch cdc change-data-capture data-ingestion data-integration elt embeddings high-performance llm multimodal offline real-time streaming
9.2k 2.2k -1/wk
GitHub

jessevig/bertviz

BertViz: Visualize Attention in Transformer Models

Trend 3
bert gpt2 machine-learning natural-language-processing neural-network nlp pytorch roberta transformer transformers visualization
8.0k 873 +2/wk
GitHub

MaartenGr/BERTopic

Leveraging BERT and c-TF-IDF to create easily interpretable topics.

Trend 3
bert ldavis machine-learning nlp sentence-embeddings topic topic-modeling topic-modelling topic-models transformers
7.5k 887 +4/wk
GitHub

postgresml/postgresml

Postgres with GPUs for ML/AI apps.

Trend 3
ai ann approximate-nearest-neighbor-search artificial-intelligence classification clustering embeddings forecasting knn llm machine-learning ml postgres rag regression sql vector-database
6.7k 361 +0/wk
GitHub

codertimo/BERT-pytorch

Google AI 2018 BERT pytorch implementation

Trend 3
bert language-model nlp pytorch transformer
6.5k 1.3k +0/wk
GitHub

NVIDIA/FasterTransformer

Transformer related optimization, including BERT, GPT

Trend 3
bert gpt pytorch transformer
6.4k 934 +1/wk
GitHub

lance-format/lance

Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, and PyTorch, with more integrations coming.

Trend 3
apache-arrow computer-vision data-analysis data-analytics data-centric data-format data-science dataops deep-learning duckdb embeddings llms machine-learning mlops python rust
6.3k 620 +3/wk
GitHub

lonePatient/awesome-pretrained-chinese-nlp-models

Awesome Pretrained Chinese NLP Models: a collection of high-quality Chinese pretrained models, large models, multimodal models, and large language models

Trend 3
bert chinese dataset ernie gpt gpt-2 large-language-models llm multimodel nezha nlp nlu-nlg pangu pretrained-models roberta simbert xlnet
5.5k 510 +1/wk
GitHub

Eventual-Inc/Daft

High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale

Trend 3
ai-engineering ai-pipeline arrow artificial-intelligence big-data data-engineering distributed distributed-computing distributed-systems embeddings etl huggingface iceberg machine-learning multimodal parquet python ray rust
5.4k 439 +4/wk
GitHub

volcengine/MineContext

MineContext is your proactive context-aware AI partner (Context-Engineering + ChatGPT Pulse)

Trend 3
agent context-engineering electron embedding-models javascript memory proactive-ai python python3 rag react typescript vector-database vision-language-model
5.2k 387 +0/wk
GitHub

shibing624/text2vec

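text2vec, text to vector. A text vector representation tool that converts text into vector matrices. It implements text representation and text similarity models including Word2Vec, RankBM25, Sentence-BERT, and CoSENT, ready to use out of the box.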

Trend 3
embeddings nlp sentence-embeddings similarity text-similarity text2vec word2vec
5.0k 425 +0/wk
GitHub

Marker-Inc-Korea/AutoRAG

AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation

Trend 3
analysis automl benchmarking document-parser embeddings evaluation llm llm-evaluation llm-ops open-source ops optimization pipeline python qa rag rag-evaluation retrieval-augmented-generation
4.7k 389 -1/wk
GitHub

huggingface/text-embeddings-inference

A blazing fast inference solution for text embeddings models

Trend 3
ai embeddings huggingface llm ml
4.7k 378 +5/wk
GitHub

CLUEbenchmark/CLUE

Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard

Trend 3
albert benchmark bert chinese chineseglue corpus dataset glue language-model nlu pretrained-models pytorch roberta tensorflow transformers
4.2k 545 +1/wk
GitHub

MaartenGr/KeyBERT

Minimal keyword extraction with BERT

Trend 3
bert keyphrase-extraction keyword-extraction mmr
4.1k 379 +1/wk
GitHub

JohnSnowLabs/spark-nlp

State of the Art Natural Language Processing

Trend 3
bert entity-extraction language-detection lemmatizer llamacpp llm machine-translation named-entity-recognition natural-language-processing nlp onnx part-of-speech-tagger pyspark question-answering sentiment-analysis spark spell-checker tensorflow text-classification transformers
4.1k 741 +0/wk
GitHub

crmne/ruby_llm

One beautiful Ruby API for OpenAI, Anthropic, Gemini, Bedrock, Azure, OpenRouter, DeepSeek, Ollama, VertexAI, Perplexity, Mistral, xAI, GPUStack & OpenAI compatible APIs. Agents, Chat, Vision, Audio, PDF, Images, Embeddings, Tools, Streaming & Rails integration.

Trend 3
agents ai anthropic chatgpt claude deepseek embeddings gemini gpustack image-generation llm mistral ollama openai openrouter perplexity rails ruby vertex-ai xai
3.8k 416 +3/wk
GitHub

lightly-ai/lightly

A python library for self-supervised learning on images.

Trend 3
computer-vision contrastive-learning contributions-welcome deep-learning embeddings hacktoberfest machine-learning pytorch self-supervised-learning
3.7k 325 +1/wk
GitHub

ben1234560/AiLearning-Theory-Applying

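Get started quickly with AI theory and hands-on applications: fundamentals, Transformer, NLP, ML, DL, and competitions. Includes extensive comments and datasets, so that every reader can follow along and reproduce the results.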

Trend 3
ai bert dataming deep-learning kaggle-competition learning-by-doing machine-learning nlp
3.5k 479 +1/wk
GitHub

datawhalechina/learn-nlp-with-transformers

We want to create a repo to illustrate the usage of transformers in Chinese

Trend 3
bert nlp transformer
3.2k 503 +4/wk
GitHub

dbiir/UER-py

Open Source Pre-training Model Framework in PyTorch & Pre-trained Model Zoo

Trend 3
albert bart bert chinese classification clue elmo fine-tuning gpt gpt-2 model-zoo natural-language-processing ner pegasus pre-training pytorch roberta t5 unilm xlm-roberta
3.1k 521 +1/wk
GitHub

qdrant/fastembed

Fast, Accurate, Lightweight Python library to make State of the Art Embedding

Trend 3
embeddings openai rag retrieval retrieval-augmented-generation vector-search
2.8k 195 +1/wk
GitHub

adapter-hub/adapters

A Unified Library for Parameter-Efficient and Modular Transfer Learning

Trend 3
adapters bert lora natural-language-processing nlp parameter-efficient-learning parameter-efficient-tuning pytorch transformers
2.8k 373 +1/wk
GitHub

datachain-ai/datachain

Analytics, Versioning and ETL for multimodal data: video, audio, PDFs, images

Trend 3
ai cv data-analytics data-wrangling embeddings llm llm-eval machine-learning mlops multimodal
2.7k 140 +1/wk
GitHub

milvus-io/bootcamp

Dealing with all unstructured data, such as reverse image search, audio search, molecular search, video analysis, question and answer systems, NLP, etc.

Trend 3
audio-search deep-learning embeddings image-classification image-recognition image-search llm milvus nlp python question-answering rag semantic-search unstructured-data vector-database
2.4k 683 +1/wk
GitHub

Mintplex-Labs/vector-admin

The universal tool suite for vector database management. Manage Pinecone, Chroma, Qdrant, Weaviate and more vector databases with ease.

Trend 3
ai ai-agents aitools chroma database-management document-retrieval embeddings flowise langchain langchain-js llms pinecone qdrant vector-data-management vector-database vector-database-embedding vector-search vectordatabase vectorspace weaviate
2.2k 357 +0/wk
GitHub

hila-chefer/Transformer-Explainability

[CVPR 2021] Official PyTorch implementation for Transformer Interpretability Beyond Attention Visualization, a novel method to visualize classifications by Transformer based networks.

Trend 3
attention-matrix attention-visualization bert bert-model cvpr2021 deep-learning explainability perturbation transformer-interpretability vision-transformer visualize-classifications vit
2.0k 259 +1/wk
GitHub

agentset-ai/agentset

The open-source RAG platform: built-in citations, deep research, 22+ file formats, partitions, MCP server, and more.

Trend 3
agentic-rag ai ai-agents ai-sdk chatbots embeddings genai llms memory memory-management rag vercel-ai-sdk
1.9k 171 +2/wk
GitHub

allenai/scibert

A BERT model for scientific text.

Trend 3
bert nlp scientific-papers
1.7k 232 +0/wk
GitHub

nyu-mll/jiant

jiant is an NLP toolkit

Trend 3
bert multitask-learning nlp sentence-representation transfer-learning transformers
1.7k 297 +0/wk
GitHub

AnswerDotAI/ModernBERT

Bringing BERT into modernity via both architecture changes and scaling

Trend 3
bert embeddings llm nlp
1.7k 145 +0/wk
GitHub

jasonwei20/eda_nlp

Data augmentation for NLP, presented at EMNLP 2019

Trend 3
classification cnn data-augmentation embeddings nlp position rnn sentence swap synonyms text-classification
1.7k 313 +0/wk
GitHub

LLPhant/LLPhant

LLPhant - A comprehensive PHP Generative AI Framework using OpenAI GPT 4. Inspired by Langchain

Trend 3
agent autophp embeddings genai generative-ai gpt4 langchain laravel llamaindex openai php symfony vector-database
1.5k 149 +1/wk
GitHub

aws-samples/amazon-bedrock-samples

This repository contains examples for customers to get started using the Amazon Bedrock Service. This contains examples for all available foundational models

Trend 3
amazon-bedrock amazon-titan bedrock embeddings generative-ai knowledge-base langchain rag
1.4k 671 +1/wk
GitHub

lightly-ai/lightly-train

All-in-one training for vision models (YOLO, ViTs, RT-DETR, DINOv3): pretraining, fine-tuning, distillation.

Trend 3
computer-vision contrastive-learning deep-learning dinov2 dinov3 distillation embeddings eomt machine-learning object-detection pretrained-models python pytorch real-time rtdetrv2 self-supervised self-supervised-learning semantic-segmentation vision-transformer yolo
1.4k 69 +2/wk
GitHub

natasha/natasha

Solves basic Russian NLP tasks, API for lower level Natasha projects

Trend 3
embeddings morphology ner nlp python russian sentence-segmentation syntax tokenizer visualization
1.3k 114 +1/wk
GitHub

NVIDIA-Merlin/Transformers4Rec

Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation and works with PyTorch.

Trend 3
bert gtp huggingface language-model nlp pytorch recommender-system recsys seq2seq session-based-recommendation tabular-data transformer xlnet
1.3k 159 +2/wk
GitHub

unitaryai/detoxify

Trained models & code to predict toxic comments on all 3 Jigsaw Toxic Comment Challenges. Built using ⚡ Pytorch Lightning and 🤗 Transformers. For access to our API, please email us at [email protected].

Trend 3
bert bert-model hate-speech hate-speech-detection hatespeech huggingface huggingface-transformers kaggle-competition nlp pytorch-lightning sentence-classification toxic-comment-classification toxic-comments toxicity toxicity-classification
1.2k 143 +1/wk
GitHub

Denis2054/Transformers-for-NLP-2nd-Edition

Transformer models from BERT to GPT-4, environments from Hugging Face to OpenAI. Fine-tuning, training, and prompt engineering examples. A bonus section with ChatGPT, GPT-3.5-turbo, GPT-4, and DALL-E, including jump-starting GPT-4, speech-to-text, text-to-speech, text-to-image generation with DALL-E, Google Cloud AI, HuggingGPT, and more

Trend 3
bert chatgpt chatgpt-api dall-e dall-e-api deep-learning gpt-3-5-turbo gpt-4 gpt-4-api huggingface-transformers machine-learning natural-language-processing nlp openai python pytorch roberta-model transformers trax
960 357 +0/wk
GitHub

philippgille/chromem-go

Embeddable vector database for Go with Chroma-like interface and zero third-party dependencies. In-memory with optional persistence.

Trend 3
chroma chromadb cosine-similarity embedded embeddings go golang in-memory llm llms nearest-neighbor rag retrieval-augmented-generation vector-database vector-search
910 65 +0/wk
GitHub

abhimishra91/transformers-tutorials

GitHub repo with tutorials to fine-tune transformers for different NLP tasks

Trend 3
bert classification deep-learning distilbert named-entity-recognition natural-language-processing nlp pytorch pytorch-tutorial t5 transformers wandb
862 196 +0/wk
GitHub

Davidyz/VectorCode

A code repository indexing tool to supercharge your LLM experience.

Trend 3
embeddings mcp mcp-server neovim-plugin rag retrieval-augmented
844 46 +1/wk
GitHub

henomis/lingoose

🪿 LinGoose is a Go framework for building awesome AI/LLM applications.

Trend 3
ai chatgpt embeddings go golang index llm openai pinecone pipeline prompt vector
828 75 +0/wk
GitHub

danny-avila/rag_api

ID-based RAG FastAPI: Integration with Langchain and PostgreSQL/pgvector

Trend 3
api api-rest embeddings fastapi langchain pgvector postgresql psql python rag vector vector-database
786 348 +1/wk
GitHub

ThomasVitale/llm-apps-java-spring-ai

Samples showing how to build Java applications powered by Generative AI and LLMs using Spring AI and Spring Boot.

Trend 3
embeddings generative-ai large-language-models llm ollama openai rag spring-ai
739 167 +0/wk
GitHub

ghimiresunil/LLM-PowerHouse-A-Curated-Guide-for-Large-Language-Models-with-Custom-Training-and-Inferencing

LLM-PowerHouse: Unleash LLMs' potential through curated tutorials, best practices, and ready-to-use code for custom training and inferencing.

Trend 3
bert huggingface large-language-models llm-inference llm-training llm-tutorials open-source open-source-llm transformers
728 121 +0/wk
GitHub

aub-mind/arabert

Pre-trained Transformers for Arabic Language Understanding and Generation (Arabic BERT, Arabic GPT2, Arabic ELECTRA)

Trend 3
arabert arabic arabic-classification arabic-nlp bert electra farasa gpt2 huggingface-transformer
717 146 +0/wk
GitHub

onejune2018/Awesome-LLM-Eval

Awesome-LLM-Eval: a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs and models, mainly for the evaluation of foundation LLMs, aiming to explore the technical boundaries of generative AI.

Trend 3
awsome-list awsome-lists benchmark bert chatglm chatgpt dataset evaluation gpt3 large-language-model leaderboard llama llm llm-evaluation machine-learning nlp openai qwen rag
630 54 +1/wk
GitHub

Denis2054/RAG-Driven-Generative-AI

This repository provides programs to build Retrieval Augmented Generation (RAG) code for Generative AI with LlamaIndex, Deep Lake, and Pinecone leveraging the power of OpenAI and Hugging Face models for generation and evaluation.

Trend 3
advanced-rag chroma chromadb embedding-models fine-tuning gpt-4o-mini gpt4-omni grok huggingface indexing-querying llama llama-index multimodal openai-api pinecone rag scaling vision-transformer xai-grok
596 202 +1/wk
GitHub

taishan1994/pytorch_bert_bilstm_crf_ner

Chinese named entity recognition with bert_bilstm_crf, based on PyTorch

Trend 3
bert crf named-entity-recognition ner pytorch
595 84 +0/wk
GitHub

stefan-it/turkish-bert

Turkish BERT/DistilBERT, ELECTRA, ConvBERT and T5 models

Trend 3
bert convbert distilbert electra t5 turkish
569 49 +1/wk
GitHub

codelion/adaptive-classifier

A flexible, adaptive classification system for dynamic text classification

Trend 3
adaptive-learning adaptive-neural-network bert classifier continous-learning distilbert elastic-weight-consolidation embeddings faiss large-language-models llms machine-learning multi-class-classification multi-label-classification neural-layers neural-networks online-learning roberta text-classification transformers
545 39 +0/wk
GitHub

shibing624/pytextclassifier

pytextclassifier is a toolkit for text classification. It implements classification models including LR, XGBoost, TextCNN, FastText, TextRNN, and BERT, ready to use out of the box.

Trend 3
bert classification focalloss-pytorch hierarchical machine-learning nlp pytextclassifier python pytorch softmax text-classification text-classifier
521 77 +0/wk
GitHub

samvallad33/vestige

Cognitive memory for AI agents — FSRS-6 spaced repetition, 29 brain modules, 3D dashboard, single 22MB Rust binary. MCP server for Claude, Cursor, VS Code, Xcode, JetBrains.

Trend 3
ai-memory claude cognitive-science cursor embeddings fsrs long-term-memory mcp mcp-server neuroscience onnx rust spaced-repetition sqlite sveltekit threejs vscode webgpu
470 42 +0/wk
GitHub

JackHCC/Chinese-Text-Classification-PyTorch

Chinese text classification, implemented in PyTorch (TextCNN, TextRNN, FastText, TextRCNN, BiLSTM_Attention, DPCNN, Transformer, BERT, ERNIE), ready to use out of the box!

Trend 3
attention-mechanism bert cnn dpcnn ernie fasttext nlp pytorch rcnn rnn text-classification transformer
404 61 +0/wk
GitHub

coree/awesome-rag

A curated list of retrieval-augmented generation (RAG) in large language models

Trend 3
awesome-list awesome-resources embeddings large-language-models llm rag rag-model retrieval-augmented retrieval-augmented-generation retrieval-systems
376 35 +1/wk
GitHub

Source Breakdown

GitHub
Stars 473.6k
Forks 59.0k
Repos 100
PyPI
Packages 1
