Multimodal | AISignal

fikrikarim/parlor

On-device, real-time multimodal AI. Have natural voice and vision conversations with an AI that runs entirely on your machine. Powered by Gemma 4 E2B and Kokoro.

Trend 12

⚡ Breakout +87.1%

apple-silicon gemma kokoro litert-lm local-llm mlx multimodal on-device-ai python real-time speech-recognition text-to-speech voice-assistant

1.1k 107 +187/wk

GitHub

x-zheng16/Awesome-Embodied-AI-Safety

Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses | 400+ Papers | Perception, Cognition, Planning, Interaction, Agentic System

Trend 4

🔥 Heating Up +13.5%

adversarial-attacks ai-safety autonomous-driving backdoor-attacks embodied-agents embodied-ai jailbreak large-language-models multimodal robotics survey

59 0 +4/wk

GitHub

lcqysl/GEMS

GEMS: Agent-Native Multimodal Generation with Memory and Skills

Trend 4

🔥 Heating Up +12.5%

agent generation multimodal reasoning

90 4 +2/wk

GitHub

zai-org/GLM-skills

Official skills for the GLM family of models.

Trend 4

glm multimodal ocr skills vision

272 19 +7/wk

GitHub

qingchencloud/clawpanel

🦞 Visual management panel for OpenClaw with a built-in AI assistant (tool calling + image recognition + multimodal + i18n (11)), one-click install

Trend 3

admin-panel ai-agent ai-assistant ai-chat ai-tools chatgpt cross-platform deepseek desktop-app image-recognition llm management-panel multimodal openclaw openclaw-panel rust tauri tauri-v2 tool-calling vite

2.2k 279 +28/wk

GitHub

OpenMOSS/MOSS-TTS

MOSS‑TTS Family is an open‑source speech and sound generation model family from MOSI.AI and the OpenMOSS team. It is designed for high‑fidelity, high‑expressiveness, and complex real‑world scenarios, covering stable long‑form speech, multi‑speaker dialogue, voice/character design, environmental sound effects, and real‑time streaming TTS.

Trend 3

audio audio-tokenizer llm multimodal text-to-speech voice-cloning

1.1k 103 +16/wk

GitHub

vllm-project/vllm-omni

A framework for efficient model inference with omni-modality models

Trend 3

audio-generation diffusion image-generation inference model-serving multimodal pytorch transformer video-generation

4.2k 719 +50/wk

GitHub

datawhalechina/all-in-rag

🔍 LLM Application Development in Practice, Part 1: A Full-Stack Guide to RAG. Read online: https://datawhalechina.github.io/all-in-rag/

Trend 3

ai deepseek embedding kimi-k2 langchain llama-index llm milvus multimodal neo4j python rag

5.9k 2.9k +52/wk

GitHub

yuanzhao-CVLAB/UniMMAD

[CVPR 2026] Official Implementation of UniMMAD: Unified Multi-Modal and Multi-Class Anomaly Detection via MoE-Driven Feature Decompression

Trend 3

anomaly-detection mixture-of-experts multimodal

206 21 +1/wk

GitHub

Anyesh/wardrowbe

Put your wardrobe in rows. Self-hosted AI-powered wardrobe management app.

Trend 3

ai outfit-ai outfit-pairing outfits style-ai wardrobe wardrobe-app wardrobe-management

169 23 +0/wk

GitHub

ParisNeo/lollms

An all in one AI solution compatible with any known AI service on the planet

Trend 3

ai llm multimodal

63 17 +0/wk

GitHub

AccumulateMore/CV

✔ (Completed) Extremely comprehensive deep learning notes [Tudui PyTorch] [Mu Li's Dive into Deep Learning] [Andrew Ng's Deep Learning] [Dafei's LLM Agent]

Trend 3

agent agents book chinese computer-vision cv deep-learning jupyter-notebook llm llms machine-learning natural-language-processing nlp notebook python rag

19.5k 2.2k +76/wk

GitHub PyPI 2-source

modelscope/ms-swift

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.5, DeepSeek-R1, GLM-5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, Phi4, ...) (AAAI 2025).

Trend 3

deepseek-r1 embedding grpo internvl liger llama llama4 llm lora megatron moe multimodal open-r1 peft qwen3 qwen3-5 qwen3-omni qwen3-vl reranker sft

13.6k 1.3k +16/wk

GitHub

Blaizzy/mlx-audio

A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speech analysis on Apple Silicon.

Trend 3

apple-silicon audio-processing mlx multimodal speech-recognition speech-synthesis speech-to-text text-to-speech transformers

6.6k 541 +8/wk

GitHub

UbiquitousLearning/mllm

Fast Multimodal LLM on Mobile Devices

Trend 3

ai llama llm mobile multimodal

1.5k 187 +0/wk

GitHub

llm-jp/awesome-japanese-llm

Overview of Japanese LLMs

Trend 3

foundation-models generative-ai generative-model generative-models japanese japanese-language japanese-language-model japanese-llm language-model language-models large-language-model large-language-models llm llm-japanese llms multimodal vision-and-language vision-language vision-language-model

1.4k 43 +1/wk

GitHub

InternRobotics/PointLLM

[ECCV 2024 Best Paper Candidate & TPAMI 2025] PointLLM: Empowering Large Language Models to Understand Point Clouds

Trend 3

3d chatbot foundation-models gpt-4 large-language-models llama multimodal objaverse point-cloud pointllm representation-learning vision-and-language

999 57 +1/wk

GitHub

OpenMOSS/MOVA

MOVA: Towards Scalable and Synchronized Video–Audio Generation

Trend 3

diffusion-models multimodal sglang video-audio-generation

887 62 +4/wk

GitHub

PaddlePaddle/PaddleMIX

Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high performance and flexibility.

Trend 3

aigc clip controlnet deepseek-vl dit eva-clip got-ocr20 image-to-text internvl2 llava minicpm-v multimodal ppdiffusers qwen2-vl sd-xl sora stable-diffusion stablevideodiffusion text-to-image text-to-video

721 225 +1/wk

GitHub

EvolvingLMMs-Lab/NEO

NEO Series: Native Vision-Language Models from First Principles

Trend 3

agi encoder-free-vlm large-language-models mllm multimodal multimodal-large-language-models native-multimodal-model vlm

699 25 +1/wk

GitHub

enoche/MMRec

A Toolbox for MultiModal Recommendation. Integrating 10+ Models...

Trend 3

multi-modal-retrieval multimedia-recommendation multimodal recommender-system

649 97 +2/wk

GitHub

TIGER-AI-Lab/VLM2Vec

This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR 2025]

Trend 3

benchmark contrastive-learning embedding image-retrieval mmeb multimodal rag representation-learning video-retrieval visual-document-retrieval vlm

622 59 +0/wk

GitHub

shenhao-stu/ohmycaptcha

⚡ Self-hostable YesCaptcha-compatible captcha solver built with FastAPI, Playwright, and OpenAI-compatible multimodal models.

Trend 3

captcha fastapi multimodal openai-compatible playwright recaptcha self-hosted vision-models yescaptcha-compatible

619 210 +1/wk

GitHub

ictnlp/LLaVA-Mini

LLaVA-Mini is a unified large multimodal model (LMM) that can support the understanding of images, high-resolution images, and videos in an efficient manner.

Trend 3

efficient gpt4o gpt4v large-language-models large-multimodal-models llama llava multimodal multimodal-large-language-models video vision vision-language-model visual-instruction-tuning

569 32 +1/wk

GitHub

suzuran0y/CCTV-Smartphone-AI-Monitoring

Local monitoring + AI vision: a LAN-based, smartphone-powered AI monitoring framework with structured event output for data acquisition and analysis.

Trend 3

ai-monitoring computer-vision device-repurposing event-driven image-recognition-tool ip-camera ml-ops monitoring-system multimodal structured-output video-streaming

547 38 +1/wk

GitHub

RobotecAI/rai

RAI is a vendor agnostic agentic framework for Physical AI robotics, utilizing ROS 2 tools to perform complex actions, defined scenarios, free interface execution, log summaries, voice interaction and more.

Trend 3

ai ai-agents-framework embodied-agent embodied-agents embodied-ai embodied-artificial-intelligence generative-ai llm multi-agent-systems multimodal o3de physical-ai robotec robotics ros2 vlm

487 65 +1/wk

GitHub

qingchencloud/clawapp

📱 ClawApp: mobile chat client for the OpenClaw AI agent | Streaming chat · image sending and receiving · tool calling · PWA + APK

Trend 3

ai-agents ai-assistant android capacitor chat-client chinese dark-mode h5 i18n markdown mobile-chat multimodal openclaw pwa self-hosted streaming tool-calling voice-input websocket

375 45 +1/wk

GitHub

antflydb/antfly

Trend 3

ai-agents autoscaling elasticsearch information-retrieval ml multimodal rag semantic-search

324 20 +0/wk

GitHub

clawdotnet/openclaw.net

Self-hosted OpenClaw gateway + agent runtime in .NET (NativeAOT-friendly)

Trend 3

agent-runtime ai-agent automation csharp discord-bot dotnet llm mcp memory microsoft-agent-framework multimodal nativeaot openai-compatible realtime self-evolving self-hosted text-to-speech tool-calling tool-execution

196 31 +1/wk

GitHub

marqo-ai/marqo-FashionCLIP

State-of-the-art CLIP/SigLIP embedding models finetuned for the fashion domain. +57% increase in evaluation metrics vs FashionCLIP 2.0.

Trend 3

clip embeddings fashion-classifier fashionclip informationretrieval multimodal recomendations search transformers vectorsearch vision-transformer

127 14 +0/wk

GitHub

isLinXu/paper-list

Auto-updating paper list

Trend 3

action-recognition anomaly-detection audio-processing classification depth-estimation graph-neural-networks image-generation llm multimodal object-detection object-tracking optical-flow pose-estimation reinforcement-learning scene-understanding semantic-segmentation transfer-learning

118 10 +0/wk

GitHub

SmooSenseAI/smoosense

Interactively browse multimodal tabular data

Trend 3

analytics exploratory-data-analysis exploratory-data-visualizations multimodal visualization

108 13 +0/wk

GitHub

oidlabs-com/Lexoid

Multimodal document parser for high quality data understanding and extraction

Trend 3

genai html-to-markdown html-to-pdf large-language-models llms multimodal ocr ocr-python parser-library pdf-document pdf-parser pdf-to-json pdf-to-latex

98 12 +0/wk

GitHub

bytedance/UI-TARS-desktop

The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra

Trend 3

agent agent-tars browser-use computer-use cowork gui-agent gui-operator mcp mcp-server multimodal tars ui-tars vision vlm

29.3k 2.9k +19/wk

GitHub

haotian-liu/LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Trend 3

chatbot chatgpt foundation-models gpt-4 instruction-tuning llama llama-2 llama2 llava multi-modality multimodal vision-language-model visual-language-learning

24.7k 2.8k +7/wk

GitHub PyPI 2-source

microsoft/unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Trend 3

beit beit-3 bitnet deepnet document-ai foundation-models kosmos kosmos-1 layoutlm layoutxlm llm minilm mllm multimodal nlp pre-trained-model textdiffuser trocr unilm xlm-e

22.1k 2.7k +1/wk

GitHub HuggingFace 2-source

jina-ai/serve

☁️ Build multimodal AI applications with cloud-native stack

Trend 3

cloud-native cncf deep-learning docker fastapi framework generative-ai grpc jaeger kubernetes llmops machine-learning microservice mlops multimodal neural-search opentelemetry orchestration pipeline prometheus

21.9k 2.2k -1/wk

GitHub PyPI 2-source

screenpipe/screenpipe

Run agents that work for you based on what you do. AI finally knows what you are doing

Trend 3

agents agi ai computer-vision llm machine-learning ml multimodal vision

18.1k 1.6k +9/wk

GitHub HuggingFace 2-source

pytorch/vision

Datasets, Transforms and Models specific to Computer Vision

Trend 3

computer-vision machine-learning

17.6k 7.2k +0/wk

GitHub

bharathgs/Awesome-pytorch-list

A comprehensive list of PyTorch-related content on GitHub, such as different models, implementations, helper libraries, tutorials, etc.

Trend 3

awesome awesome-list computer-vision cv data-science deep-learning facebook machine-learning natural-language-processing neural-network nlp nlp-library papers probabilistic-programming python pytorch pytorch-model pytorch-tutorials tutorials utility-library

16.5k 2.8k +1/wk

GitHub

datawhalechina/leedl-tutorial

Hung-yi Lee Deep Learning Tutorial (recommended by Prof. Hung-yi Lee 👍, the "Apple Book" 🍎). PDF download: https://github.com/datawhalechina/leedl-tutorial/releases

Trend 3

bert chatgpt cnn deep-learning diffusion gan leedl-tutorial machine-learning network-compression pruning reinforcement-learning rnn self-attention transfer-learning transformer tutorial

16.5k 3.1k -3/wk

GitHub

lukas-blecher/LaTeX-OCR

pix2tex: Using a ViT to convert images of equations into LaTeX code.

Trend 3

dataset deep-learning im2latex im2markup im2text image-processing image2text latex latex-ocr machine-learning math-ocr ocr python pytorch transformer vision-transformer vit

16.3k 1.3k +4/wk

GitHub

Unstructured-IO/unstructured

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.

Trend 3

data-pipelines deep-learning document-image-analysis document-image-processing document-parser document-parsing docx donut information-retrieval langchain llm machine-learning ml natural-language-processing nlp ocr pdf pdf-to-json pdf-to-text preprocessing

14.4k 1.2k +7/wk

GitHub

davisking/dlib

A toolkit for making real world machine learning and data analysis applications in C++

Trend 3

c-plus-plus computer-vision deep-learning dlib machine-learning machine-learning-library python

14.4k 3.5k +0/wk

GitHub

virgili0/Virgilio

Your new Mentor for Data Science E-Learning.

Trend 3

business-intelligence computer-vision data-science datascience guide guidelines hacktoberfest learning learning-python machine-learning machine-vision nlp path python scikit-learn statistics study studypath tensorflow virgilio

14.3k 2.5k +1/wk

GitHub

jacobgil/pytorch-grad-cam

Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.

Trend 3

class-activation-maps computer-vision deep-learning explainable-ai explainable-ml grad-cam image-classification interpretability interpretable-ai interpretable-deep-learning machine-learning object-detection pytorch score-cam vision-transformers visualizations xai

12.7k 1.7k +0/wk

GitHub

diff-usion/Awesome-Diffusion-Models

A collection of resources and papers on Diffusion Models

Trend 3

artificial-intelligence diffusion-models generative-model machine-learning score-based score-matching

12.3k 1.0k +1/wk

GitHub

nerfstudio-project/nerfstudio

A collaboration friendly studio for NeRFs

Trend 3

3d 3d-graphics 3d-reconstruction computer-vision deep-learning gaussian-splatting machine-learning nerf photogrammetry pytorch

11.4k 1.6k +2/wk

GitHub

kornia/kornia

🐍 Geometric Computer Vision Library for Spatial AI

Trend 3

artificial-intelligence computer-vision deep-learning hacktoberfest image-processing machine-learning neural-network python pytorch robotics spatial-ai

11.2k 1.2k +1/wk

GitHub

voxel51/fiftyone

Refine high-quality datasets and visual AI models

Trend 3

active-learning artificial-intelligence computer-vision data-centric-ai data-cleaning data-curation data-quality data-science deep-learning developer-tools image-classification machine-learning object-detection python unstructured-data vector-search visualization

10.6k 736 +5/wk

GitHub

rerun-io/rerun

An open source SDK for logging, storing, querying, and visualizing multimodal and multi-rate data

Trend 3

computer-vision cpp multimodal python robotics rust visualization

10.5k 706 +7/wk

GitHub

esimov/caire

Content aware image resize library

Trend 3

computer-vision content-aware-resize content-aware-scaling edge-detection face-detection golang image-processing image-resize machine-learning seam-carving

10.5k 386 -1/wk

GitHub

yzhao062/pyod

A Python Library for Outlier and Anomaly Detection on Tabular, Text, and Image Data

Trend 3

anomaly anomaly-detection autoencoder data-mining data-science deep-learning foundation-models fraud-detection image-anomaly-detection machine-learning multimodal neural-networks nlp-anomaly-detection novelty-detection out-of-distribution-detection outlier-detection outlier-ensembles outliers unsupervised-learning

9.8k 1.5k +3/wk

GitHub

apache/seatunnel

SeaTunnel is a multimodal, high-performance, distributed, massive data integration tool.

Trend 3

apache batch cdc change-data-capture data-ingestion data-integration elt embeddings high-performance llm multimodal offline real-time streaming

9.2k 2.2k -1/wk

GitHub

X-PLUG/MobileAgent

Mobile-Agent: The Powerful GUI Agent Family

Trend 3

agent android app automation copilot gui mllm mobile mobile-agents multimodal multimodal-agent multimodal-large-language-models

8.4k 850 +4/wk

GitHub

om-ai-lab/VLM-R1

Solve Visual Understanding with Reinforced VLMs

Trend 3

deepseek-r1 grpo llm multimodal multimodal-r1 qwen r1-zero reinforcement-learning vlm vlm-r1

5.9k 378 +2/wk

GitHub

genkit-ai/genkit

Open-source framework for building AI-powered apps in JavaScript, Go, and Python, built and used in production by Google

Trend 3

agents ai embedders genkit llm multimodal rag vector-database

5.8k 706 +1/wk

GitHub

OpenBMB/UltraRAG

A Low-Code MCP Framework for Building Complex and Innovative RAG Pipelines

Trend 3

deepseek demo easy embedding flask gpt huggingface-transformers llm mcp multimodal openai qwen rag sentence-transformers ui vllm vlm

5.5k 410 +2/wk

GitHub

Eventual-Inc/Daft

High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale

Trend 3

ai-engineering ai-pipeline arrow artificial-intelligence big-data data-engineering distributed distributed-computing distributed-systems embeddings etl huggingface iceberg machine-learning multimodal parquet python ray rust

5.4k 439 +4/wk

GitHub

InternLM/xtuner

A Next-Generation Training Engine Built for Ultra-Large MoE Models

Trend 3

agent deepseek-v3 gpt-oss intern-s1 internvl kimi-k2 llm multimodal qwen3-moe qwen3-vl reinforcement-learning

5.1k 413 +0/wk

GitHub

PKU-Alignment/align-anything

Align Anything: Training All-modality Model with Feedback

Trend 3

chameleon dpo large-language-models multimodal rlhf vision-language-model

4.6k 507 +0/wk

GitHub

luban-agi/Awesome-AIGC-Tutorials

Curated tutorials and resources for Large Language Models, AI Painting, and more.

Trend 3

ai aigc awesome chatgpt courses-resource deep-learning llm midjourney multimodal nlp prompt-engineering stable-diffusion tutorials

4.5k 300 +2/wk

GitHub

rom1504/img2dataset

Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.

Trend 3

big-data dataset deep-learning download-images image image-dataset multimodal

4.4k 375 +0/wk

GitHub

EvolvingLMMs-Lab/lmms-eval

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

Trend 3

agi audio-evaluation benchmark evaluation large-language-models llm-evaluation multimodal multimodal-evaluation video-understanding vision-language-model vlm

4.0k 557 +0/wk

GitHub

NExT-GPT/NExT-GPT

Code and models for ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model

Trend 3

chatgpt foundation-models gpt-4 instruction-tuning large-language-models llm mllm multi-modal-chatgpt multimodal visual-language-learning

3.6k 361 +0/wk

GitHub

atfortes/Awesome-LLM-Reasoning

From Chain-of-Thought prompting to OpenAI o1 and DeepSeek-R1 🍓

Trend 3

awesome chain-of-thought chatgpt cot deepseek deepseek-r1 gpt gpt-4o in-context-learning language-models mllm multimodal openai-o1 papers prompt prompt-engineering reasoning strawberry

3.6k 202 +1/wk

GitHub

morphik-org/morphik-core

The most accurate document search and store for building AI apps

Trend 3

artificial-intelligence cache-augmented-generation colpali database litellm multimodal rag rules-based-ingestion

3.6k 297 +2/wk

GitHub

embeddings-benchmark/mteb

MTEB: Massive Text Embedding Benchmark

Trend 3

benchmark bitext-mining clustering information-retrieval low-resource-nlp mteb multilingual-nlp multimodal neural-search reranking retrieval sbert semantic-search sentence-transformers sts text-classification text-embedding

3.2k 586 +0/wk

GitHub

vortex-data/vortex

An extensible, state of the art columnar file format. Formerly at @spiraldb, now an Incubation Stage project at LFAI&Data, part of the Linux Foundation.

Trend 3

array arrow compression file multimodal python rust

2.8k 144 +2/wk

GitHub

rom1504/clip-retrieval

Easily compute clip embeddings and build a clip retrieval system with them

Trend 3

ai clip deep-learning knn multimodal semantic-search

2.7k 239 +1/wk

GitHub

roboflow/maestro

streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL

Trend 3

captioning fine-tuning florence-2 multimodal objectdetection paligemma phi-3-vision qwen2-vl transformers vision-and-language vqa

2.7k 222 +0/wk

GitHub

OFA-Sys/OFA

Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

Trend 3

chinese image-captioning multimodal pretrained-models pretraining prompt prompt-tuning referring-expression-comprehension text-to-image-synthesis vision-language visual-question-answering

2.6k 250 +1/wk

GitHub

InternLM/HuixiangDou

HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance

Trend 3

application assistant assistant-chat-bots chatbot dsl group-chat image-retrieval lark llm multimodal pipeline rag robot wechat

2.5k 185 +1/wk

GitHub

X-PLUG/mPLUG-DocOwl

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

Trend 3

chart-understanding document-understanding mllm multimodal multimodal-large-language-models table-understanding

2.4k 149 +1/wk

GitHub

OpenGVLab/InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Trend 3

action-recognition benchmark contrastive-learning foundation-models instruction-tuning masked-autoencoder multimodal open-set-recognition self-supervised spatio-temporal-action-localization temporal-action-localization video-clip video-data video-dataset video-question-answering video-retrieval video-understanding vision-transformer zero-shot-classification zero-shot-retrieval

2.2k 144 +1/wk

GitHub

genieincodebottle/generative-ai

Comprehensive resources on Generative AI, including a detailed roadmap, projects, use cases, interview preparation, and coding preparation.

Trend 3

agentic-ai agentic-framework claude gemini genai genai-usecase generative-ai interview-questions langchain langgraph large-language-model llm-agent llm-evaluation mcp model-context-protocol multimodal n8n n8n-workflow openai-api retrieval-augmented-generation

2.2k 539 +2/wk

GitHub

google-gemini/genai-processors

GenAI Processors is a lightweight Python library that enables efficient, parallel content processing.

Trend 3

agent ai asyncio gemini genai generative-ai language-model multimodal python realtime

2.1k 214 +2/wk

GitHub

kyegomez/BitNet

Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch

Trend 3

artificial-intelligence deep-neural-networks deeplearning gpt4 machine-learning multimodal multimodal-deep-learning

1.9k 172 +2/wk

GitHub

showlab/Show-o

[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.

Trend 3

diffusion-models large-language-models multimodal

1.9k 90 +0/wk

GitHub

2U1/Qwen-VL-Series-Finetune

An open-source implementation for fine-tuning the Qwen-VL series by Alibaba Cloud.

Trend 3

multimodal qwen2-5-vl qwen2-vl qwen3-5 qwen3-vl vision-language vision-language-model vlm

1.8k 207 +0/wk

GitHub

potamides/DeTikZify

Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ.

Trend 3

draw graph huggingface inverse-graphics latex llama llm multimodal sketch tikz transformers vectorization visualization

1.8k 91 +0/wk

GitHub

dailenson/SDT

This repository is the official implementation of Disentangling Writer and Character Styles for Handwriting Generation (CVPR 2023)

Trend 3

computer-vision contrastive-learning deep-learning generative-models gmm handwriting-generation multimodal pytorch-implementation transformer

1.4k 111 +0/wk

GitHub

valentinfrlch/ha-llmvision

Visual intelligence for your home.

Trend 3

ai cctv-detection hacs-integration home-assistant llm multimodal notifications smart-home vision

1.3k 115 +1/wk

GitHub

gokayfem/awesome-vlm-architectures

Famous Vision Language Models and Their Architectures

Trend 3

awesome awesome-list blip clip cogvlm image-encoder internlm kosmos llava multimodal qwen-vl text-encoder vision-language-model vlm

1.2k 56 +0/wk

GitHub

ArrowLuo/CLIP4Clip

An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

Trend 3

activitynet clip didemo lsmdc msrvtt msvd multimodal multimodal-learning multimodality ranking retrieval retrieval-model search video-clip-retrieval video-text-retrieval

1.0k 135 +1/wk

GitHub

yaotingwangofficial/Awesome-MCoT

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

Trend 3

chain-of-thought cot deepseek-r1 instruction-tuning large-vision-language-model mcts mllm-reasoning multimodal multimodal-chain-of-thought multimodal-large-language-models openai-o1 reasoning slow-thinking survey system-2

976 32 +1/wk

GitHub

allenai/papermage

library supporting NLP and CV research on scientific papers

Trend 3

computer-vision machine-learning multimodal natural-language-processing pdf-processing python scientific-papers

793 64 +0/wk

GitHub

EvolvingLMMs-Lab/lmms-engine

A simple, unified multimodal models training engine. Lean, flexible, and built for hacking at scale.

Trend 3

agi large-language-models multimodal unified-multimodal-models video-generation

756 35 +1/wk

GitHub

Denis2054/RAG-Driven-Generative-AI

This repository provides programs to build Retrieval Augmented Generation (RAG) code for Generative AI with LlamaIndex, Deep Lake, and Pinecone leveraging the power of OpenAI and Hugging Face models for generation and evaluation.

Trend 3

advanced-rag chroma chromadb embedding-models fine-tuning gpt-4o-mini gpt4-omni grok huggingface indexing-querying llama llama-index multimodal openai-api pinecone rag scaling vision-transformer xai-grok

596 202 +1/wk

GitHub

monatis/clip.cpp

CLIP inference in plain C/C++ with no extra dependencies

Trend 3

c clip cpp ggml image-search multimodal

557 53 +1/wk

GitHub

Tencent-Hunyuan/Hunyuan3D-Omni

Hunyuan3D-Omni: A Unified Framework for Controllable Generation of 3D Assets

Trend 3

3d 3d-aigc 3d-generation hunyuan3d image-to-3d multimodal shape

553 48 +0/wk

GitHub

NetManAIOps/ChatTS

[VLDB' 25] ChatTS: Understanding, Chat, Reasoning about Time Series with TS-MLLM

Trend 3

llm multimodal timeseries timeseries-analysis

445 45 +1/wk

GitHub

tangbotony/GraTAG

GraTAG — Production AI Search via Graph-Based Query Decomposition and Triplet-Aligned Generation with Rich Multimodal Representations

Trend 1

✦ New Signal

ai-search-engine multimodal query-decomposition rag reinforcement-learning retrieval-augmented-generation triplet-extraction

113 7 +31/wk

GitHub

deepseek-ai/Janus

Janus-Series: Unified Multimodal Understanding and Generation Models

Trend 0

any-to-any foundation-models llm multimodal unified-model vision-language-pretraining

17.7k 2.2k -1/wk

GitHub

NVlabs/instant-ngp

Instant neural graphics primitives: lightning fast NeRF and more

Trend 0

3d-reconstruction computer-graphics computer-vision cuda function-approximation machine-learning nerf neural-network real-time real-time-rendering realtime signed-distance-functions

17.4k 2.1k +1/wk

GitHub

kmario23/deep-learning-drizzle

Drench yourself in Deep Learning, Reinforcement Learning, Machine Learning, Computer Vision, and NLP by learning from these exciting lectures!!

Trend 0

artificial-intelligence-algorithms artificial-neural-networks bayesian-statistics computer-vision deep-learning deep-neural-networks deep-reinforcement-learning explainable-ai geometric-deep-learning graph-neural-networks machine-learning medical-imaging natural-language-processing optimization pattern-recognition probabilistic-graphical-models probability reinforcement-learning speech-recognition visual-recognition

12.8k 3.0k +0/wk

GitHub

zalandoresearch/fashion-mnist

A MNIST-like fashion product database. Benchmark 👇

Trend 0

benchmark computer-vision convolutional-neural-networks dataset deep-learning fashion fashion-mnist gan machine-learning mnist zalando

12.7k 3.1k -2/wk

GitHub

extreme-assistant/CVPR2024-Paper-Code-Interpretation

CVPR 2024/2023/2022/2021/2020/2019/2018/2017: a collection of papers, code, interpretations, and livestreams, compiled by the Jishi (极市) team

Trend 0

computer-vision cvpr2019 cvpr2020 cvpr2021 cvpr2022 deep-learning image-classification image-segmentation machine-learning object-detection papers

12.5k 2.2k +0/wk

GitHub

ludwig-ai/ludwig

Low-code framework for building custom LLMs, neural networks, and other AI models

Trend 0

computer-vision data-centric data-science deep deep-learning deeplearning fine-tuning learning llama llama2 llm llm-training machine-learning machinelearning mistral ml natural-language natural-language-processing neural-network pytorch

11.7k 1.2k -1/wk

GitHub

RunanywhereAI/runanywhere-sdks

Production ready toolkit to run AI locally

Trend 0

android apple-intelligence cpp diffusion-models edge flutter inference ios kotlin llamacpp llm multimodal ollama on-device-ai react-native swift vlm voice-ai web websdk

10.3k 347 +0/wk

GitHub
