Computer Vision

HOT

Projects and tools related to computer vision algorithms and applications.

Active projects 100
New this week +1157
Total star growth +736
Cross-source 13
Total Stars 528.0k
Total Forks 72.4k
Multi-Source Repos 18
Stars This Period +736

Top Projects (100)


huggingface/datasets

🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools

Trend 22
ai artificial-intelligence computer-vision dataset-hub datasets deep-learning huggingface llm machine-learning natural-language-processing nlp numpy pandas pytorch speech tensorflow
21.4k 3.2k +1/wk
GitHub HuggingFace PyPI 3-source

ultralytics/ultralytics

Ultralytics YOLO 🚀

Trend 19
cli computer-vision deep-learning hub image-classification instance-segmentation machine-learning object-detection pose-estimation python pytorch rotated-object-detection segment-anything tracking ultralytics yolo yolo-world yolo11 yolo26 yolov8
55.6k 10.7k +49/wk
GitHub PyPI arxiv 3-source

google-ai-edge/mediapipe

Cross-platform, customizable ML solutions for live and streaming media.

Trend 18
android audio-processing c-plus-plus calculator computer-vision deep-learning framework graph-based graph-framework inference machine-learning mediapipe mobile-development perception pipeline-framework stream-processing video-processing
34.6k 5.9k +26/wk
GitHub PyPI 2-source

JaidedAI/EasyOCR

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

Trend 18
cnn crnn data-mining deep-learning easyocr image-processing information-retrieval lstm machine-learning ocr optical-character-recognition python pytorch scene-text scene-text-recognition
29.3k 3.6k +7/wk
GitHub PyPI 2-source

HumanSignal/label-studio

Label Studio is a multi-type data labeling and annotation tool with a standardized output format

Trend 14
annotation annotation-tool annotations boundingbox computer-vision data-labeling dataset datasets deep-learning image-annotation image-classification image-labeling image-labelling-tool label-studio labeling labeling-tool mlops semantic-segmentation text-annotation yolo
27.0k 3.5k +13/wk
GitHub PyPI 2-source

calesthio/OpenMontage

World's first open-source, agentic video production system. 11 pipelines, 49 tools, 400+ agent skills. Turn your AI coding assistant into a full video production studio.

Trend 14
Breakout +112.1%
agent agentic-ai ai claude copilot cursor elevenlabs ffmpeg flux image-generation open-source openai python remotion stable-diffusion text-to-speech text-to-video video-generation video-production
844 149 +91/wk
GitHub

invoke-ai/InvokeAI

Invoke is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The solution offers an industry-leading WebUI and serves as the foundation for multiple commercial products.

Trend 13
ai-art artificial-intelligence generative-art image-generation img2img inpainting latent-diffusion linux macos outpainting stable-diffusion txt2img windows
27.0k 2.8k +16/wk
GitHub PyPI 2-source

lucidrains/vit-pytorch

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch

Trend 13
artificial-intelligence attention-mechanism computer-vision image-classification transformers
25.0k 3.5k +1/wk
GitHub PyPI 2-source

khoj-ai/khoj

Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous AI (gpt, claude, gemini, llama, qwen, mistral). Get started - free.

Trend 13
agent ai assistant chat chatgpt emacs image-generation llama3 llamacpp llm obsidian obsidian-md offline-llm productivity rag research self-hosted semantic-search stt whatsapp-ai
34.0k 2.1k +26/wk
GitHub HuggingFace PyPI 3-source

microsoft/AirSim

Open source simulator for autonomous vehicles built on Unreal Engine / Unity, from Microsoft AI & Research

Trend 11
ai airsim artificial-intelligence autonomous-quadcoptor autonomous-vehicles computer-vision control-systems cross-platform deep-reinforcement-learning deeplearning drones pixhawk platform-independent research self-driving-car simulator unreal-engine
18.1k 4.9k +4/wk
GitHub HuggingFace PyPI 3-source

blakeblackshear/frigate

NVR with realtime local object detection for IP cameras

Trend 10
ai camera google-coral home-assistant home-automation homeautomation mqtt nvr object-detection realtime rtsp tensorflow
31.3k 3.0k +16/wk
GitHub PyPI 2-source

mudler/LocalAI

LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.

Trend 7
agents ai api audio-generation decentralized distributed image-generation libp2p llama llm mamba mcp musicgen object-detection rerank stable-diffusion text-generation tts
45.1k 3.9k +58/wk
GitHub HuggingFace PyPI 3-source

d2l-ai/d2l-zh

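Dive into Deep Learning (动手学深度学习): written for Chinese readers, runnable, and open to discussion. The Chinese and English editions are used for teaching at more than 500 universities in over 70 countries.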

Trend 5
book chinese computer-vision deep-learning machine-learning natural-language-processing notebook python
76.9k 12.2k +25/wk
GitHub PyPI 2-source

foxhui/WebAI2API

WebAI2API: Web AI to OpenAI API via Camoufox. Supports LMArena/Gemini and more, with multi-window concurrency and account isolation.

Trend 5
🔥 Heating Up +23.9%
ai-tools browser-automation generative-ai image-generation openai-api text-generation text-to-image web-scraping
368 114 +26/wk
GitHub

rohitg00/ai-engineering-from-scratch

Learn it. Build it. Ship it for others.

Trend 4
agents ai ai-agents ai-engineering computer-vision course deep-learning from-scratch generative-ai llm machine-learning mcp nlp python reinforcement-learning rust swarm-intelligence transformers tutorial typescript
2.0k 382 +33/wk
GitHub

SamurAIGPT/Vibe-Workflow

Free, open-source alternative to Weavy AI, Krea Nodes, Freepik Spaces & FloraFauna AI — node-based AI workflow builder for generative image & video pipelines

Trend 4
ai ai-workflow-builder artistic-intelligence comfyui creative-tools fastapi florafauna-ai-alternative freepik-spaces-alternative generative-ai image-generation krea-nodes-alternative nextjs node-editor open-source self-hosted-ai video-generation weavy weavy-ai-alternative weavyai workflow-automation
112 26 +3/wk
GitHub

ai-hpc/ai-hardware-engineer-roadmap

Design a custom AI inference chip. That is the goal.

Trend 4
autonomous-driving computer-vision cuda deep-learning digital-design edge-ai embedded-linux embedded-system gpu gpu-programming hpc mlir nvidia nvidia-jeston opencl rtos sensor-fusion tinygrad verilog xilinx
54 16 +0/wk
GitHub

stirling-image/stirling-image

Stirling-PDF but for images. 30+ tools and local AI in a single Docker container - resize, compress, remove backgrounds, upscale, OCR, and more. No cloud, no telemetry. Your images never leave your machine.

Trend 4
ai docker homelab image-editor image-processing open-source self-hosted
577 15 +12/wk
GitHub

ashesbloom/LocalLens

Local Lens is a privacy-first, AI-powered photo organizer for your PC. Sort and group photos by faces, dates, and locations—all locally, with no cloud upload. Enjoy a modern, intuitive UI and keep your memories organized and secure on your own device.

Trend 4
ai-tools automated-categorization computer-vision cross-platform desktop-app face-recognition facial-recognition fastapi gui-application machine-learning offline-processing photo-management photo-organization photography privacy-first productivity python react tauri windows-installer
109 6 +2/wk
GitHub

Leooo-Huang/awesome-human-activity-recognition

Always up-to-date, most comprehensive HAR resource — continuously scanned and auto-updated from Papers with Code. 53 datasets integrated across all modalities.

Trend 3
action-recognition awesome awesome-list benchmark computer-vision datasets deep-learning human-activity-recognition machine-learning motion-detection pose-estimation
93 1 +2/wk
GitHub

autowarefoundation/autoware_vision_pilot

Free self-driving car stack - fully open-source ADAS and autonomous driving system

Trend 3
adas advanced-driver-assistance-systems artificial-intelligence autonomous-driving autopilot autoware computer-vision deep-learning deep-neural-networks end-to-end-machine-learning foundation-models open-source robotics self-driving-car
461 99 +7/wk
GitHub

LC044/TrailSnap

TrailSnap (行影集): your private AI-powered smart photo album

Trend 3
ai album photo
299 39 +6/wk
GitHub

TensorCEO/TensorCEO

Computer science, machine learning, and deep learning graduation projects, plus original AI projects (source code and thesis included)

Trend 3
ai-projects artificial-intelligence computer-science-project computer-vision deep-learning dl-projects final-year-project flask graduation-design graduation-thesis machine-learning ml-projects python thesis-project
108 13 +2/wk
GitHub

vllm-project/vllm-omni

A framework for efficient model inference with omni-modality models

Trend 3
audio-generation diffusion image-generation inference model-serving multimodal pytorch transformer video-generation
4.2k 719 +50/wk
GitHub

mayocream/koharu

ML-powered manga translator, written in Rust.

Trend 3
computer-vision deep-learning gpu japanese manga rust tauri
2.0k 101 +10/wk
GitHub

gracezhao1997/Awesome-Video-World-Models-with-AR-Diffusion

A Curated List of Awesome Video World Models with AR Diffusion: Covering Algorithms, Applications, and Infrastructure, Aimed at Serving as a Comprehensive Resource for Researchers, Practitioners, and Enthusiasts.

Trend 3
ar-diffusion autoregressive awesome-list computer-vision diffusion-models generative-ai video-generation world-models
364 13 +8/wk
GitHub

Intellindust-AI-Lab/EdgeCrafter

PyTorch implementation of "EdgeCrafter: Compact ViTs for Edge Dense Prediction via Task-Specialized Distillation"

Trend 3
computer-vision dinov3 distillation instance-segmentation lightweight object-detection pose-estimation real-time
113 11 +0/wk
GitHub

jau123/nanobanana-trending-prompts

1,300+ curated trending AI image prompts from X/Twitter, ranked by engagement. Works with NanoBanana Pro, GPT Image, Midjourney

Trend 3
awesome-list gemini3proimage gpt-image image-generation midjourney nanobanana nanobananapro prompt-engineering prompts
366 41 +2/wk
GitHub

X-GenGroup/Flow-Factory

A unified framework for easy reinforcement learning in Flow-Matching models

Trend 3
diffusion flow-matching image-generation reinforcement-learning video-generation
318 22 +2/wk
GitHub

SRA-VJTI/Pixels

SRA's seminar on Introduction to Computer Vision Fundamentals

Trend 3
build-system computer-vision cpp git github image-processing makefile numpy opencv python
189 146 +0/wk
GitHub

CVHub520/X-AnyLabeling-Server

A Simple, Lightweight, and Extensible Serving Framework for X-AnyLabeling

Trend 3
annotation-tool clip computer-vision deep-learning grounding-dino image-classification image-labeling-tool instance-segmentation labeling-tool machine-learning object-detection pose-estimation pytorch rotated-object-detection segment-anything transformers vision-language-model x-anylabeling yolo
188 27 +0/wk
GitHub

galilai-group/stable-pretraining

Reliable, minimal and scalable library for pretraining foundation and world models

Trend 3
computer-vision computer-vision-algorithms contrastive-learning deep-learning distributed foundation-models joint-embedding joint-embedding-predictive-architecture large-language-model multimodal-learning pytorch self-supervised-learning stable-pretraining transformers
181 34 +1/wk
GitHub

nerficg-project/faster-gaussian-splatting

An efficient and research-friendly Gaussian Splatting framework described in the CVPR'26 paper "Faster-GS: Analyzing and Improving Gaussian Splatting Optimization"

Trend 3
computer-graphics computer-vision gaussian-splatting novel-view-synthesis
139 17 +2/wk
GitHub

mlslabs/MLSLabsGaussianSplattingRenderer-UE

A high-performance Unreal Engine 5 (UE5) plugin developed by MaLanShan Audio & Video Laboratory, designed for real-time visualization, management, and scalable rendering of 3D Gaussian Splatting (3DGS) and dynamic Volumetric Video (4DGS).

Trend 3
3dgs 4dgs computer-graphics computer-vision gaussian-splatting radiance-field ue5 unreal-engine volumetric-video
137 12 +0/wk
GitHub

AkihikoWatanabe/paper_notes

Daily notes on AI papers

Trend 3
adaptive-learning blog computer-vision educational-data-mining language-model learning-analytics machine-learning nlp notes paper recommender-systems technology
107 2 +1/wk
GitHub

cerul-ai/cerul

The video search layer for AI agents. Search video by meaning — across speech, visuals, and on-screen text.

Trend 3
ai-agent ai-agents api computer-vision multimodal neon-postgres open-source pgvectorscale rag semantic-search skills understanding video-search video-search-engine
96 4 +0/wk
GitHub

M-3LAB/awesome-3d-anomaly-detection

A summary of 3D anomaly detection methods and datasets (still being updated). A survey repository covering multimodal, point-cloud, and pose-agnostic anomaly detection.

Trend 3
3d anomaly-detection anomaly-segmentation awesome-lists computer-vision datasets graphics llms point-cloud reviews three-dimensional
93 0 +0/wk
GitHub

networkoptix/nx_open

NetworkOptix open-source components used to build Powered-by-Nx products including Desktop Client for Network Optix Video Management Platform.

Trend 3
ai camera desktop-client meta networkoptix nx nx-meta object-detection onvif video-processing vms webrtc
73 31 +1/wk
GitHub

AccumulateMore/CV

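✔ (Completed) Comprehensive deep learning notes covering Tudui's PyTorch tutorial, Mu Li's Dive into Deep Learning, Andrew Ng's Deep Learning, and Dafei's LLM Agent course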

Trend 3
agent agents book chinese computer-vision cv deep-learning jupyter-notebook llm llms machine-learning natural-language-processing nlp notebook python rag
19.5k 2.2k +76/wk
GitHub PyPI 2-source

CVHub520/X-AnyLabeling

Effortless data labeling with AI support from Segment Anything and other awesome models.

Trend 3
artificial-intelligence clip computer-vision deep-learning groundingdino image-annotation-tool image-classification image-labeling-tool image-matting instance-segmentation machine-learning object-detection ocr onnxruntime paddlepaddle pose-estimation rotated-object-detection sam vision-language-model yolo
8.7k 932 +10/wk
GitHub

roboflow/rf-detr

[ICLR 2026] RF-DETR is a real-time object detection and segmentation model architecture developed by Roboflow, SOTA on COCO, designed for fine-tuning.

Trend 3
computer-vision detr instance-segmentation machine-learning object-detection rf-detr sota
6.3k 758 +14/wk
GitHub

mcmonkeyprojects/SwarmUI

SwarmUI (formerly StableSwarmUI), a modular Stable Diffusion web user interface with an emphasis on making power tools easily accessible, high performance, and extensibility.

Trend 3
ai comfyui csharp image-generation javascript machine-learning ml python stable-diffusion
3.9k 389 +6/wk
GitHub

MaaXYZ/MaaFramework

An automated black-box testing framework based on image recognition

Trend 3
black-box-testing computer-vision
3.7k 400 +14/wk
GitHub

WeThinkIn/AIGC-Interview-Book

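[Three Years of Interviews, Five Years of Mock Exams] An interview handbook for AIGC algorithm engineers. Covers practical interview and written-test experience and core knowledge across the AI industry, including AIGC, LLMs, AI agents, traditional deep learning, autonomous driving, machine learning, computer vision, natural language processing, reinforcement learning, big data mining, embodied AI, the metaverse, and AGI.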

Trend 3
ai-agent aigc computer-vision deep-learning interview interview-preparation interview-questions interviews-solutions large-language-models machine-learning natural-language-processing openclaw stable-diffusion transformer
3.4k 379 +12/wk
GitHub

HenryNdubuaku/maths-cs-ai-compendium

Become a cracked AI/ML Research Engineer

Trend 3
ai-textbook algorithms artificial-intelligence computer-science computer-vision deep-learning jax linear-algebra machine-learning machine-learning-algorithms math mathematics multimodal-learning nlp probability python reinforcement-learning speech-processing statistics
3.0k 428 +14/wk
GitHub

MrNeRF/LichtFeld-Studio

Train, inspect, edit, automate, and export 3D Gaussian Splatting scenes from a single native application.

Trend 3
computer-graphics computer-vision cuda gaussian-splatting optimization
2.8k 287 +9/wk
GitHub

haosulab/ManiSkill

SAPIEN Manipulation Skill Framework, an open source GPU parallelized robotics simulator and benchmark

Trend 3
3d-computer-vision computer-vision embodied-ai reinforcement-learning robot-learning robot-manipulation robotics robotics-simulation simulation-environment
2.8k 460 +5/wk
GitHub

SharpAI/DeepCamera

Open-Source AI Camera Skills Platform, AI NVR & CCTV Surveillance. Local VLM video analysis with Qwen, DeepSeek, SmolVLM, LLaVA, YOLO26. LLM-powered agentic security camera agent — watches, understands, remembers & guards your home via Telegram, Discord or Slack. Pluggable AI skills. OpenAI, Google, Anthropic or local AI. Runs on Mac Mini & AI PC.

Trend 3
ai ai-camera ai-nvr camera cctv computer-vision deep-learning face-recognition home-assistant home-security llama-cpp llm local-ai machine-learning object-detection python raspberry-pi security-camera video-surveillance vlm
2.7k 425 +1/wk
GitHub

HanaokaYuzu/Gemini-API

✨ Reverse-engineered Python API for Google Gemini web app

Trend 3
ai api async bard chatbot gemini generative-ai google google-gemini image-generation imagefx llm nano-banana python reverse-engineering
2.6k 378 +19/wk
GitHub

pixeltable/pixeltable

Data Infrastructure providing a declarative, incremental approach for multimodal AI workloads.

Trend 3
ai artificial-intelligence chatbot computer-vision data-science database feature-engineering feature-store genai llm machine-learning ml mlops multimodal vector-database
1.6k 207 +3/wk
GitHub

software-mansion/react-native-executorch

Declarative way to run AI models in React Native on device, powered by ExecuTorch.

Trend 3
computer-vision executorch image-embeddings llm-inference object-detection ocr on-device-ai react-native-ai segmentation speech-to-text text-embeddings text-to-speech vlm
1.4k 68 +1/wk
GitHub

YouMind-OpenLab/nano-banana-pro-prompts-recommend-skill

AI skill for OpenClaw & Claude Code — recommend from 10000+ Nano Banana Pro (Gemini) image prompts. Smart search by use case, content remix, sample images.

Trend 3
ai-agent ai-image claude-code-skill clawhub content-creation gemini image-generation nano-banana openclaw openclaw-skill prompt-engineering prompt-library
1.4k 139 +5/wk
GitHub

cleanlab/cleanvision

Automatically find issues in image datasets and practice data-centric computer vision.

Trend 3
computer-vision data-centric-ai data-exploration data-profiling data-quality data-science data-validation deep-learning exploratory-data-analysis image-analysis image-classification image-generation image-quality image-segmentation
1.2k 77 +0/wk
GitHub

Linketic/CityGaussian

[ECCV'24 & ICLR'25] CityGaussian Series for High-quality Large-Scale Scene Reconstruction with Gaussians

Trend 3
3d computer-vision eccv2024 gaussian-splatting graphics iclr2025 large-scale level-of-details neural-network neural-rendering novel-view-synthesis radiance-field surface-reconstruction
1.1k 96 +0/wk
GitHub

LLM-Red-Team/jimeng-free-api

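🚀 Reverse-engineered API for Jimeng 3.0 (strength: top-tier image generation). Zero-configuration deployment and multi-token support. For testing only; for commercial use, please go to the official open platform.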

Trend 3
bytedance chatbot chatgpt-api image-generation image-generation-ai jimeng llm
1.1k 282 +1/wk
GitHub

simpler-env/SimplerEnv

Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo) in simulation under common setups (e.g., Google Robot, WidowX+Bridge) (CoRL 2024)

Trend 3
computer-vision embodied-ai real2sim reinforcement-learning robot-learning robot-manipulation robotics robotics-benchmark robotics-simulation
1.0k 186 +3/wk
GitHub

withoutbg/withoutbg

Image Background Removal Toolkit - Open Source and API Models

Trend 3
ai-background-removal background-removal background-removal-open-source background-removal-toolkit background-remover background-remover-onnx-model computer-vision computer-vision-ai docker-background-removal-open-source image-background-removal image-matting image-processing open-source-background-removal open-source-background-remover python-background-removal
977 47 +1/wk
GitHub

cuixing158/Awesome-CV-MasterHub

🔥🔥🔥 A paper list of some recent Computer Vision (CV) works

Trend 3
awesome image-captioning image-classification image-dehazing image-denoising image-enhancement image-fusion image-generation image-segmentation keypoint-detection low-level-vision object-detection panoptic-segmentation paper-code paper-list papers-with-code pose-estimation video-generation video-understanding vision-transformer
909 54 +2/wk
GitHub

papercopilot/paperlists

Processed / Cleaned Data for Paper Copilot

Trend 3
artificial-intelligence computational-linguistics computer-graphics computer-vision databases dataminning natural-language-processing robotics
901 43 +2/wk
GitHub

mees/calvin

CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks

Trend 3
computer-vision deep-learning grounding manipulation natural-language-processing pytorch robotics vision vision-and-language vision-language
877 115 +3/wk
GitHub

MIT-SPARK/VGGT-SLAM

VGGT-SLAM: Dense RGB SLAM Optimized on the SL(4) Manifold

Trend 3
computer-vision slam vggt vggt-slam
866 97 +0/wk
GitHub

stevenygd/PointFlow

PointFlow: 3D Point Cloud Generation with Continuous Normalizing Flows

Trend 3
3d-point-clouds computer-vision continuous-normalizing-flows machine-learning pytorch shapes
861 109 +1/wk
GitHub

souvikmajumder26/Multi-Agent-Medical-Assistant

⚕️GenAI powered multi-agentic medical diagnostics and healthcare research assistance chatbot. 🏥 Designed for healthcare professionals, researchers and patients.

Trend 3
agent agentic-ai agents chatbot computer-vision disease-detection genai genai-chatbot generative-ai guardrails langchain langgraph large-language-models llm medical-image-processing medical-imaging python rag retrieval-augmented-generation vector-database
854 189 +4/wk
GitHub

geekwenjie/SmartJavaAI

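🔥🔥🔥 A free, offline AI algorithm toolkit for Java, supporting face recognition, liveness detection, facial expression recognition, object detection, instance segmentation, pedestrian detection, OCR text recognition, license plate recognition, table recognition, ASR + TTS, machine translation, and more; usable by adding a Maven dependency. Supports PyTorch and TensorFlow, and integrates mainstream models such as MTCNN, InsightFace, SeetaFace6, YOLOv8 through v12, PaddleOCR (PPOCRv5), and Whisper.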

Trend 3
android asr clip deep-learning djl face-attribute face-comparison face-detection face-quality face-recognition landmark object-detection ocr-recognition pose-estimation silent-face-anti-spoofing table-structure-recognition translation tts yolov12 yolov8
810 140 +4/wk
GitHub

alephpi/Texo

A minimalist SOTA LaTeX OCR model with only 20M parameters, running in the browser. Full training pipeline available for self-reproduction.

Trend 3
computer-vision deep-learning distillation-model formula formulanet hydra latex latex-ocr machine-learning math math-formula-recognition ocr ocr-recognition python pytorch pytorch-lightning transformers unimernet vision-encoder-decoder
788 45 +1/wk
GitHub

jayin92/Skyfall-GS

Skyfall-GS: Synthesizing Immersive 3D Urban Scenes from Satellite Imagery

Trend 3
3d-reconstruction 3dgs computer-graphics computer-vision diffusion-models earth-observation remote-sensing satellite
782 85 +1/wk
GitHub

waybarrios/vllm-mlx

OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX backend, 400+ tok/s. Works with Claude Code.

Trend 3
anthropic apple-silicon audio-processing claude-code computer-vision image-understanding inference llm machine-learning macos mllm mlx multimodal-ai speech-to-text stt text-to-speech tts video-understanding vision-language-model vllm
774 176 +0/wk
GitHub

jacobkrantz/VLN-CE

Vision-and-Language Navigation in Continuous Environments using Habitat

Trend 3
ai computer-vision deep-learning python research robotics
766 82 +1/wk
GitHub

deepinv/deepinv

DeepInverse: a PyTorch library for solving imaging inverse problems using deep learning

Trend 3
computational-imaging computed-tomography deblurring deep-equilibrium-models deep-learning diffusion-models image-processing image-reconstruction imaging inverse-problems mri plug-and-play pytorch super-resolution unfolded
698 158 +1/wk
GitHub

hailo-ai/hailo_model_zoo

The Hailo Model Zoo includes pre-trained models and a full building and evaluation environment

Trend 3
ai-accelerators computer-vision deep-learning edge-ai hailo hailo8 quantization quantized-neural-networks
630 83 +1/wk
GitHub

LingDong-/skeleton-tracing

A new algorithm for retrieving topological skeleton as a set of polylines from binary images

Trend 3
algorithm computational-geometry computer-vision polylines skeletonization
585 64 +0/wk
GitHub

jasonmanesis/Satellite-Imagery-Datasets-Containing-Ships

This repository provides a comprehensive list of radar and optical satellite datasets curated for ship detection, classification, semantic segmentation, and instance segmentation tasks. These datasets are ideal for applications in computer vision, machine learning, remote sensing, and maritime analysis.

Trend 3
classification computer-vision dataset datasets deep-learning detection hrsid instance-segmentation list maritime-analysis optical remote-sensing sar satellite-imagery semantic-segmentation ship-detection ships ssdd synthetic-aperture-radar xview
574 74 +1/wk
GitHub

Rishabh-creator601/Books

Books, PDFs, and EPUBs for different fields of programming. Read, grow, and enjoy 😊😊😊😊

Trend 3
computer-vision cpp deep-learning dvc-pipeline hacking hypothesis-testing javascript machine-learning maths mlflow mongodb-database natural-language-processing pdfs python reinforcement-learning sqlite3 statistics stats yolo
573 99 +1/wk
GitHub

MMMU-Benchmark/MMMU

This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

Trend 3
computer-vision deep-learning deep-neural-networks evaluation foundation-models large-language-models large-multimodal-models llm llms machine-learning multimodal multimodal-deep-learning multimodal-learning multimodality natural-language-processing question-answering stem visual-question-answering
553 50 +1/wk
GitHub

roboflow/roboflow-python

The official Roboflow Python package. Manage your datasets, models, and deployments. Roboflow has everything you need to build a computer vision application.

Trend 3
computer-vision deep-learning machine-learning python
552 119 +0/wk
GitHub

suzuran0y/CCTV-Smartphone-AI-Monitoring

Local monitoring + AI vision: a LAN-based, smartphone-powered AI monitoring framework with structured event output for data acquisition and analysis.

Trend 3
ai-monitoring computer-vision device-repurposing event-driven image-recognition-tool ip-camera ml-ops monitoring-system multimodal structured-output video-streaming
547 38 +1/wk
GitHub

ChenHongruixuan/ChangeDetectionRepository

This repository contains Python code for, or links to the original websites of, traditional change detection methods such as SFA and MAD, and deep learning-based change detection methods such as SiamCRNN, DSFA, and several FCN-based methods.

Trend 3
change-detection deep-learning image-processing multi-temporal python remote-sensing
528 108 +1/wk
GitHub

Fabric-Project/Fabric

Node Creative Coding / 3D / Image Processing tool inspired by Quartz Composer

Trend 3
3d computer-vision creative-coding graphics llm metal mlx multimedia node-based post-processing realtime shaders swift swiftui video vlm
496 23 +1/wk
GitHub

ashbuilds/payload-ai

AI Plugin is a powerful extension for the Payload CMS, integrating advanced AI capabilities to enhance content creation and management.

Trend 3
ai ai-translate ai-writing ai-writing-tool content-generation gpt-image-1 image-generation payload-plugin payloadcms plugin smart-generation text-generation text-to-image text-to-speech voice-generation
471 58 +2/wk
GitHub

TheDesignFounder/DreamLayer

Benchmark diffusion models faster. Automate evals, seeds, and metrics for reproducible results.

Trend 3
benchmarking diffusion-models evaluation-metrics generative-ai image-generation stable-diffusion
408 209 +1/wk
GitHub

PhotonVision/photonvision

PhotonVision is the free, fast, and easy-to-use computer vision solution for the FIRST Robotics Competition.

Trend 3
computer-vision frc java opencv vision vision-processing wpilib
407 294 +0/wk
GitHub

zaina-ml/ml_forge

A visual-based graph node editor for training computer vision models.

Trend 3
artificial-intelligence beginner-friendly computer-vision data-science deep-learning desktop-app drag-and-drop gui image-classification machine-learning neural-network no-code node-editor open-source pipeline python pytorch training visual-programming
402 49 +1/wk
GitHub

albumentations-team/AlbumentationsX

Next-generation Albumentations: dual-licensed for open-source and commercial use

Trend 3
3d augmentation bounding-box computer-vision data-augmentation deep-learning deeplearning image-augmentation image-classification image-processing image-segmentations instance-segmentation keypoint-detection machine-learning medical-imaging object-detection python pytorch segmentation tensorflow
300 27 +0/wk
GitHub

cubist38/mlx-openai-server

A high-performance API server that provides OpenAI-compatible endpoints for MLX models. Developed using Python and powered by the FastAPI framework, it provides an efficient, scalable, and user-friendly solution for running MLX-based vision and language models locally with an OpenAI-compatible interface.

Trend 3
apple-silicon fastapi flux image-generation mlx mlx-lm mlx-vlm multi-models openai-compatible queue speech-recognition structured-outputs tool-calling vision-api whisper
290 53 +1/wk
GitHub

VisoMasterFusion/VisoMaster-Fusion

Powerful & Easy-to-Use Video Face Swapping and Editing Software

Trend 3
ai computer-vision face-editor faceswap live-portrait video-editor vr
263 65 +0/wk
GitHub

agentmorris/MegaDetector

MegaDetector is an AI model that helps conservation folks spend less time doing boring things with camera trap images.

Trend 3
camera-traps cameratrap cameratraps computer-vision conservation ecology machine-learning megadetector wildlife
262 45 +0/wk
GitHub

securade/hub

Securade.ai HUB - A generative AI based edge platform for computer vision that connects to existing CCTV cameras and makes them smart.

Trend 3
artificial-intelligence computer-vision edge-deployment face-detection fire-detection generative-ai grounding-dino industrial-safety jetson machine-learning model-zoo nvidia-gpu object-detection ppe-detection proximity-dete smoke-detection video-analytics worker-safety yolo7 zone-mana
260 26 +2/wk
GitHub

Ikomia-dev/IkomiaApi

Deploy Computer Vision solutions with a few lines of code.

Trend 3
computer-vision computer-vision-ai computer-vision-algorithms computer-vision-opencv computer-vision-tools computervision deep-learning detectron2 human-pose-estimation image-processing machine-learning object-detection opencv openmmlab pose-estimation python pytorch tensorflow yolo
243 13 +1/wk
GitHub

collidingScopes/shape-creator-tutorial

Create and control 3D shapes using hand gestures in real-time. Built with mediapipe computer vision and threejs

Trend 3
3d 3d-shapes augmented-reality browser computer-vision free fun-with-computer-vision hand-gesture mediapipe open-source real-time shape-creator spatial-computing threejs tutorial
232 51 +1/wk
GitHub

duy-phamduc68/TrafficLab-3D

Create a digital-twin style traffic visualization using only mp4 CCTV footage and its Google Maps location.

Trend 3
3d-bbox autonomous-driving camera-calibration cctv-analysis computer-vision digital-twin geospatial-mapping homography intellegent-transportation object-detection object-tracking projection-mapping pyqt5-gui satellite-imagery smart-city traffic-analysis traffic-monitoring urban-analytics yolo
212 21 +0/wk
GitHub

francozanardi/pictex

A Python library for efficient image generation using CSS Flexbox

Trend 3
flexbox flexbox-css gradient graphics image-generation python shadow skia taffy text-rendering text-to-image typography
199 5 +1/wk
GitHub

petercorke/machinevision-toolbox-python

Machine vision toolbox for Python

Trend 3
blob-features bundle-adjustment camera-calibration computer-vision image-search image-segmentation machine-vision mathematical-morphology opencv python stereo-vision
193 29 +0/wk
GitHub

PeculiarVentures/GammaCV

GammaCV is a WebGL accelerated Computer Vision library for browser

Trend 3
computer-vision feature-extraction gpu gpu-acceleration image-analysis image-processing machine-learning machine-vision object-detection opencv webgl
192 23 +0/wk
GitHub

avs-abhishek123/Computer-Vision-Projects

All Computer Vision Projects - Beginner to Advanced

Trend 3
artificial-intelligence computer-vision deep-learning object-detection object-recognition object-tracking opencv pillow python3
180 37 +0/wk
GitHub

skylab-tech/ffhqr-dataset

FFHQR -- the first large-scale retouching dataset for computer vision research.

Trend 3
computer-vision dataset deep-learning high-resolution large-scale retouching
175 12 +0/wk
GitHub

codecentric/c4-genai-suite

c4 GenAI Suite

Trend 3
ai ai-agents artificial-intelligence assistant chatbot chatgpt claude dall-e gemini genai image-generation langchain llm llm-ui mcp multimodal ollama openai rag self-hosted
167 18 +0/wk
GitHub

darkdevil3610/100-AI-Machine-learning-Deep-learning-Computer-vision-NLP

100+ AI Machine learning Deep learning Computer vision NLP Projects with code

Trend 3
artificial-intelligence artificial-intelligence-projects awesome collageproject compuer-vision-project computer-vision data-science deep-learning deep-learning-papers deep-learning-projects final-year-project final-year-projects fyp machine-learning machine-learning-projects nlp nlp-projects python
152 14 +1/wk
GitHub

mawady/awesome-computer-vision-resources

A structured learning reference for computer vision: from image fundamentals to research frontiers

Trend 3
awesome awesome-list computer-science computer-vision data-science deep-learning education image-processing machine-learning
145 26 +1/wk
GitHub

strayrobots/scanner

An app for collecting raw RGB-D scans on iOS devices.

Trend 3
3d computer-vision ios rgb-d
121 23 +0/wk
GitHub

isLinXu/paper-list

Auto-updated paper list

Trend 3
action-recognition anomaly-detection audio-processing classification depth-estimation graph-neural-networks image-generation llm multimodal object-detection object-tracking optical-flow pose-estimation reinforcement-learning scene-understanding semantic-segmentation transfer-learning
118 10 +0/wk
GitHub

Source Breakdown

GitHub
Stars 528.0k
Forks 72.4k
Repos 100
PyPI
Packages 13
HuggingFace
Linked Repos 4
