NVIDIA/OpenShell
OpenShell is the safe, private runtime for autonomous AI agents.
TensorRT, NeMo, Megatron-LM, RAPIDS, cuDF
Tooling for optimized, validated, and reproducible GPU-accelerated AI runtimes in Kubernetes
NVIDIA DLSS is a new and improved deep learning neural network that boosts frame rates and generates beautiful, sharp images for your games
The LLM vulnerability scanner
Open-source deep-learning framework for building, training, and fine-tuning deep learning models using state-of-the-art Physics-ML methods
PyTorch implementation of Audio Flamingo: Series of Advanced Audio Understanding Language Models
A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM, TensorRT, vLLM, etc. to optimize inference speed.
LLM KV cache compression made easy
BioNeMo Framework: For building and adapting AI models in drug discovery at scale
The NVIDIA NeMo Agent toolkit is an open-source library for efficiently connecting and optimizing teams of AI agents.
Open-source deep-learning framework for exploring, building and deploying AI weather/climate workflows.
NeMo Retriever Library is a scalable, performance-oriented document content and metadata extraction microservice. NeMo Retriever extraction uses specialized NVIDIA NIM microservices to find, contextualize, and extract text, tables, charts and images that you can use in downstream generative applications.
A framework providing Pythonic APIs, algorithms, and utilities used with the PhysicsNeMo core to physics-inform model training, plus higher-level abstractions for domain experts
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?
Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation and works with PyTorch.
A developer reference project for creating Retrieval Augmented Generation (RAG) chatbots on Windows using TensorRT-LLM
Minkowski Engine is an auto-diff neural network library for high-dimensional sparse tensors
Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.
Optimized primitives for collective multi-GPU communication
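The collective primitives this library (NCCL) provides — all-reduce, all-gather, reduce-scatter, broadcast — are commonly implemented with ring algorithms. As a conceptual illustration only (not the NCCL API, which is C/CUDA, e.g. `ncclAllReduce`), here is a pure-Python simulation of ring all-reduce showing the data-movement pattern: a reduce-scatter phase followed by an all-gather phase, with each rank exchanging one chunk per step with its ring neighbor.

```python
def ring_allreduce(buffers):
    """Sum-allreduce over per-rank buffers using the ring algorithm.

    Pure-Python, single-process simulation of the communication pattern:
    n-1 reduce-scatter steps, then n-1 all-gather steps. Each step, rank r
    sends exactly one chunk to rank (r+1) % n.
    """
    n = len(buffers)
    bufs = [list(b) for b in buffers]       # working copies; inputs untouched
    c = len(bufs[0]) // n                   # chunk length (assumes n | size)
    assert len(bufs[0]) == c * n

    def sl(i):                              # slice covering chunk i (mod n)
        i %= n
        return slice(i * c, (i + 1) * c)

    # Reduce-scatter: after n-1 steps, rank r holds the fully reduced
    # chunk (r+1) % n.
    for step in range(n - 1):
        for r in range(n):
            send, dst = sl(r - step), (r + 1) % n
            for k in range(send.start, send.stop):
                bufs[dst][k] += bufs[r][k]

    # All-gather: circulate the fully reduced chunks; receivers overwrite.
    for step in range(n - 1):
        for r in range(n):
            send, dst = sl(r + 1 - step), (r + 1) % n
            bufs[dst][send] = bufs[r][send]
    return bufs
```

Each rank sends and receives only `2 * (n-1) / n` of the buffer in total, which is why the ring algorithm's bandwidth cost is nearly independent of the number of ranks.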
Deep Learning GPU Training System
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
A Python framework for GPU-accelerated simulation, robotics, and machine learning.
Ongoing research training transformer models at scale
A Flow-based Generative Network for Speech Synthesis
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory utilization in both training and inference.
Transformer-related optimizations, including BERT and GPT
NVIDIA Merlin is an open source library providing end-to-end GPU-accelerated recommender systems, from feature engineering and preprocessing to training deep learning models and running inference in production.
Style transfer, deep learning, feature transform
CUDA Templates and Python DSLs for High-Performance Linear Algebra
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Fast and accurate object detection with end-to-end GPU optimization
Unsupervised Language Modeling at scale for robust sentiment classification
Deep Learning Experiment Management
A library that uses hardware acceleration to load sequences of video frames to facilitate machine learning training
Synthesizing and manipulating 2048x1024 images with conditional GANs
Deep learning for recommender systems
Context-Aware RAG library for Knowledge Graph ingestion and retrieval functions.
Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
Flowtron is an auto-regressive flow-based generative network for text to speech synthesis with control over speech variation and style transfer
NVIDIA Deep Learning Dataset Synthesizer (NDDS)
A suite of image and video neural tokenizers