thu-ml/SageAttention
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized attention that achieves a 2-5x speedup over FlashAttention without degrading end-to-end metrics across language, image, and video models.
Stars: 3.3k · Forks: 390 · Weekly growth: +0
Source: GitHub
Topics: attention, cuda, efficient-attention, inference-acceleration, llm, llm-infra, mlsys, quantization, triton, video-generate, video-generation, vit
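To make the headline concrete, the project exposes quantized attention as a near drop-in kernel. The sketch below assumes the sageattn entry point with the tensor_layout and is_causal arguments shown in the project's README at the time of writing; treat the exact signature as version-dependent.

```python
# Hedged usage sketch: swap PyTorch's SDPA for SageAttention's quantized kernel.
# Assumes `pip install sageattention` and a CUDA device; the sageattn signature
# follows the project's README and may change between releases.
import torch
from sageattention import sageattn

# Toy shapes: (batch, heads, seq_len, head_dim), i.e. tensor_layout="HND".
q = torch.randn(1, 8, 1024, 64, dtype=torch.float16, device="cuda")
k = torch.randn(1, 8, 1024, 64, dtype=torch.float16, device="cuda")
v = torch.randn(1, 8, 1024, 64, dtype=torch.float16, device="cuda")

# Baseline: exact attention.
ref = torch.nn.functional.scaled_dot_product_attention(q, k, v)

# Quantized attention, intended as a plug-and-play replacement.
out = sageattn(q, k, v, tensor_layout="HND", is_causal=False)

# End-to-end metrics are claimed to be preserved; spot-check the kernel error.
print((out - ref).abs().max())
```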
Trend
[Figure: Star & Fork Trend (20 data points), plotting the Stars and Forks series over time.]
Multi-Source Signals
Growth Velocity
thu-ml/SageAttention has +0 stars this period. 7-day velocity: 0.1%.
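The dashboard does not define "7-day velocity"; a plausible reading is stars gained over the trailing week as a percentage of the current star count. A minimal sketch under that assumption (the helper and inputs are hypothetical, not this dashboard's code):

```python
# Hypothetical reconstruction of the "7-day velocity" metric: stars gained in
# the trailing week as a percentage of the current star count.
def seven_day_velocity(stars_now: int, stars_7d_ago: int) -> float:
    """Return weekly star growth as a percentage of current stars."""
    if stars_now == 0:
        return 0.0
    return 100.0 * (stars_now - stars_7d_ago) / stars_now

# With ~3,300 stars, a 0.1% velocity corresponds to roughly 3 new stars/week,
# which is consistent with the +0 stars seen in a single-day period.
print(f"{seven_day_velocity(3300, 3297):.1f}%")  # -> 0.1%
```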
| Metric | SageAttention | Torch-Pruning | Awesome-Code-LLM | Acontext |
|---|---|---|---|---|
| Stars | 3.3k | 3.3k | 3.3k | 3.3k |
| Forks | 390 | 377 | 225 | 309 |
| Weekly Growth (stars) | +0 | +1 | +0 | +2 |
| Language | CUDA | Python | N/A | TypeScript |
| Sources | 1 | 1 | 1 | 1 |
| License | Apache-2.0 | MIT | N/A | Apache-2.0 |
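The table's raw fields (stars, forks, language, license) map directly onto GitHub's public REST API. A minimal sketch of how one row could be fetched; this is an illustration, not the dashboard's actual pipeline:

```python
# Fetch the comparison-table fields for one repository from the GitHub REST API.
# Illustrative only; the dashboard's real data pipeline is not shown here.
import json
import urllib.request

def repo_metrics(full_name: str) -> dict:
    req = urllib.request.Request(
        f"https://api.github.com/repos/{full_name}",
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return {
        "stars": data["stargazers_count"],
        "forks": data["forks_count"],
        "language": data["language"],  # e.g. "Cuda"
        "license": (data.get("license") or {}).get("spdx_id"),  # e.g. "Apache-2.0"
    }

print(repo_metrics("thu-ml/SageAttention"))
```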
[Figure: Capability radar comparing SageAttention and Torch-Pruning.]
Maintenance Activity: 57
Last code push: 81 days ago.
Community Engagement: 59
Fork-to-star ratio: 11.9%. An active community is forking and contributing (a ratio sketch follows this section).
Issue Burden: 70
Issue data not yet available.
Growth Momentum: 30
No measurable growth in the current period (a first-day cold start is expected).
License Clarity: 95
Licensed under Apache-2.0. Permissive; safe for commercial use.
Risk scores are computed from real-time repository data. Higher scores indicate healthier metrics.
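As noted above, the community-engagement figure is consistent with a simple fork-to-star ratio. A hypothetical sketch of that computation (the threshold is an illustrative guess; the dashboard's actual scoring rules are not published here):

```python
# Hypothetical reconstruction of the fork-to-star ratio behind the
# Community Engagement note. The 10% threshold below is an assumed band.
def fork_to_star_ratio(forks: int, stars: int) -> float:
    return 100.0 * forks / stars if stars else 0.0

ratio = fork_to_star_ratio(390, 3300)
print(f"{ratio:.1f}%")  # ~11.8%; the dashboard's 11.9% likely uses exact counts

# Example banding of the kind a dashboard might apply (assumed, not actual):
label = "active forking community" if ratio >= 10.0 else "typical engagement"
print(label)
```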