thu-ml/SageAttention
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized attention that achieves a 2-5x speedup over FlashAttention without degrading end-to-end metrics across language, image, and video models.
Stars: 3.3k · Forks: 390 · Weekly growth: +0
Source: GitHub
Topics: attention, cuda, efficient-attention, inference-acceleration, llm, llm-infra, mlsys, quantization, triton, video-generate, video-generation, vit
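To make the headline concrete, the project exposes quantized attention as a near drop-in kernel. The sketch below assumes the sageattn entry point with the tensor_layout and is_causal arguments shown in the project's README at the time of writing; treat the exact signature as version-dependent.

```python
# Hedged usage sketch: swap PyTorch's SDPA for SageAttention's quantized kernel.
# Assumes `pip install sageattention` and a CUDA device; the sageattn signature
# follows the project's README and may change between releases.
import torch
from sageattention import sageattn

# Toy shapes: (batch, heads, seq_len, head_dim), i.e. tensor_layout="HND".
q = torch.randn(1, 8, 1024, 64, dtype=torch.float16, device="cuda")
k = torch.randn(1, 8, 1024, 64, dtype=torch.float16, device="cuda")
v = torch.randn(1, 8, 1024, 64, dtype=torch.float16, device="cuda")

# Baseline: exact attention.
ref = torch.nn.functional.scaled_dot_product_attention(q, k, v)

# Quantized attention, intended as a plug-and-play replacement.
out = sageattn(q, k, v, tensor_layout="HND", is_causal=False)

# End-to-end metrics are claimed to be preserved; spot-check the kernel error.
print((out - ref).abs().max())
```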
Trend
[Figure: Star & Fork Trend (20 data points), plotting the Stars and Forks series over time.]
Multi-Source Signals
Growth Velocity
thu-ml/SageAttention has +0 stars this period. 7-day velocity: 0.1%.
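The dashboard does not define "7-day velocity"; a plausible reading is stars gained over the trailing week as a percentage of the current star count. A minimal sketch under that assumption (the helper and inputs are hypothetical, not this dashboard's code):

```python
# Hypothetical reconstruction of the "7-day velocity" metric: stars gained in
# the trailing week as a percentage of the current star count.
def seven_day_velocity(stars_now: int, stars_7d_ago: int) -> float:
    """Return weekly star growth as a percentage of current stars."""
    if stars_now == 0:
        return 0.0
    return 100.0 * (stars_now - stars_7d_ago) / stars_now

# With ~3,300 stars, a 0.1% velocity corresponds to roughly 3 new stars/week,
# which is consistent with the +0 stars seen in a single-day period.
print(f"{seven_day_velocity(3300, 3297):.1f}%")  # -> 0.1%
```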
| Metric | SageAttention | Torch-Pruning | Awesome-Code-LLM | Acontext |
|---|---|---|---|---|
| Stars | 3.3k | 3.3k | 3.3k | 3.3k |
| Forks | 390 | 377 | 225 | 309 |
| Weekly Growth (stars) | +0 | +1 | +0 | +2 |
| Language | CUDA | Python | N/A | TypeScript |
| Sources | 1 | 1 | 1 | 1 |
| License | Apache-2.0 | MIT | N/A | Apache-2.0 |
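The table's raw fields (stars, forks, language, license) map directly onto GitHub's public REST API. A minimal sketch of how one row could be fetched; this is an illustration, not the dashboard's actual pipeline:

```python
# Fetch the comparison-table fields for one repository from the GitHub REST API.
# Illustrative only; the dashboard's real data pipeline is not shown here.
import json
import urllib.request

def repo_metrics(full_name: str) -> dict:
    req = urllib.request.Request(
        f"https://api.github.com/repos/{full_name}",
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return {
        "stars": data["stargazers_count"],
        "forks": data["forks_count"],
        "language": data["language"],  # e.g. "Cuda"
        "license": (data.get("license") or {}).get("spdx_id"),  # e.g. "Apache-2.0"
    }

print(repo_metrics("thu-ml/SageAttention"))
```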
[Figure: Capability radar comparing SageAttention and Torch-Pruning.]
Maintenance Activity: 57
Last code push: 81 days ago.
Community Engagement: 59
Fork-to-star ratio: 11.9%. An active community is forking and contributing (a ratio sketch follows this section).
Issue Burden: 70
Issue data not yet available.
Growth Momentum: 30
No measurable growth in the current period (a first-day cold start is expected).
License Clarity: 95
Licensed under Apache-2.0. Permissive; safe for commercial use.
Risk scores are computed from real-time repository data. Higher scores indicate healthier metrics.
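As noted above, the community-engagement figure is consistent with a simple fork-to-star ratio. A hypothetical sketch of that computation (the threshold is an illustrative guess; the dashboard's actual scoring rules are not published here):

```python
# Hypothetical reconstruction of the fork-to-star ratio behind the
# Community Engagement note. The 10% threshold below is an assumed band.
def fork_to_star_ratio(forks: int, stars: int) -> float:
    return 100.0 * forks / stars if stars else 0.0

ratio = fork_to_star_ratio(390, 3300)
print(f"{ratio:.1f}%")  # ~11.8%; the dashboard's 11.9% likely uses exact counts

# Example banding of the kind a dashboard might apply (assumed, not actual):
label = "active forking community" if ratio >= 10.0 else "typical engagement"
print(label)
```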