cxcscmu/Craw4LLM
Official repository for "Craw4LLM: Efficient Web Crawling for LLM Pretraining"
Stars: 651 · Forks: 60 · Weekly growth: +0
Source: GitHub
Topics: crawler, crawling, large-language-models, llm, pre-training, pretraining, web-crawler, web-crawling
Figure: Star & Fork Trend (33 data points); series: Stars, Forks
Growth Velocity
cxcscmu/Craw4LLM gained +0 stars this period. Velocity data will be available after more historical data is collected.
| Metric | Craw4LLM | pysentimiento | ai-file-sorter | Awesome-Scientific-Language-Models |
|---|---|---|---|---|
| Stars | 651 | 651 | 650 | 650 |
| Forks | 60 | 72 | 76 | 37 |
| Weekly Growth | +0 | +1 | +1 | +0 |
| Language | Python | Jupyter Notebook | C++ | N/A |
| Sources | 1 | 1 | 1 | 1 |
| License | MIT | NOASSERTION | AGPL-3.0 | MIT |
Figure: Capability Radar, Craw4LLM vs pysentimiento
Maintenance Activity: 0. Last code push 408 days ago.
Community Engagement: 46. Fork-to-star ratio: 9.2%. Lower fork ratio may indicate passive usage.
Issue Burden: 70. Issue data not yet available.
Growth Momentum: 30. No measurable growth in the current period (first-day cold start expected).
License Clarity: 95. Licensed under MIT. Permissive; safe for commercial use.
Risk scores are computed from real-time repository data. Higher scores indicate healthier metrics.
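The fork-to-star ratio quoted under Community Engagement can be reproduced from the star and fork counts in the comparison table. The sketch below is illustrative only: the helper name `fork_to_star_ratio` and the `repos` dictionary are assumptions made for this example, and it does not attempt to reproduce how the 0 to 100 risk scores themselves are derived, since the page does not state that formula.

```python
def fork_to_star_ratio(forks: int, stars: int) -> float:
    """Return forks as a percentage of stars (0.0 when there are no stars)."""
    if stars <= 0:
        return 0.0
    return 100.0 * forks / stars


# (stars, forks) pairs copied from the comparison table above.
repos = {
    "Craw4LLM": (651, 60),
    "pysentimiento": (651, 72),
    "ai-file-sorter": (650, 76),
    "Awesome-Scientific-Language-Models": (650, 37),
}

# Round to one decimal place, matching the precision shown on the page.
ratios = {
    name: round(fork_to_star_ratio(forks, stars), 1)
    for name, (stars, forks) in repos.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio}%")
```

For Craw4LLM this gives 60 / 651, or about 9.2%, which matches the figure reported above.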