cxcscmu/Craw4LLM

Official repository for "Craw4LLM: Efficient Web Crawling for LLM Pretraining"

Stars: 651 | Forks: 60 | Weekly growth: +0
GitHub
crawler crawling large-language-models llm pre-training pretraining web-crawler web-crawling
Trend 0

[Chart: Star & Fork Trend, 33 data points; series: Stars, Forks]

Multi-Source Signals

Growth Velocity

cxcscmu/Craw4LLM gained no stars (+0) this period. Velocity data will be available once more historical data has been collected.

Metric          Craw4LLM   pysentimiento      ai-file-sorter   Awesome-Scientific-Language-Models
Stars           651        651                650              650
Forks           60         72                 76               37
Weekly Growth   +0         +1                 +1               +0
Language        Python     Jupyter Notebook   C++              N/A
Sources         1          1                  1                1
License         MIT        NOASSERTION        AGPL-3.0         MIT

Capability Radar vs pysentimiento

[Radar chart comparing Craw4LLM and pysentimiento]

Maintenance Activity 0

Last code push 408 days ago.

Community Engagement 46

Fork-to-star ratio: 9.2% (60 forks / 651 stars). A low fork ratio may indicate passive usage.
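
The ratio above can be reproduced from the star and fork counts shown on this page. The sketch below is only an illustration: the function name fork_to_star_ratio is hypothetical, and the page does not say how the Community Engagement score of 46 is derived from the ratio, so that step is not modeled.

```python
def fork_to_star_ratio(forks: int, stars: int) -> float:
    """Return forks as a percentage of stars (0.0 when there are no stars)."""
    if stars == 0:
        # Avoid dividing by zero for repositories without stars.
        return 0.0
    return 100.0 * forks / stars


# Counts taken from the page: 60 forks, 651 stars.
ratio = fork_to_star_ratio(forks=60, stars=651)
print(f"Fork-to-star ratio: {ratio:.1f}%")  # Fork-to-star ratio: 9.2%
```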

Issue Burden 70

Issue data not yet available.

Growth Momentum 30

No measurable growth in the current period (first-day cold start expected).

License Clarity 95

Licensed under MIT, a permissive license that is safe for commercial use.

Risk scores are computed from real-time repository data. Higher scores indicate healthier metrics.