esbatmop/MNBVC

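MNBVC (Massive Never-ending BT Vast Chinese corpus) is a very large-scale Chinese corpus. Its target is to match the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures, and even "Martian script" (火星文) data. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.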

Stars: 4.2k · Forks: 288 · Growth: +2/wk
Source: GitHub
Topics: chinese, chinese-language, chinese-nlp, chinese-simplified, corpus-data, nlp, nlp-machine-learning

[Chart: Star & Fork Trend (18 data points); series: Stars, Forks]

Multi-Source Signals

Growth Velocity

esbatmop/MNBVC gained 2 stars this period. 7-day velocity: 0.1%.

Metric         MNBVC  lmql        oasis       Awesome-ChatGPT
Stars          4.2k   4.2k        4.2k        4.2k
Forks          288    219         450         385
Weekly Growth  +2     -1          +15         +1
Language       N/A    Python      Python      N/A
Sources        1      1           1           1
License        MIT    Apache-2.0  Apache-2.0  N/A

[Chart: Capability Radar, MNBVC vs lmql]

Maintenance Activity 100

Last code push 2 days ago.

Community Engagement 35

Fork-to-star ratio: 6.9%. Lower fork ratio may indicate passive usage.
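
The ratio above can be reproduced from the displayed counts. The following is a minimal sketch, assuming the rounded star count of 4.2k means roughly 4,200 and using the 288 forks shown on this page. The function name `fork_to_star_ratio` is hypothetical; how the dashboard actually computes the figure is not documented here.

```python
def fork_to_star_ratio(forks: int, stars: int) -> float:
    """Return forks as a percentage of stars."""
    if stars <= 0:
        raise ValueError("stars must be positive")
    return 100.0 * forks / stars


# Displayed values: 288 forks, 4.2k stars (assumed to mean about 4,200).
ratio = fork_to_star_ratio(288, 4200)
print(f"{ratio:.1f}%")  # prints 6.9%
```

Because the star count is rounded, the result only approximates the dashboard's value; an exact count would shift it slightly.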

Issue Burden 70

Issue data not yet available.

Growth Momentum 43

+2 stars this period — 0.05% growth rate.
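
The growth rate above is consistent with dividing the new stars by the total star count. The following is a minimal sketch under that assumption, taking the rounded 4.2k as roughly 4,200 stars. The function name `growth_rate_percent` is hypothetical, and the dashboard may use a different base (for example, the count before the period started).

```python
def growth_rate_percent(new_stars: int, total_stars: int) -> float:
    """Return new stars as a percentage of the total star count."""
    if total_stars <= 0:
        raise ValueError("total_stars must be positive")
    return 100.0 * new_stars / total_stars


# Displayed values: +2 stars this period, 4.2k stars (assumed about 4,200).
rate = growth_rate_percent(2, 4200)
print(f"{rate:.2f}%")  # prints 0.05%
```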

License Clarity 95

Licensed under MIT. Permissive — safe for commercial use.

Risk scores are computed from real-time repository data. Higher scores indicate healthier metrics.