
feat: add 5 China authoritative data sources (AM batch 2026-05-06) #211

Merged
mingcha-dev merged 1 commit into MLT-OSS:main from firstdata-dev:feat/add-china-sources-20260506-am
May 6, 2026

@firstdata-dev
Collaborator

Summary

Adds 5 new Chinese authoritative data sources across research, health regulation, nutrition, scientific terminology, and science popularization domains.

New Sources

| ID | Name (Chinese) | Authority | Domain | Website |
| --- | --- | --- | --- | --- |
| china-cngbdb | 国家基因库生命大数据平台 | research | health/science | db.cngb.org |
| china-cmde | 国家药品监督管理局医疗器械技术审评中心 | government | health/regulation | cmde.org.cn |
| china-cns | 中国营养学会 | research | health/food/science | cnsoc.org |
| china-termonline | 术语在线(全国科技名词委) | government | science/education/language | termonline.cn |
| china-cstm | 中国科学技术馆 | government | science/education/culture | cstm.org.cn |

Why these sources

  • CNGBdb — National-scale genomics/bioinformatics data hosted by China National GeneBank (Shenzhen); complements the existing NGDC entry by covering the CNGB/BGI-hosted sequence archive (CNSA), nucleotide/protein and literature databases.
  • CMDE — Official NMPA technical review body for Class II/III and imported medical devices; publishes registration acceptance, review status, technical review guidance, IVD and innovation-device records. Fills a clear gap between the existing NMPA entry (administrative) and device-specific technical review.
  • Chinese Nutrition Society (CNS) — Authoritative publisher of DRIs and the Dietary Guidelines for Chinese Residents; no prior nutrition-focused source in the catalog.
  • TermOnline — CNCTST's official terminology platform with 500k+ standardized Chinese scientific/technical terms and English equivalents; valuable as ground-truth reference for LLM terminology and translation workflows.
  • China Science and Technology Museum (CSTM) — National comprehensive science museum under CAST; publishes science literacy indicators, mobile/digital science museum data, and STEM education resources.

Checks

  • ✅ All websites verified accessible (200/202/302)
  • ✅ No blacklist matches (scripts/check-blacklist.sh)
  • ✅ No website/ID duplicates against main + open PRs
  • ✅ make check passes: 700 unique IDs, domain consistency OK
  • ✅ Schema-conformant (website as URL, data_content arrays, domains hyphenated, ISO alpha-2 country codes, valid authority_level/update_frequency values)
  • ✅ Tags mixed Chinese/English, no whitespace, English lowercase (per 2026-04-30 convention)
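
The schema and tag conventions listed above can be sketched as a small validator. This is only an illustration under stated assumptions, not the repository's actual `make check` logic: `website`, `data_content`, `authority_level`, and `update_frequency` are named in this PR, while the `domains`, `country`, and `tags` keys and the `check_entry` helper are hypothetical names chosen for the example.

```python
import re
from urllib.parse import urlparse

# Lowercase words joined by single hyphens, e.g. "higher-education".
KEBAB = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")
# Two ASCII letters; the letter case used by the real schema is an assumption.
ALPHA2 = re.compile(r"^[A-Za-z]{2}$")


def check_entry(entry: dict) -> list[str]:
    """Return a list of convention violations for one source entry."""
    problems = []

    # "website as URL": require an http(s) scheme and a host.
    parsed = urlparse(entry.get("website", ""))
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("website is not a full URL")

    # "data_content arrays"
    if not isinstance(entry.get("data_content"), list):
        problems.append("data_content is not an array")

    # "domains hyphenated" (hypothetical key name: domains)
    for domain in entry.get("domains", []):
        if not KEBAB.match(domain):
            problems.append(f"domain not hyphenated: {domain!r}")

    # "ISO alpha-2 country codes" (hypothetical key name: country)
    if not ALPHA2.match(entry.get("country", "")):
        problems.append("country is not a two-letter code")

    # "Tags mixed Chinese/English, no whitespace, English lowercase"
    for tag in entry.get("tags", []):
        if any(ch.isspace() for ch in tag):
            problems.append(f"tag contains whitespace: {tag!r}")
        if any(ch.isascii() and ch.isupper() for ch in tag):
            problems.append(f"tag has uppercase English: {tag!r}")

    return problems
```

An entry that follows every convention yields an empty list, so a CI step built on this idea could fail whenever any entry returns a non-empty result.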

Closes part of the 'China-first morning batch' (中国优先 上午批次) daily contribution schedule.

Add 5 new Chinese authoritative data sources spanning research,
health regulation, nutrition, scientific terminology, and science
popularization:

- china-cngbdb: 国家基因库生命大数据平台 (China National GeneBank DataBase)
  Unified life science big data platform run by China National GeneBank
  (CNGB, Shenzhen); hosts CNSA sequence archive, nucleotide/protein
  databases, literature, samples, and bioinformatics tools.

- china-cmde: 国家药品监督管理局医疗器械技术审评中心 (Center for Medical
  Device Evaluation, NMPA) — technical review authority for Class II/III
  and imported medical devices; publishes registration, review-status,
  technical-review guidance, and IVD/innovation-device records.

- china-cns: 中国营养学会 (Chinese Nutrition Society) — national learned
  society (est. 1945), publisher of the authoritative Chinese Dietary
  Reference Intakes (DRIs), Dietary Guidelines for Chinese Residents,
  and population nutrition survey results.

- china-termonline: 术语在线 (TermOnline — China National Terminology
  Service Platform) — operated by CNCTST (全国科学技术名词审定委员会);
  500,000+ standardized Chinese scientific/technical terms with English
  equivalents across 100+ disciplines.

- china-cstm: 中国科学技术馆 (China Science and Technology Museum) —
  China's national comprehensive science museum under CAST; publishes
  science literacy indicators, mobile/digital science museum data, and
  science education resources.

All sources verified: websites return 200/202/302, no blacklist or
existing-website duplicates, make check passes (700 unique IDs).
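
The ID and website duplicate checks mentioned in this commit message could look roughly like the sketch below. It assumes each catalog entry is a dict with `id` and `website` keys and that websites are compared by host name with a leading `www.` stripped; `normalize_host` and `find_duplicates` are hypothetical helpers, not functions from the repository.

```python
from collections import Counter
from urllib.parse import urlparse


def normalize_host(website: str) -> str:
    """Reduce a URL or bare domain to a lowercase host without 'www.'."""
    # urlparse only fills in the host when a scheme or '//' prefix is present.
    target = website if "://" in website else "//" + website
    host = urlparse(target).hostname or ""
    return host.lower().removeprefix("www.")


def find_duplicates(entries: list[dict]) -> dict[str, list[str]]:
    """Report IDs and website hosts that appear more than once."""
    ids = Counter(e["id"] for e in entries)
    hosts = Counter(normalize_host(e["website"]) for e in entries)
    return {
        "ids": sorted(k for k, n in ids.items() if n > 1),
        "hosts": sorted(k for k, n in hosts.items() if n > 1),
    }
```

Comparing by host rather than by full URL is a design choice for the example: it treats `https://www.cnsoc.org` and `http://cnsoc.org/about` as the same site, which may be stricter than the project's real rule.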
Collaborator

@mingcha-dev mingcha-dev left a comment


明察 QA Review — PR #211 APPROVED ✅

Checklist

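  • ✅ All three CI checks green (secrecy / schema / validate)
  • ✅ Secrecy (PR body + contents of the 5 files)
  • ✅ ID dedup (the 5 new IDs are unique across the whole catalog)
  • Abbreviation conflict check
    • china-cns (Chinese Nutrition Society, cnsoc.org) vs existing china-cnsa (China National Space Administration, cnsa.gov.cn): substring match, but different institutions, so no conflict
    • cngbdb / cmde / cstm / termonline: no other conflicts
  • ✅ Website domain dedup
  • ✅ URL + title:
    • cstm: 中国科学技术馆 ✓
    • cmde: [202] SPA with no title; whois Registrant = 国家药品监督管理局医疗器械技术审评中心 ✓ (official domain, authoritative)
    • cns: 中国营养学会官网 ✓
    • cngbdb: CNGBdb ✓
    • termonline: 术语在线 ✓
  • ✅ Domains kebab-case (2-3 per file)
  • ✅ Tags: 15 per file, no whitespace or garbled characters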
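
The abbreviation conflict check in this list (china-cns vs china-cnsa) amounts to flagging ID pairs where one ID is a substring of the other and then reviewing them by hand. A minimal sketch of that first step, assuming a plain list of ID strings; `substring_collisions` is a hypothetical helper, not part of the repository's tooling:

```python
from itertools import combinations


def substring_collisions(ids: list[str]) -> list[tuple[str, str]]:
    """Return ID pairs where one ID is contained in the other."""
    pairs = []
    # Sorting makes the output order deterministic.
    for a, b in combinations(sorted(ids), 2):
        if a in b or b in a:
            pairs.append((a, b))
    return pairs


flagged = substring_collisions(["china-cnsa", "china-cns", "china-cmde"])
print(flagged)  # [('china-cns', 'china-cnsa')]
```

A flagged pair is only a prompt for review: as in this PR, two IDs can overlap textually while referring to unrelated institutions.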

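Coverage value

  • cngbdb: China National GeneBank (BGI), life-omics big data
  • cmde: Center for Medical Device Evaluation (under NMPA; forms a system with nifdc/nmpa)
  • cns: Chinese Nutrition Society (authority on dietary guidelines)
  • cstm: China Science and Technology Museum (science popularization data)
  • termonline: terminology platform of CNCTST (全国科学技术名词审定委员会); first terminology database source in the catalog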

Merge 🚀

@mingcha-dev mingcha-dev merged commit 51904ab into MLT-OSS:main May 6, 2026
3 checks passed