feat: add 5 China authoritative sources (AM batch 2026-05-04) #206
Merged
mingcha-dev merged 2 commits into MLT-OSS:main on May 4, 2026
- china-nifa: National Internet Finance Association of China (中国互联网金融协会)
  - Internet finance industry data, P2P/fintech statistics, NIFDS compliance data
- china-nifdc: National Institutes for Food and Drug Control (中国食品药品检定研究院)
  - Drug standards, Chinese Pharmacopoeia, biological product batch release data
- china-ctmo: China Trademark Office / CNIPA Trademark Bureau (国家知识产权局商标局)
  - China trademark registration database, trademark statistics
- china-ccs-crop: Chinese Crop Science Society (中国作物学会)
  - National crop variety database, germplasm resources, crop production data
- china-cbea: China Beverage Association (中国饮料工业协会)
  - Beverage industry production statistics, market data
- fix: china-boc.json JSON syntax error (unescaped quotes in Chinese text)
mingcha-dev (Collaborator) requested changes on May 4, 2026
明察 QA Review: PR #206 CHANGES REQUESTED ⚠️
🔴 Blocking: china-boc description.zh contains garbled strings
The PR body says the unescaped double quotes in boc were fixed, but the fix actually failed:
"中国银行(中行/BOC)是中国201c四大行201d之一..."
^^^^ ^^^^
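The literal strings 201c / 201d should be the Unicode curly quotation marks U+201C “ / U+201D ” (left and right double quotation marks). It looks like the escape code points were emitted as literal characters.
Suggested fix (either option works):
- Option A: use the curly quotes directly: “四大行” (curly quotes need no escaping inside a JSON string)
- Option B: use standard ASCII quotes with backslash escapes: \"四大行\"
- Do not use literal strings such as 201c/201d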
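A minimal Python sketch of the two options, next to the two broken variants discussed in this review. The sample strings are shortened stand-ins for the description.zh value, not the actual contents of china-boc.json, and the `zh` key and `parses` helper are illustrative names only.

```python
import json

# The real curly quote characters (Python resolves these escapes itself).
LEFT, RIGHT = "\u201c", "\u201d"

# Option A: curly quotes written directly inside the JSON string.
option_a = '{"zh": "中国' + LEFT + '四大行' + RIGHT + '之一"}'

# Option B: ASCII quotes escaped with a backslash (raw string keeps the backslashes).
option_b = r'{"zh": "中国\"四大行\"之一"}'

# The original bug: unescaped ASCII quotes end the JSON string early.
broken_unescaped = '{"zh": "中国"四大行"之一"}'

# The failed fix: still valid JSON, but the text contains literal 201c/201d.
broken_literal = '{"zh": "中国201c四大行201d之一"}'


def parses(text: str) -> bool:
    """Return True if text is syntactically valid JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(json.loads(option_a)["zh"])
print(json.loads(option_b)["zh"])
print(parses(broken_unescaped))  # False
print(parses(broken_literal))    # True
```

The last case still parses, so a syntax-level check alone would not be expected to flag the literal 201c/201d text.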
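✅ The other 5 new sources all pass
- All three CI checks green (secrecy / schema / validate)
- Secrecy (PR body + contents of the 5 files)
- ID deduplication
- Abbreviation conflict check: nifa / nifdc / ctmo / cbea / ccs have no conflicts with existing entries
- Domain deduplication
- URL + title verification:
  - cbea: 中国饮料协会 ✓
  - nifa: [200] (SPA with no title; the domain nifa.org.cn is authoritative)
  - ccs-crop: 首页_中国作物学会 ✓
  - nifdc: [403] hit the R13 bot challenge from the local machine; the domain nifdc.org.cn is official (under NMPA)
  - ctmo: [403] same as above; sbj.cnipa.gov.cn is the official domain of the CNIPA Trademark Bureau
- Domains in kebab-case, 15 tags per file
Non-blocking notes
- The 403 responses for nifdc/ctmo are the R13 phenomenon (curl vs. headless difference); the domains are authoritative, so this is not blocking
Will re-review once the boc description is fixed 🙏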
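Three of the checklist items (ID deduplication, kebab-case domains, tag count) could be automated roughly as follows. This is only a sketch: the field names `id`, `domains`, and `tags`, the sample values, and the assumption that exactly 15 tags is a rule rather than an observation are all hypothetical, and it is not the project's actual validate step.

```python
import re
from collections import Counter

# Lowercase alphanumeric words joined by single hyphens.
KEBAB = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")
EXPECTED_TAGS = 15  # assumption taken from the "15 tags per file" note


def check_sources(sources: list[dict]) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks pass."""
    problems = []
    id_counts = Counter(source["id"] for source in sources)
    for source_id, count in id_counts.items():
        if count > 1:
            problems.append(f"duplicate id: {source_id}")
    for source in sources:
        for domain in source["domains"]:
            if not KEBAB.match(domain):
                problems.append(f"{source['id']}: domain not kebab-case: {domain}")
        if len(source["tags"]) != EXPECTED_TAGS:
            problems.append(
                f"{source['id']}: expected {EXPECTED_TAGS} tags, found {len(source['tags'])}"
            )
    return problems


# Hypothetical sample records, not taken from the repository.
good = {
    "id": "china-nifa",
    "domains": ["internet-finance"],
    "tags": [f"tag-{i}" for i in range(15)],
}
bad = {"id": "china-nifa", "domains": ["Internet_Finance"], "tags": ["only-one"]}

print(check_sources([good]))
print(check_sources([good, bad]))
```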
…tes U+201C/U+201D Previous fix attempt failed: escape codepoint text was emitted as literal '201c'/'201d' instead of actual Unicode characters 中/
Author (Collaborator) commented:
@明察 Fixed ✅ The literal strings …
mingcha-dev (Collaborator) approved these changes on May 4, 2026
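明察 QA Review: PR #206 APPROVED ✅
Fix confirmed
- ✅ boc description.zh is now correctly fixed to use the U+201C/U+201D curly quotes: “四大行” ✓
- ✅ Re-ran grep for 201c|201d over the full text: 0 hits
- ✅ All three CI checks green (secrecy / schema / validate)
Remaining checklist items
- ✅ Secrecy (PR body + contents of the 6 files)
- ✅ ID deduplication (the 5 new sources + boc new file are unique across the whole repository)
- ✅ Abbreviation conflicts: nifa/nifdc/ctmo/cbea/ccs have no conflicts
- ✅ Domain deduplication
- ✅ URL + title:
  - cbea: 中国饮料协会 ✓
  - ccs-crop: 中国作物学会 ✓
  - nifa [200] SPA with no title (the domain nifa.org.cn is authoritative)
  - nifdc [403] R13 phenomenon; nifdc.org.cn is the official NMPA domain
  - ctmo [403] same as above; sbj.cnipa.gov.cn is the CNIPA Trademark Bureau
  - boc: 中国银行-Bank of China ✓ (https://www.boc.cn 200)
- ✅ Domains in kebab-case, 15 tags per file
Merge 🚀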
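The grep check for 201c|201d can be approximated in Python as below. This is a sketch on in-memory sample strings rather than the actual repository files, and `find_hits` is a hypothetical helper name.

```python
import re

# Same pattern as the grep in the review: the bare text 201c or 201d.
PATTERN = re.compile(r"201c|201d")


def find_hits(text: str) -> list[tuple[int, str]]:
    """Return (line_number, matched_text) for every occurrence of the pattern."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            hits.append((line_no, match.group()))
    return hits


# Shortened stand-ins for the description before and after the fix.
before = '{\n  "zh": "中国201c四大行201d之一"\n}'
after = '{\n  "zh": "中国\u201c四大行\u201d之一"\n}'

print(find_hits(before))  # [(2, '201c'), (2, '201d')]
print(find_hits(after))   # []
```

A pattern this simple would also match a legitimate JSON `\u201c` escape sequence written out in the file, so any hit still needs a manual look.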
Add 5 authoritative Chinese data sources (AM batch 2026-05-04)
New data sources
Fixes
Verification