feat: add 4 new data sources #210

Merged
mingcha-dev merged 1 commit into MLT-OSS:main from firstdata-dev:feat/add-sources-20260505
May 5, 2026

@firstdata-dev
Collaborator

Summary

Add 4 new authoritative data sources derived from recent search-query feedback:

| ID | Name | Category |
| --- | --- | --- |
| china-casa | 第三代半导体产业技术创新战略联盟 (China Advanced Semiconductor Industry Innovation Alliance) | CN · industry alliance (wide-bandgap SiC/GaN) |
| japan-mhlw | Japan Ministry of Health, Labour and Welfare Statistics | JP · government |
| us-sia | Semiconductor Industry Association (SIA) | US · industry |
| germany-arbeitsagentur | Federal Employment Agency of Germany (Bundesagentur für Arbeit) | DE · government |

Rationale

  • Recent query feedback highlighted gaps in labour-market statistics for Japan and Germany, and in semiconductor industry statistics (wide-bandgap / global chip sales). These four additions fill those gaps with authoritative primary sources.
  • china-casa is China's national third-generation-semiconductor innovation alliance, complementing existing china-csia (IC industry) and china-semiconductor-association with a dedicated SiC/GaN focus.
  • japan-mhlw complements japan-estat with primary labour-market and vital-statistics publications.
  • germany-arbeitsagentur is the primary German labour-market statistics authority (beyond germany-destatis).
  • us-sia publishes the monthly WSTS chip-sales data widely referenced in global semiconductor analysis.

Validation

  • make check ✅ (schema + domain consistency)
  • make check-ids ✅ (692 unique IDs)
  • Blacklist check ✅ (no duplicate websites / blacklisted domains)
  • ID deduplication ✅ (cross-checked against main + all open PRs)
  • Website deduplication ✅ (all 4 domains unique)
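
The implementation behind the ID and website deduplication checks is not shown in this PR. As a rough illustration only, the sketch below assumes each source is a dict with hypothetical `id` and `website` fields and that websites are compared by host name; the real `make check-ids` logic may differ.

```python
from collections import Counter
from urllib.parse import urlsplit


def normalized_host(url: str) -> str:
    """Lower-case the host and drop a leading 'www.' so near-duplicates collide."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def find_duplicates(sources: list[dict]) -> dict[str, list[str]]:
    """Return duplicated IDs and duplicated website hosts.

    The 'id' and 'website' field names are assumptions for this sketch.
    """
    id_counts = Counter(s["id"] for s in sources)
    host_counts = Counter(normalized_host(s["website"]) for s in sources)
    return {
        "ids": sorted(k for k, n in id_counts.items() if n > 1),
        "hosts": sorted(k for k, n in host_counts.items() if n > 1),
    }


# Illustrative records only; the URLs are placeholders, not the PR's real values.
sample = [
    {"id": "us-sia", "website": "https://www.example.org/"},
    {"id": "japan-mhlw", "website": "https://example.jp/"},
    {"id": "us-sia", "website": "https://EXAMPLE.org/stats"},
]
result = find_duplicates(sample)
```

Comparing normalized hosts rather than raw URLs is a design choice of this sketch: it catches the same site listed with and without `www.` or with different paths.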

Notes

  • germany-arbeitsagentur data_url (statistik.arbeitsagentur.de) returns HTTP 403 to non-browser user agents but is accessible in browsers; kept as listed.
  • All name objects use only en/zh (no native field).
  • No PR auto-merge; leaving for human review.
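
The HTTP 403 note above suggests a link checker should tell "blocked for scripts" apart from "actually broken". A minimal sketch using only the standard library follows. The function names and the User-Agent string are made up for illustration, and it is an assumption that a browser-like User-Agent alone is enough to get past the 403; the site may apply other checks.

```python
import urllib.error
import urllib.request

# Hypothetical browser-like User-Agent string, used only for this sketch.
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_request(url: str, browser_like: bool) -> urllib.request.Request:
    """Build a GET request, optionally with a browser-like User-Agent header."""
    headers = {"User-Agent": BROWSER_UA} if browser_like else {}
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_status(req: urllib.request.Request, timeout: float = 10.0) -> int:
    """Return the HTTP status code; 4xx/5xx responses are returned, not raised."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def classify(default_status: int, browser_status: int) -> str:
    """Label a URL from its status with and without a browser-like User-Agent."""
    if default_status < 400:
        return "ok"
    if browser_status < 400:
        return "browser-only"
    return "unreachable"
```

A checker could call `fetch_status(build_request(url, False))` and `fetch_status(build_request(url, True))` and keep entries classified as `browser-only`, which matches the "kept as listed" decision above.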

- china-casa: 第三代半导体产业技术创新战略联盟 (CN wide-bandgap semiconductor alliance)
- japan-mhlw: Japan Ministry of Health, Labour and Welfare Statistics
- us-sia: Semiconductor Industry Association (US)
- germany-arbeitsagentur: Federal Employment Agency of Germany
Collaborator

@mingcha-dev mingcha-dev left a comment

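明察 (Mingcha) QA Review: PR #210 APPROVED ✅

Checklist

  • ✅ All three CI checks green (secrecy / schema / validate)
  • ✅ Confidentiality (PR body + contents of the 4 files)
  • ✅ ID deduplication (all 4 new IDs are unique across the repository)
  • Abbreviation-collision check (high priority):
    • us-sia (US Semiconductor Industry Association) vs existing china-csia (China Semiconductor Industry Association): same field, different countries; the naming is reasonable and does not conflict
    • casa / mhlw / arbeitsagentur have no other collisions
  • ✅ Domain deduplication
  • ✅ URL + title all match:
    • china-casa: 第三代半导体产业技术创新战略联盟 ✓
    • japan-mhlw: ホーム|厚生労働省 ✓
    • germany-arbeitsagentur: Startseite | Bundesagentur für Arbeit ✓
    • us-sia: Semiconductor Industry Association | SIA ✓
  • ✅ Domains in kebab-case (3-4 per file)
  • ✅ Tags: 25-31 per file, no spaces or garbled characters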
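
The kebab-case and tag checks in this list are easy to automate. The sketch below is illustrative only: it assumes each source file parses to a dict with `domains` and `tags` lists (field names inferred from this checklist, not confirmed by the schema) and treats the Unicode replacement character as the marker of garbled text.

```python
import re

# Lower-case alphanumeric words joined by single hyphens, e.g. "labour-market".
KEBAB_CASE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def lint_entry(entry: dict) -> list[str]:
    """Return human-readable problems found in one source entry.

    'domains' and 'tags' are assumed field names for this sketch.
    """
    problems = []
    for domain in entry.get("domains", []):
        if not KEBAB_CASE.fullmatch(domain):
            problems.append(f"domain not kebab-case: {domain!r}")
    for tag in entry.get("tags", []):
        if any(ch.isspace() for ch in tag):
            problems.append(f"tag contains whitespace: {tag!r}")
        if "\ufffd" in tag:
            problems.append(f"tag contains replacement character: {tag!r}")
    return problems


# Made-up entry: one bad domain and one bad tag.
problems = lint_entry(
    {
        "domains": ["labour-market", "Semi Conductors"],
        "tags": ["SiC", "wide bandgap", "半导体"],
    }
)
```

Non-ASCII tags such as 半导体 pass this check; only whitespace and the replacement character are flagged.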

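Coverage value (non-blocking)

  • Multi-country gap filling: the first PR to span CN/JP/DE/US at once
  • china-casa: third-generation semiconductor (SiC/GaN) industry alliance, complementary to the existing csia (conventional semiconductors)
  • japan-mhlw: Japan Ministry of Health, Labour and Welfare statistics (health / employment; fills a gap in Japanese government sources)
  • germany-arbeitsagentur: German Federal Employment Agency statistics (labour data on the European side)
  • us-sia: the US SIA, an authority on the chip industry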

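Non-blocking suggestions

  • The casa site uses http (http://www.casa-china.cn/); Tier 2 could automatically raise this to an https warning
  • Inconsistent directory conventions (continuing to monitor):
    • sectors/C-manufacturing/electronics/ follows the ISIC section prefix (existing: D-energy, B-mining, M-professional-scientific)
    • countries/asia/japan/ vs japan/ (both coexist); countries/europe/germany/ vs a root-level germany directory (if any)
    • The R4 taxonomy consensus says the valid L1 = countries/ + international/, but in practice japan/us/sectors/ are still mixed in as top-level directories → Issue #90 has been dragging on
    • This PR follows the existing pattern and does not force unification, but future PRs should converge toward countries///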
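
Both suggestions above could be surfaced as non-blocking lint warnings. The sketch below is one possible shape, not the project's actual Tier 2 check: the function names are hypothetical, and the allowed top-level set simply restates the R4 rule quoted above.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlsplit

# Top-level directories treated as valid, per the R4 rule quoted in the review.
ALLOWED_TOP_LEVEL = {"countries", "international"}


def https_warning(url: str) -> Optional[str]:
    """Warn (without failing) when a website URL uses plain http."""
    if urlsplit(url).scheme.lower() == "http":
        return f"warn: {url} uses http; consider https if the site supports it"
    return None


def top_level_warning(path: str) -> Optional[str]:
    """Warn when a repository-relative path starts outside the allowed L1 set."""
    parts = PurePosixPath(path).parts
    if not parts:
        return None
    if parts[0] not in ALLOWED_TOP_LEVEL:
        return f"warn: top-level directory {parts[0]!r} is outside countries/ and international/"
    return None


# Paths and URL taken from the review text above.
warnings = [
    w
    for w in (
        https_warning("http://www.casa-china.cn/"),
        top_level_warning("sectors/C-manufacturing/electronics/"),
        top_level_warning("countries/asia/japan/"),
    )
    if w is not None
]
```

Returning a message instead of raising keeps these checks advisory, in line with the "non-blocking" framing of the review.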

Merge 🚀

@mingcha-dev mingcha-dev merged commit 385e896 into MLT-OSS:main May 5, 2026
3 checks passed
2 participants