Skip to content

feat(rum): scope latest-metrics RUM to locale path prefix#1707

Open
alinarublea wants to merge 5 commits into
mainfrom
feature/rum-locale-path-filtering
Open

feat(rum): scope latest-metrics RUM to locale path prefix#1707
alinarublea wants to merge 5 commits into
mainfrom
feature/rum-locale-path-filtering

Conversation

@alinarublea

@alinarublea alinarublea commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Why

Locale-specific sites whose locale lives in the URL path (e.g. https://example.com/de) have no RUM domain key of their own — in the RUM bundler, a domain key is issued per hostname, and a path-locale shares the apex hostname. Today wwwUrlResolver strips the path and the totalMetrics query aggregates RUM for the entire apex domain, so a path-locale site's latest-metrics reports domain-wide numbers instead of its own subtree.

What

Result-level path scoping for totalMetrics, opt-in via a new pathPrefix option:

  • spacecat-shared-utils — new getBaseURLPathPrefix(baseURL) helper: extracts the locale path (https://example.com/de/de), returns null at the domain root, for a file-pointing baseURL (see below), or for unparseable input.
  • spacecat-shared-rum-api-client — the totalMetrics handler now reads opts.pathPrefix and filters bundles to that subtree before aggregating. Matching is boundary-safe: /de matches /de and /de/... but not /design.

With no pathPrefix, behavior is byte-for-byte identical to today — no existing caller is affected.

Verified against live RUM data

Pulled real bundles with the success-studio RUM admin key:

Target baseline PV (30d) scoped to /jp kept
www.adobe.com 242,339,200 9,494,200 3.92 %

Bundle URLs confirmed carrying locale paths (/jp, /de, /uk, /in, /br…), so path filtering selects the correct subset. The www-vs-bare wrinkle (adobe.com returns 0 PV, www.adobe.com returns the data) is already handled by the existing resolveWwwUrl, which probes hasBundleData and resolves to the variant that has data — pathPrefix rides alongside the resolved domain.

Page-path baseURLs (the second commit)

Some sites register their baseURL as a landing page, not a directory — e.g. https://www2.deloitte.com/us/en.html. Returning the full path as the prefix (/us/en.html) matches no bundles and scopes RUM to ~0 (a 700 PV/30d site dropped to 0 in testing).

The locale directory cannot be reliably inferred from a file path: in /us/en.html the en is a locale, but in /en/home.html the home is a page — the two are syntactically identical. So when the last path segment has an extension, getBaseURLPathPrefix returns null and the caller falls back to whole-domain metrics (today's behavior) rather than over-filtering. Clean locale directories (/jp, /en-gb) still filter normally.

Why only path-locales, not subdomains

Subdomain-based locales (de.example.com, business.adobe.com) are a different hostname, so the RUM bundler issues them their own domain key and their own bundles — confirmed live (business.adobe.comkey: YES, 4.59M PV of its own). They already resolve and scope correctly with no filtering, and getBaseURLPathPrefix returns null for them. No change needed; no regression. This PR is specifically the path-locale fix.

How it fits together

domain stays the bare hostname (required by the bundler URL and for domain-key resolution on the apex). The locale scoping happens purely at the result level. The consuming side — spacecat-api-service's getLatestSiteMetrics extracting the prefix from site.getBaseURL() and passing it through — lands in a follow-up PR (this shared release must publish first; see companion below).

The prefix-matching logic is intentionally kept self-contained in rum-api-client rather than importing the new util, to avoid a cross-package version bump (rum-api-client pins an exact spacecat-shared-utils version).

Companion PR

Tests

  • getBaseURLPathPrefix: root → null, trailing slash, single/multi-segment, schema-less, unparseable, null/undefined, file-pointing paths → null (/us/en.html, /en/home.html, /index.php), extensionless deep path retained, hyphenated-locale not treated as a file. url-helpers.js at 100% coverage.
  • totalMetrics: aggregates all without pathPrefix (regression guard); scopes to subtree with it; rejects look-alike sibling paths; drops bundles with missing/invalid URLs.
  • Both package suites green; coverage gates pass.

🤖 Generated with Claude Code

Locale-specific sites (e.g. https://example.com/de) have no RUM domain
key of their own — keys exist only for the main domain. Today the
totalMetrics query resolves the bare hostname and aggregates RUM for the
entire parent domain, so locale sites report domain-wide numbers instead
of their own subtree.

Add result-level path scoping:
- spacecat-shared-utils: getBaseURLPathPrefix(baseURL) extracts the
  locale path (e.g. '/de'), or null at the domain root.
- spacecat-shared-rum-api-client: the totalMetrics handler now reads
  opts.pathPrefix and filters bundles to that subtree before aggregating,
  with boundary-safe matching ('/de' does not match '/design').

Opt-in: with no pathPrefix the handler behaves exactly as before.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@MysticatBot MysticatBot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @alinarublea,

⚠ Degraded review - no spec document was found for this change (searched the PR links, the touched repos' docs, the architecture/guidelines docs, and linked Jira). This review covers code-level quality but could not validate the change against an agreed design, so confidence is reduced. Add a spec link (PR template section 4) and re-request review for a full-confidence pass.

Verdict: Approve - clean, well-scoped feature with solid test coverage.
Complexity: LOW - small diff, single package surface.
Changes: Adds opt-in locale path-prefix filtering to the totalMetrics RUM handler plus a getBaseURLPathPrefix utility for extracting the path prefix from a site's baseURL (6 files).

Non-blocking (3): minor issues and suggestions
  • nit: matchesPathPrefix does not normalize prefix - if a caller passes /de/ (trailing slash), the boundary check path.startsWith('/de//') silently fails to match nested paths. A defensive prefix.replace(/\/+$/, '') would make this robust against future callers that skip normalization - packages/spacecat-shared-rum-api-client/src/functions/total-metrics.js:22
  • nit: Silent catch blocks in both matchesPathPrefix and getBaseURLPathPrefix provide no observability signal when bundles or URLs are dropped due to parse failure. For a data-aggregation pipeline, systematically malformed URLs would silently degrade metrics with no alert - packages/spacecat-shared-rum-api-client/src/functions/total-metrics.js:23
  • suggestion: Add a test for empty-string pathPrefix to lock in the falsy-means-no-filter contract, since '' could arrive from upstream coercion of a null return - packages/spacecat-shared-rum-api-client/test/total-metrics.test.js:79

Skill: pr-review | Model: us.anthropic.claude-opus-4-6-v1[1m] | Duration: 8m 37s | Cost: $3.58 | Commit: 851176e272acc7e92d556b4533e2a1a80791e94a
If this code review was useful, please react with 👍. Otherwise, react with 👎.

@MysticatBot MysticatBot added ai-reviewed Reviewed by AI complexity:low AI-assessed PR complexity: LOW labels Jun 23, 2026
@github-actions

Copy link
Copy Markdown

This PR will trigger a minor release when merged.

alinarublea and others added 2 commits June 25, 2026 11:51
getBaseURLPathPrefix returned the full path for any non-root baseURL, so a
site registered at a landing page (e.g. https://example.com/us/en.html)
yielded pathPrefix=/us/en.html, which matches no RUM bundles and scopes
metrics to ~0 (verified: a 700 PV/30d site dropped to 0).

The locale directory cannot be reliably inferred from a file path —
'en.html' is a locale but 'home.html' in '/en/home.html' is a page, and the
two are syntactically identical. So when the last path segment has an
extension, return null and fall back to whole-domain metrics (today's
behavior) rather than over-filtering. Clean locale directories like /jp and
/en-gb still filter normally.

Verified against live RUM: www.adobe.com 242.3M PV -> /jp scoped 9.49M (3.92%).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
alinarublea added a commit to adobe/spacecat-api-service that referenced this pull request Jun 25, 2026
1.121.0 is already published by an unrelated change and does NOT contain
getBaseURLPathPrefix; the helper lands in the next utils release after
adobe/spacecat-shared#1707 merges (expected 1.122.0). Still a prediction —
reconcile both shared pins against the actually-published versions once
#1707 releases, then take this PR out of draft.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@alinarublea alinarublea requested a review from MysticatBot June 25, 2026 08:57

@MysticatBot MysticatBot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @alinarublea,

⚠ Degraded review - no spec document was found for this change (searched the PR links, the touched repos' docs, the architecture/guidelines docs, and linked Jira). This review covers code-level quality but could not validate the change against an agreed design, so confidence is reduced. Add a spec link (PR template section 4) and re-request review for a full-confidence pass.

Verdict: Approve - focused improvement with solid test coverage, no blocking issues.
Complexity: MEDIUM - 165 lines across 6 files in two packages.
Changes: Adds opt-in locale path-prefix filtering to totalMetrics and a getBaseURLPathPrefix utility with file-extension detection to avoid over-filtering on file-like URLs (6 files).

Non-blocking (3): minor issues and suggestions
  • nit: The file-extension regex /\.[a-z0-9]+$/i matches single-char extensions, so a versioned path like /v1.2 has last segment v1.2 where .2 matches - returning null instead of /v1.2. Unlikely for locale baseURLs but worth a test documenting the intended behavior - packages/spacecat-shared-utils/src/url-helpers.js:329
  • suggestion: Add a test for empty-string pathPrefix (handler(bundles, { pathPrefix: '' })) to lock in the falsy-means-no-filter contract against future refactors - packages/spacecat-shared-rum-api-client/test/total-metrics.test.js:79
  • suggestion: Add a test confirming query strings and fragments in baseURL are ignored (e.g. https://example.com/de?lang=de returns /de) to guard the URL-constructor behavior against regressions - packages/spacecat-shared-utils/test/url-helpers.test.js:737

Skill: pr-review | Model: us.anthropic.claude-opus-4-6-v1[1m] | Duration: 1m 45s | Cost: $3.26 | Commit: f89c44dbaf9d831c1bdbd8653141dbe4579d37a4
If this code review was useful, please react with 👍. Otherwise, react with 👎.

@MysticatBot MysticatBot added complexity:medium AI-assessed PR complexity: MEDIUM and removed complexity:low AI-assessed PR complexity: LOW labels Jun 25, 2026
alinarublea added a commit to adobe/spacecat-api-service that referenced this pull request Jun 25, 2026
Resolve package.json by keeping the currently-published shared dependency
versions (spacecat-shared-utils 1.119.2, rum-api-client 2.40.13) rather than
the unpublished 1.122.0/2.44.0 — so this PR builds and passes CI on existing
deps without waiting on the spacecat-shared release.

Because published spacecat-shared-utils does not yet contain
getBaseURLPathPrefix, inline a local copy in src/support/utils.js (with the
file-path fallback) and import it there instead of from the shared package.
Replace with the shared import once adobe/spacecat-shared#1707 publishes and
the dependency is bumped.

Note: result-level bundle filtering lives in rum-api-client (the #1707
change), so on the current published rum-api-client, passing pathPrefix is a
forward-compatible no-op; it activates once that dependency is bumped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ai-reviewed Reviewed by AI complexity:medium AI-assessed PR complexity: MEDIUM

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants