Skip to content

fix(opensearch): use direct HEAD check in indexExists(), guard null cih in clearIndex()#35397

Closed
fabrizzio-dotCMS wants to merge 5 commits intomainfrom
fix/issue-35303-clear-index-400
Closed

fix(opensearch): use direct HEAD check in indexExists(), guard null cih in clearIndex()#35397
fabrizzio-dotCMS wants to merge 5 commits intomainfrom
fix/issue-35303-clear-index-400

Conversation

@fabrizzio-dotCMS
Copy link
Copy Markdown
Member

@fabrizzio-dotCMS fabrizzio-dotCMS commented Apr 21, 2026

Summary

  • Replace listIndices()-based indexExists() with a DB-first check via IndiciesAPI.loadIndicies() — vendor-neutral, no ES/OS client dependency
  • Normalize index name to logical form before getClusterHealth() lookup in clearIndex() to prevent NPE when a physical name is passed
  • Add null-guard on cih with a clear DotStateException message
  • Fix integration test that checked clusterIndexName twice instead of covering both logical and physical name variants

Root cause

ESIndexAPI.indexExists() used listIndices() which builds an ES wildcard query from IndexType.getPattern(). That pattern embeds ClusterFactory.getClusterId(), which reads from the dot_cluster DB table first. When the DB cluster ID diverges from DOT_DOTCMS_CLUSTER_ID (e.g. stale DB row, or the env var was added after first startup), the pattern matches no indices. getIndices() catches the empty result and swallows it, so listIndices() silently returns an empty set → indexExists() always returns falseclearIndex() throws HTTP 400 before the operation executes.

Fix

Query the indicies DB table first via APILocator.getIndiciesAPI().loadIndicies(). The stored physical names carry the cluster prefix that was real at activation time — immune to any DB/env-var cluster-ID mismatch. Falls back to the existing listIndices() ES scan for indices not yet tracked in the DB (bootstrap, direct createIndex calls, integration tests).

Test plan

  • Build: ./mvnw install -pl :dotcms-core -DskipTests
  • Integration test: ESIndexAPITest#index_exists_should_resolve_even_with_cluster_id
  • QA: PUT /api/v1/esindex/{indexName}?action=clear with DOT_DOTCMS_CLUSTER_ID set → expect HTTP 200 and index recreated

Closes #35303

🤖 Generated with Claude Code

…ih in clearIndex() (#35303)

ESIndexAPI.indexExists() relied on pattern-based listIndices() which can silently return an empty
set when DOTCMS_CLUSTER_ID is configured and there is any cluster-ID mismatch or transient ES error,
causing clearIndex() to always throw 400 "Index does not exist". Replace with a single direct
indices().exists() HEAD request that resolves both logical and physical names via
getNameWithClusterIDPrefix(). Also normalize the index name to its logical form before the
getClusterHealth() map lookup in clearIndex() to prevent NPE when a physical name is passed.

Refs: #35303

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The previous fix used RestHighLevelClientProvider.getInstance().getClient()
.indices().exists() — a vendor-specific ES call that shouldn't be added
during the ES→OS migration.

Root cause of the original failure:
ClusterFactory.getClusterId() prioritises the dot_cluster DB row. When the
DB cluster_id diverges from DOT_DOTCMS_CLUSTER_ID (e.g. stale DB row or
env var introduced after first startup), IndexType.getPattern() constructs
a wildcard that matches no ES indices. The exception inside getIndices() is
caught and swallowed, so listIndices() silently returns an empty set and
indexExists() always returns false.

Fix: query the DB-backed indicies table first (via
APILocator.getIndiciesAPI().loadIndicies()). The stored physical names carry
the cluster prefix that was real at activation time, so the comparison is
immune to any DB/env-var cluster-ID mismatch. Fall back to the existing
listIndices() ES scan for indices not yet tracked in the DB (bootstrap,
direct createIndex calls, integration tests).

Keeps the clearIndex() guards from the previous commit:
- normalise to logical name before getClusterHealth() lookup
- null-guard on cih with a clear DotStateException

Refs: #35303

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@dotCMS dotCMS deleted a comment from claude Bot Apr 21, 2026
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 21, 2026

Claude finished @fabrizzio-dotCMS's task in 2m 48s —— View job


PR Review

  • Gather context
  • Read changed files
  • Analyze implementation
  • Post review

The root-cause analysis and overall approach are sound. Four issues worth addressing:


1. Double listIndices() in Phase 2 — two ES round trips (ESIndexAPI.java:426)

// current — hits ES twice
return listIndices().contains(stripped) || listIndices().contains(indexName.toLowerCase());

listIndices() performs a full ES cluster scan each time. Should be one call:

final Set<String> all = listIndices();
return all.contains(stripped) || all.contains(indexName.toLowerCase());

Fix this →


2. Phase 2 second condition is dead code (ESIndexAPI.java:426)

listIndices() returns logical names — the cluster prefix is always stripped in getIndices(). So listIndices().contains(indexName.toLowerCase()) can never be true when indexName is a physical name (it would need to contain the prefix). When indexName is already logical, stripped == indexName.toLowerCase() and the second condition duplicates the first. This looks like a carry-over from the old two-check pattern that no longer makes sense after introducing stripped. Worth removing or documenting why it's needed.


3. Missing .toLowerCase() in clearIndex() logical-name lookup (ESIndexAPI.java:468)

final String logicalName = removeClusterIdFromName(indexName);  // no toLowerCase()
final ClusterIndexHealth cih = getClusterHealth().get(logicalName);

getClusterHealth() keys are lowercase logical names (ES enforces lowercase; the map uses removeClusterIdFromName(x.getKey())). If indexName arrives with any uppercase characters, the lookup silently misses, cih is null, and the new guard throws a 400 with the misleading message "health data is unavailable" — even though the index exists. indexExists() gets this right at line 407 with indexName.toLowerCase(). Should be:

final String logicalName = removeClusterIdFromName(indexName.toLowerCase());

Fix this →


4. Test covers Phase 2 (ES scan) only, not the Phase 1 (DB) path (ESIndexAPITest.java:73)

createIndex() does not register the index in the indicies DB table — only activateIndex() does. So indexExists() in this test always falls through to Phase 2. The cluster-ID mismatch scenario described in the root cause (DB record exists but cluster ID diverges from env var) is untested. The test name index_exists_should_resolve_even_with_cluster_id implies otherwise. Not a blocker, but the test doesn't actually exercise the fixed code path.


Minor — pre-existing issue, not introduced here (ESIndexAPI.java:613): updateReplicas() calls getClusterHealth().get(getNameWithClusterIDPrefix(indexName)), but getClusterHealth() maps by logical names (it strips the prefix in the collector). The lookup always misses and health is always null, silently skipping the replica update. Unrelated to this PR but in the same area.

@fabrizzio-dotCMS fabrizzio-dotCMS marked this pull request as ready for review April 21, 2026 18:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AI: Safe To Rollback Area : Backend PR changes Java/Maven backend code

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[BUG] ESIndexAPI.clearIndex() always returns 400 when DOTCMS_CLUSTER_ID is configured

1 participant