[fix][client] Prevent duplicate ServiceUrlProvider initialization #25899
I think this needs a lifecycle fix. If the same ServiceUrlProvider instance is reused to build a second client, the second build now fails with IllegalStateException, but the constructor failure path calls shutdown(), which unconditionally closes conf.getServiceUrlProvider(). That can close the provider still used by the first live client.
Example:
ServiceUrlProvider provider = AutoClusterFailover.builder()
        .primary(primary)
        .secondary(List.of(secondary))
        .failoverDelay(1, TimeUnit.SECONDS)
        .switchBackDelay(1, TimeUnit.SECONDS)
        .build();

PulsarClient client1 = PulsarClient.builder()
        .serviceUrlProvider(provider)
        .build();

PulsarClient.builder()
        .serviceUrlProvider(provider)
        .build(); // fails, then closes the provider still used by client1
Could we only close the provider on constructor failure if this PulsarClientImpl successfully initialized it?
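A minimal sketch of that idea, using hypothetical stand-in types (`UrlProvider`, `SingleUseProvider`, `SketchClient`) rather than the real Pulsar classes. The flag name `providerInitializedByThisClient` is also an assumption; the point is only that the failure path closes the provider when this client instance was the one that initialized it:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for ServiceUrlProvider; not the real Pulsar interface.
interface UrlProvider extends AutoCloseable {
    void initialize(Object client);

    @Override
    void close();
}

// Hypothetical provider that rejects a second initialize() call.
class SingleUseProvider implements UrlProvider {
    private final AtomicBoolean initialized = new AtomicBoolean(false);
    private final AtomicBoolean closed = new AtomicBoolean(false);

    @Override
    public void initialize(Object client) {
        if (!initialized.compareAndSet(false, true)) {
            throw new IllegalStateException("provider already initialized");
        }
    }

    @Override
    public void close() {
        closed.set(true);
    }

    boolean isClosed() {
        return closed.get();
    }
}

// Hypothetical stand-in for PulsarClientImpl.
class SketchClient {
    private final UrlProvider provider;
    // Set only after this client has successfully initialized the provider.
    private boolean providerInitializedByThisClient;

    SketchClient(UrlProvider provider) {
        this.provider = provider;
        try {
            provider.initialize(this);
            providerInitializedByThisClient = true;
        } catch (RuntimeException e) {
            shutdown(); // must not close a provider owned by another client
            throw e;
        }
    }

    void shutdown() {
        if (providerInitializedByThisClient) {
            provider.close();
        }
    }
}
```

With this shape, a failed second construction leaves the shared provider open, and only the client that initialized it closes it on shutdown.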
@void-ptr974 Nice catch. I agree with this approach. @lhotari WDYT?
@oneby-wang It would be an improvement to handle this case. In addition, the javadoc of
/pulsarbot rerun-failure-checks
@lhotari I see, let's get this merged first. I'll create another PR to address the improvement.
@oneby-wang I think that the javadoc improvement belongs to this PR. This PR makes the implicit contract of the ServiceUrlProvider interface explicit and it's useful to document that in javadoc.
@lhotari Addressed.
Motivation
#25892 fixed a flaky SameAuthParamsLookupAutoClusterFailoverTest by removing an extra manual failover.initialize(client) call from the test.
The root cause of that flakiness was duplicate initialization. PulsarClientBuilder.build() already initializes the configured ServiceUrlProvider through PulsarClientImpl, so calling initialize(client) again starts duplicate background checks for the same provider instance.
This is especially problematic for SameAuthParamsLookupAutoClusterFailover, because each initialize call creates a new broker-service-url-check EventLoopGroup. Multiple checker threads can then mutate the same failover state and produce subtle race conditions that are difficult to diagnose.
AutoClusterFailover and ControlledClusterFailover have the same lifecycle risk: duplicate initialization can register duplicate scheduled tasks, and ControlledClusterFailover can also recreate its HTTP client without closing the previous one.
Modifications
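A minimal sketch of how a guard against duplicate initialization could look, assuming a hypothetical `GuardedFailoverSketch` class in place of the real providers. It is not the PR's actual code; the executor, the `probe` method, and the counter are illustrative only:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical failover provider; models only the initialize/close lifecycle.
class GuardedFailoverSketch implements AutoCloseable {
    private final AtomicBoolean initialized = new AtomicBoolean(false);
    private final AtomicInteger scheduledChecks = new AtomicInteger();
    private volatile ScheduledExecutorService checker;

    void initialize() {
        // Only the first caller wins; later calls fail instead of
        // registering a second background check on the same state.
        if (!initialized.compareAndSet(false, true)) {
            throw new IllegalStateException("provider is already initialized");
        }
        checker = Executors.newSingleThreadScheduledExecutor();
        checker.scheduleAtFixedRate(this::probe, 1, 1, TimeUnit.SECONDS);
        scheduledChecks.incrementAndGet();
    }

    // Placeholder for the periodic service URL health check.
    private void probe() {
    }

    int scheduledChecks() {
        return scheduledChecks.get();
    }

    @Override
    public void close() {
        ScheduledExecutorService current = checker;
        if (current != null) {
            current.shutdownNow();
        }
    }
}
```

The `compareAndSet` call makes the check-and-mark step atomic, so two threads racing on `initialize()` cannot both start a checker.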
Verifying this change
Does this pull request potentially affect one of the following parts:
The threading model is affected only by preventing duplicate background failover check tasks from being registered for the same ServiceUrlProvider instance.