feat(gax-grpc): add configurable resize delta and warning for repeated resizing #12838

Open
lqiu96 wants to merge 23 commits into main from feat/channelpool-resizing

Conversation

@lqiu96
Member

@lqiu96 lqiu96 commented Apr 17, 2026

No description provided.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a configurable maxResizeDelta for the ChannelPool and adds a warning log when the pool resizes consecutively for five cycles. Feedback highlights a logic error in the resizing detection that could lead to false-positive warnings when the pool is at its limits; a suggestion is provided to track actual size changes instead. Additionally, the removal of test exclusions in the pom.xml was flagged as a potential cause for CI failures.

I am having trouble creating individual review comments. Click here to see my feedback.

sdk-platform-java/gax-java/gax-grpc/src/main/java/com/google/api/gax/grpc/ChannelPool.java (322-327)

high

The logic for determining if the pool is 'resizing' is flawed. By checking if the current size is outside the load-based bounds (minChannels and maxChannels), the counter will increment even when the pool is at its hard limits (e.g., at minChannelCount while idle). This will lead to false positive warnings after 5 idle cycles because minChannels will be 0 while the pool is correctly clamped at its minimum size (e.g., 1).

Instead, you should track whether the pool is actually attempting to change its size by comparing dampenedTarget with currentSize. This correctly identifies when the pool is either oscillating or slowly growing/shrinking due to the maxResizeDelta cap.

    boolean resized = (dampenedTarget != currentSize);
    if (resized) {
      consecutiveResizes++;
    } else {
      consecutiveResizes = 0;
    }
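
A slightly fuller sketch of the counter suggested above, for illustration only. `ConsecutiveResizeTracker`, `recordCycle`, and `WARNING_THRESHOLD` are hypothetical names, not identifiers from `ChannelPool`. The threshold of 5 comes from the review summary, and warning on every cycle at or past the threshold is an assumption. The sketch is not thread-safe and assumes a single caller per resize cycle.

```java
import java.util.logging.Logger;

/** Hypothetical helper (not part of gax): counts consecutive cycles in which the pool size changes. */
final class ConsecutiveResizeTracker {
  private static final Logger LOG =
      Logger.getLogger(ConsecutiveResizeTracker.class.getName());

  // Assumed threshold, taken from the "five cycles" mentioned in the review summary.
  static final int WARNING_THRESHOLD = 5;

  private int consecutiveResizes = 0;

  /**
   * Records one resize cycle.
   *
   * @return true if the pool has now changed size for WARNING_THRESHOLD or more cycles in a row
   */
  boolean recordCycle(int currentSize, int dampenedTarget) {
    if (dampenedTarget != currentSize) {
      // The pool is about to grow or shrink in this cycle.
      consecutiveResizes++;
    } else {
      // The pool is stable (possibly clamped at a hard limit), so the streak ends.
      consecutiveResizes = 0;
    }
    if (consecutiveResizes >= WARNING_THRESHOLD) {
      LOG.warning("Channel pool is repeatedly resizing.");
      return true;
    }
    return false;
  }

  int getConsecutiveResizes() {
    return consecutiveResizes;
  }
}
```

With this approach an idle pool clamped at its minimum size never increments the counter, because its target equals its current size.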

sdk-platform-java/gax-java/gax-grpc/pom.xml (145-149)

medium

The removal of the maven-surefire-plugin configuration seems unintended. This block was used to exclude specific tests that require environment variables to be set. Removing it will likely cause these tests to run and fail in standard CI environments. If this change was intentional, please provide justification; otherwise, it should be reverted.

@lqiu96 lqiu96 force-pushed the feat/channelpool-resizing branch from b17aa56 to 827b22d Compare April 17, 2026 21:09
@lqiu96
Member Author

lqiu96 commented Apr 17, 2026

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a configurable maxResizeDelta for the ChannelPool in gax-grpc and implements a warning mechanism for repeated resizing cycles. Feedback focuses on ensuring thread-safety for the resize counter, optimizing redundant size checks, and restoring accidentally deleted metadata and documentation links in the java-iam-policy module.

Comment thread java-iam-policy/.repo-metadata.json
Comment thread java-iam-policy/README.md
@lqiu96
Member Author

lqiu96 commented Apr 17, 2026

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a configurable maxResizeDelta in ChannelPoolSettings, adds logic to ChannelPool to track consecutive resizes and log warnings, and removes certain metadata and documentation links. Feedback indicates that the removal of the enable-api link in the README appears accidental and should be restored. Additionally, a stale comment in ChannelPool.java incorrectly describing the resize() method as synchronized should be removed.

Comment thread java-iam-policy/README.md
@lqiu96 lqiu96 requested a review from jinseopkim0 April 20, 2026 17:06
@lqiu96 lqiu96 marked this pull request as ready for review April 20, 2026 17:06
@lqiu96 lqiu96 requested a review from a team as a code owner April 20, 2026 17:06
class ChannelPool extends ManagedChannel {
static final String CHANNEL_POOL_CONSECUTIVE_RESIZING_WARNING =
"Channel pool is repeatedly resizing. "
+ "Consider adjusting `initialChannelCount` or `maxResizeDelta` to a more reasonable value. "
Contributor

nit: would it make sense to also add max/minRpcsPerChannel in the message? E.g.

"Consider adjusting `initialChannelCount`, `maxResizeDelta`, `minRpcsPerChannel`, or `maxRpcsPerChannel` to a more reasonable value. "

Member Author

My plan is to change this to reference a public guide that I'm going to write (after metrics). WDYT if I leave this for now (for the immediate Datastore ask) and then update it to the public guide?

if (Math.abs(delta) > ChannelPoolSettings.MAX_RESIZE_DELTA) {
dampenedTarget =
currentSize + (int) Math.copySign(ChannelPoolSettings.MAX_RESIZE_DELTA, delta);
if (Math.abs(delta) > settings.getMaxResizeDelta()) {
Contributor

Do we know why there was a limit in the first place? Were there any technical limitations?

Member Author

IIUC, it looks to be just a design choice: dampening and rate-limiting channel growth so that a sudden burst doesn't overwhelm the client.
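
The dampening described here can be sketched roughly as follows. This is a standalone illustration, not the `ChannelPool` code: `ResizeDampener`, `dampen`, and `cyclesToReach` are hypothetical names, and only the `Math.copySign` clamp mirrors the diff above.

```java
/** Hypothetical illustration of capping how far the pool may move per resize cycle. */
final class ResizeDampener {
  private ResizeDampener() {}

  /** Moves currentSize toward targetSize by at most maxResizeDelta channels. */
  static int dampen(int currentSize, int targetSize, int maxResizeDelta) {
    int delta = targetSize - currentSize;
    if (Math.abs(delta) > maxResizeDelta) {
      // Keep the direction of the change but cap its magnitude.
      return currentSize + (int) Math.copySign(maxResizeDelta, delta);
    }
    return targetSize;
  }

  /** Counts how many resize cycles it takes to get from start to target. */
  static int cyclesToReach(int start, int target, int maxResizeDelta) {
    if (maxResizeDelta < 1) {
      throw new IllegalArgumentException("maxResizeDelta must be at least 1");
    }
    int size = start;
    int cycles = 0;
    while (size != target) {
      size = dampen(size, target, maxResizeDelta);
      cycles++;
    }
    return cycles;
  }

  public static void main(String[] args) {
    System.out.println(cyclesToReach(1, 50, 2));  // prints 25
    System.out.println(cyclesToReach(1, 50, 25)); // prints 2
  }
}
```

Under these assumptions, growing from 1 to 50 channels takes 25 resize cycles with a delta of 2 and 2 cycles with a delta of 25, which is the trade-off between reacting quickly to a burst and opening many channels at once.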

.setMaxChannelCount(size)
// Static pools don't resize so this value doesn't affect operation. However,
// validation still checks that resize delta doesn't exceed channel pool size.
.setMaxResizeDelta(Math.min(DEFAULT_MAX_RESIZE_DELTA, size))
Contributor

I think we should still give it a max limit instead of being unbounded. Otherwise customers might expect this to handle 100x request spike as well.

Member Author

The delta is clamped to the range [1, MAX_CHANNEL_COUNT]. The javadocs already mention that the number of channels can never exceed the total number of channels configured (default 200).

Contributor

But is 200 a realistic number? I think it makes sense to allow customers to configure resizeDelta to be more than 2, but more than 10 (we need to come up with a more accurate number) may introduce other performance issues.

For example

  • resize is in a synchronized block.
  • We use AtomicInteger to keep track of the number of outstanding RPCs, which could cause issues in high-contention scenarios; in that case LongAdder may be preferable.

Can we reach out to the gRPC team and get some suggestions?
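
For reference, a minimal sketch of what a `LongAdder`-based counter for outstanding RPCs could look like. `OutstandingRpcCounter` and its methods are hypothetical and do not reflect how `ChannelPool` tracks RPCs today. Note that `LongAdder.sum()` is not an atomic snapshot while updates are in flight, so this would only fit if the pool tolerates approximate reads.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical counter for in-flight RPCs, optimized for many concurrent writers. */
final class OutstandingRpcCounter {
  private final LongAdder outstanding = new LongAdder();

  void rpcStarted() {
    outstanding.increment();
  }

  void rpcFinished() {
    outstanding.decrement();
  }

  /** Returns the current count; exact only when no updates are in flight. */
  long current() {
    return outstanding.sum();
  }

  /** Demo helper: starts the given number of threads, each recording RPC starts. */
  static long countConcurrently(int threads, int incrementsPerThread) {
    OutstandingRpcCounter counter = new OutstandingRpcCounter();
    Thread[] workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      workers[i] =
          new Thread(
              () -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                  counter.rpcStarted();
                }
              });
      workers[i].start();
    }
    for (Thread worker : workers) {
      try {
        worker.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    // All writers have finished, so the sum is exact at this point.
    return counter.current();
  }
}
```

`LongAdder` spreads updates across internal cells to reduce contention, at the cost of a more expensive and non-atomic read. It also has no equivalent of `incrementAndGet()`, so code that relies on the value returned by an update would need to be restructured.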

Member Author

This PR does not intend to fix the channel pool's issues. It only exposes a setting that lets users configure a value of their choosing within the current bounds of the existing logic. The default resize delta remains 2, and if they choose a different value, they can test it and figure out whether it works for their workload.

There are limitations to the existing channel pool logic that can use some broader changes. There is a project proposal to investigate (b/503856499) how to make this better overall.

> But is 200 a realistic number? I think it makes sense to allow customers to configure resizeDelta to be more than 2, but more than 10 (we need to come up with a more accurate number) may introduce other performance issues.

I just aim to provide a safe default and let customers tinker with what they think is best. A delta of 10 or 25 may work better for different workloads.

If we want to fix ChannelPooling, then I think other changes would be more beneficial than investigating the resize delta value.

Contributor

I agree that this PR is not meant to fix ChannelPooling. However, the new setter could make it easier for customers to exploit the current limitations of ChannelPooling, so I think it would still be better to set an upper limit.

Member Author

> exploit the current limitations of ChannelPooling

I'm not sure what you mean by exploiting here. If they configure a high value where performance degrades and it doesn't work for them, they can roll back this change. If a high delta works for their use case, then they can opt to keep it.


Regardless, I think we may have to agree to disagree on setting a hard bound here. I don't think we should set an arbitrary bound for user configuration regardless of current limitations, barring something that doesn't fit the logic like negative resize delta.

IMO, if it raises drastic performance concerns, it would be beneficial to see user workload configurations as well as the issue reports. That gives us signal about their requirements and helps us see what would be needed for a future ChannelPool overhaul (e.g. channel priming, power-of-two, etc.). And the channel pool issues give us a more direct data point to prioritize the project (instead of slightly tangential reports of increased client-side request latency). I'm worried that limiting this "exploit" hides the need to fix the underlying channel pool issues.

Rather than having to talk to gRPC and investigate possible upper bound limits, how about we compromise and I set this to a higher upper bound value (e.g. 25)? I'll add javadocs about potential performance concerns when setting a higher delta.

Member Author

Updated to clamp the resize delta to a maximum of 25, with a warning about performance.
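
A rough sketch of the bounds discussed in this thread, for illustration only. `ResizeDeltaBounds`, `effectiveDelta`, `staticPoolDelta`, and `ASSUMED_UPPER_BOUND` are hypothetical names. The thread does not show how the PR treats out-of-range values, so rejecting values below 1 or above the pool size while silently clamping to 25 is an assumption.

```java
/** Hypothetical illustration of the resize delta bounds discussed in the thread. */
final class ResizeDeltaBounds {
  // Default stated in the thread.
  static final int DEFAULT_MAX_RESIZE_DELTA = 2;
  // Assumed hard cap, from "clamp the resize delta to a maximum of 25".
  static final int ASSUMED_UPPER_BOUND = 25;

  private ResizeDeltaBounds() {}

  /** Validates the requested delta and returns the value the pool would actually use. */
  static int effectiveDelta(int requestedDelta, int maxChannelCount) {
    if (requestedDelta < 1) {
      throw new IllegalArgumentException("resize delta must be at least 1");
    }
    if (requestedDelta > maxChannelCount) {
      throw new IllegalArgumentException("resize delta must not exceed the max channel count");
    }
    // Assumption: values above the cap are clamped rather than rejected.
    return Math.min(requestedDelta, ASSUMED_UPPER_BOUND);
  }

  /** Mirrors the static-pool snippet: keeps the default delta within a fixed pool size. */
  static int staticPoolDelta(int size) {
    return Math.min(DEFAULT_MAX_RESIZE_DELTA, size);
  }
}
```

For example, a requested delta of 100 with a maximum of 200 channels would yield an effective delta of 25 under these assumptions, while a static pool of size 1 would use a delta of 1.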

@lqiu96 lqiu96 requested review from blakeli0 and jinseopkim0 April 20, 2026 23:08
@sonarqubecloud

Quality Gate failed for 'gapic-generator-java-root'

Failed conditions
0.0% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

@sonarqubecloud

Quality Gate failed for 'gapic-generator-java-root'

Failed conditions
33.3% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

4 participants