fix(ingestion): refresh RDS IAM auth token per connection for MySQL #28730
RDS IAM auth tokens expire after ~15 minutes. The token was generated once in get_connection_url_common and baked into the SQLAlchemy URL, so a single engine reused one frozen token for every pooled connection and never refreshed it. With max_overflow=-1 and multi-threaded extraction, any connection opened after the 15-minute TTL (common on large catalogs and during profiling) authenticated with a stale token and failed mid-run with "Access denied".

Add RdsIamAuthTokenManager (caches the token, parses its expiry from the presigned-URL params, refreshes before it lapses) and wire a do_connect event listener in the MySQL connection handler that injects a fresh token on every new connection instead of embedding it in the URL. SSL is forced on for IAM (PyMySQL requires it) while preserving any existing SSL config.

Scoped to the MySQL connector. The shared builders.py IAM path is still token-frozen for the other RDS connectors (Postgres, Redshift, Greenplum, Timescale); the reusable token manager lives in aws_client.py so they can adopt it next.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
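The per-connection injection described in this commit message can be illustrated roughly as follows. This is a minimal sketch, not the code in this PR: `make_do_connect_listener`, `attach`, and the `get_token` callable are hypothetical names, and the only library behavior assumed is SQLAlchemy's documented `do_connect` event, whose handler receives `(dialect, conn_rec, cargs, cparams)` and may mutate `cparams` before the driver connects.

```python
from typing import Any, Callable, Dict


def make_do_connect_listener(get_token: Callable[[], str]):
    """Build a do_connect handler that sets a fresh password on each connect."""

    def inject_token(dialect: Any, conn_rec: Any, cargs: Any, cparams: Dict[str, Any]) -> None:
        # SQLAlchemy fires do_connect before every new DBAPI connection;
        # changes to cparams are passed on to the driver's connect() call.
        cparams["password"] = get_token()

    return inject_token


def attach(engine: Any, get_token: Callable[[], str]) -> None:
    """Register the handler on an engine built from a URL with no password."""
    # Imported lazily so the handler above can be exercised without SQLAlchemy.
    from sqlalchemy import event

    event.listen(engine, "do_connect", make_do_connect_listener(get_token))
```

Because the password is resolved inside the handler, a pooled connection opened 20 minutes into a run asks for a token at that moment rather than reusing the one that existed when the engine was built.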
Pull request overview
This PR fixes time-dependent failures in MySQL ingestion against AWS RDS using IAM authentication by ensuring an RDS IAM auth token is refreshed for each new SQLAlchemy connection (instead of being generated once and embedded into the engine URL).
Changes:
- Added RdsIamAuthTokenManager to cache and refresh RDS IAM tokens based on presigned-token expiry.
- Updated MySQL connection handling to build an IAM-specific engine and inject a fresh token via a SQLAlchemy do_connect listener (and enable SSL for PyMySQL).
- Added unit tests covering MySQL IAM engine wiring and token-manager refresh behavior, plus characterization tests documenting the remaining shared builders.py gap.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| ingestion/src/metadata/ingestion/source/database/mysql/connection.py | Adds IAM-specific engine path with do_connect listener to inject fresh IAM token per connection. |
| ingestion/src/metadata/clients/aws_client.py | Introduces RdsIamAuthTokenManager that parses token expiry and refreshes before expiry. |
| ingestion/tests/unit/source/database/test_mysql_iam.py | New unit tests validating MySQL IAM engine URL behavior, listener registration, token injection, and SSL handling. |
| ingestion/tests/unit/connections/test_iam_token_refresh.py | Characterization tests documenting current shared IAM URL behavior (token baked into URL) as a follow-up gap. |
The do_connect listener set cparams["ssl"] = {} when no SSL config was
present. An empty dict is falsy, so PyMySQL only enables TLS in PREFERRED
mode (self.ssl=True, _ssl_required=False), which silently falls back to
plaintext if the server doesn't offer TLS. RDS IAM auth mandates TLS, so
this is the wrong guarantee.
Inject {"check_hostname": True} instead: a truthy dict makes PyMySQL treat
SSL as required (_ssl_required=True) and verifies the RDS server cert.
Explicitly provided ssl config is still preserved.
Update test_listener_enables_ssl_required_by_pymysql_for_iam to assert a
truthy value, and add test_injected_ssl_makes_pymysql_require_tls which
drives the value through real PyMySQL and asserts conn.ssl is True and
conn._ssl_required is True, so the test verifies actual TLS-required
behavior rather than the mock value.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
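
The SSL handling this commit describes can be sketched as a small helper. This is an illustrative sketch only: `ensure_tls_required` is a hypothetical name, and treating an explicitly passed empty dict the same as "no SSL config" is an assumption made here, not something the commit states. The `{"check_hostname": True}` value is the one named in the commit message.

```python
from typing import Any, Dict


def ensure_tls_required(cparams: Dict[str, Any]) -> None:
    """Make sure the connect params carry a non-empty ssl config."""
    # Preserve any SSL settings that were explicitly provided
    # (for example a CA bundle path).
    if cparams.get("ssl"):
        return
    # Per the commit message, an empty dict is falsy and does not make TLS
    # mandatory, so a non-empty dict is injected instead.
    cparams["ssl"] = {"check_hostname": True}
```

Called from a do_connect handler, this leaves user-supplied settings untouched and only fills in the default when nothing usable is present.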
Code Review ✅ Approved 3 resolved / 3 findings
Implements a per-connection RDS IAM token refresh mechanism for MySQL to prevent authentication timeouts during long ingestion runs. Resolves thread-safety concerns, connection string parsing issues, and TLS configuration gaps.
✅ 3 resolved
✅ Bug: RdsIamAuthTokenManager token cache is not thread-safe
✅ Edge Case: hostPort split and unencoded username are fragile in _get_iam_engine
✅ Bug: Empty ssl dict disables TLS, breaking RDS IAM auth
…28730)
* fix(ingestion): refresh RDS IAM auth token per connection for MySQL
* fix: Thread-safety in RdsIamAuthTokenManager, username not URL-encoded
* fix: base_url drops databaseSchema + connectionOptions
* fix(ingestion): enforce required TLS for MySQL RDS IAM connections
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>



Problem
Fix: #28686
MySQL ingestion against AWS RDS using IAM authentication fails partway through long runs with Access denied.
The IAM auth token is generated once, at engine-build time, and baked into the SQLAlchemy connection URL (get_password_secret → get_connection_url_common in builders.py). RDS IAM tokens expire after ~15 minutes, and the token is never refreshed for the lifetime of the engine. Because the engine is created with max_overflow=-1 (unlimited overflow connections) and ingestion is multi-threaded, any new pooled connection opened after the token expires (large schemas, the profiler, reconnects) authenticates with a stale credential and dies mid-run.
The failure is time-dependent and intermittent: a small/single-connection run can reuse one still-valid connection and pass, which masks the bug; a large catalog reliably hits it.
Fix
- clients/aws_client.py: add RdsIamAuthTokenManager, which generates the RDS IAM token, derives its expiry from the presigned-URL params (X-Amz-Date / X-Amz-Expires), caches it, and regenerates it shortly before expiry. Falls back to a conservative TTL if the token can't be parsed so it still refreshes rather than living forever.
- source/database/mysql/connection.py: route IamAuthConfigurationSource to a new _get_iam_engine that:
  - builds the engine from a URL with no embedded token (scheme://user@host:port), and
  - registers a do_connect event listener that injects a fresh token on every connection and enables SSL (PyMySQL requires SSL for RDS IAM), preserving any existing SSL config.
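
The caching and expiry logic described for the token manager can be sketched along these lines. This is a hedged illustration, not the RdsIamAuthTokenManager from this PR: `CachedIamToken`, `parse_token_expiry`, the two constants, and the injected `generate` and `clock` callables are all hypothetical. It assumes the token is a presigned URL without its scheme that carries X-Amz-Date and X-Amz-Expires query parameters; in real use `generate` would typically wrap boto3's `generate_db_auth_token`.

```python
import threading
import time
from datetime import datetime, timezone
from typing import Callable, Optional
from urllib.parse import parse_qs, urlsplit

FALLBACK_TTL_SECONDS = 600   # assumed conservative TTL when parsing fails
REFRESH_MARGIN_SECONDS = 60  # assumed safety margin before expiry


def parse_token_expiry(token: str) -> Optional[float]:
    """Return the expiry as a Unix timestamp, or None if it cannot be parsed."""
    try:
        # The token has no scheme, so add one to let urlsplit find the query.
        params = parse_qs(urlsplit("https://" + token).query)
        issued = datetime.strptime(params["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
        ttl = int(params["X-Amz-Expires"][0])
    except (KeyError, IndexError, ValueError):
        return None
    return issued.replace(tzinfo=timezone.utc).timestamp() + ttl


class CachedIamToken:
    """Cache a token and regenerate it shortly before it expires."""

    def __init__(self, generate: Callable[[], str],
                 clock: Callable[[], float] = time.time) -> None:
        self._generate = generate
        self._clock = clock
        self._lock = threading.Lock()  # ingestion threads share one manager
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get_token(self) -> str:
        with self._lock:
            now = self._clock()
            if self._token is None or now >= self._expires_at - REFRESH_MARGIN_SECONDS:
                self._token = self._generate()
                expiry = parse_token_expiry(self._token)
                # Unparseable tokens still get a bounded lifetime.
                self._expires_at = expiry if expiry is not None else now + FALLBACK_TTL_SECONDS
            return self._token
```

Holding the lock across the check and the regeneration keeps concurrent callers from minting several tokens at once, and injecting the clock makes the refresh path testable without sleeping.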
This keeps the connector-specific wiring in the MySQL connection handler while the reusable token manager lives with AWSClient.
Scope / follow-up
Scoped to MySQL. The shared builders.py IAM path remains token-frozen for the other RDS connectors that go through it (Postgres, Redshift, Greenplum, Timescale). tests/unit/connections/test_iam_token_refresh.py documents that remaining gap and is set up to flip to green once the shared path adopts RdsIamAuthTokenManager.
Tests
- tests/unit/source/database/test_mysql_iam.py (new): do_connect listener registered, fresh token minted per connection, SSL enabled, existing SSL preserved, host/port split correctly.
- … expiry, expiry parsed from the presigned URL, malformed-token fallback.
- tests/unit/connections/test_iam_token_refresh.py (new): characterizes the shared builders.py IAM path as the remaining gap for the non-MySQL RDS connectors.