HIVE-29559: materializeCTE to query Analyzer from DDLSemanticAnalyzerFactory#6423
HIVE-29559: materializeCTE to query Analyzer from DDLSemanticAnalyzerFactory#6423konstantinb wants to merge 9 commits intoapache:masterfrom
Conversation
…r instead of SemanticAnalyzer
….out file generated with proper test driver this time
…iginally intended by HIVE-28724
| UNION ALL | ||
| SELECT * FROM cte; | ||
|
|
||
| -- Test the recompile-without-CBO auto-trigger path. |
There was a problem hiding this comment.
I've left very verbose comments in here to better explain the scenario. I will gladly remove or shorten this comment section once reviewed
There was a problem hiding this comment.
Based on the scope of this patch, this path does not need to be covered because the previous test case already covers the non-CBO path.
TBH, I’m not sure it is necessary to add this .q file, since we already have many tests for CTE materialization (cte_mat_*.q). Those tests cover the CBO path, which should always be the default. In general, the non-CBO path is no longer supported.
https://issues.apache.org/jira/browse/HIVE-27830
https://issues.apache.org/jira/browse/HIVE-28741
There was a problem hiding this comment.
TBH, I’m not sure it is necessary to add this .q file
@kasakrisz, while I can agree that testing the non-CBO configured path becomes redundant once full CBO deprecation gets addressed, I believe that having at least one new query that guards against similar regressions could be beneficial. The last query of the test file generates the following when executed against master: cte_materialize.q.test.log.txt
Based on the scope of this patch, this path does not need to be covered because the previous test case already covers the non-CBO path.
The very first test in the file is not broken in the current master. I thought it would be useful to leave it in to confirm that my changes introduce no additional regressions for the CBO path, but you helped me to see that it is indeed redundant now
|
Hi @kasakrisz @abstractdog — follow-up to HIVE-28724. Tests passing, SonarQube clean. Would value your eyes when you have a chance. Diff is mostly cleanup — net negative LOC. HIVE-28724 introduced The same asymmetry also produced an NPE on materialized CTEs whenever the runtime analyzer instance is a plain Aware of the non-CBO deprecation direction — this PR doesn't expand that surface, it removes a duplicate method. Happy to adjust the shape if you'd prefer something different. cc @deniskuzZ @ramitg254 for visibility. |
deniskuzZ
left a comment
There was a problem hiding this comment.
I’m not an expert in this area, but the change looks reasonable to me.
kasakrisz
left a comment
There was a problem hiding this comment.
I welcome this effort to remove duplicated code and have left comments on the test code.
| // Reference CTE 3 times to exceed default materialization threshold of 2 | ||
| String query = "WITH cte AS (SELECT COUNT(*) as cnt FROM table1) " + | ||
| "SELECT * FROM cte UNION ALL SELECT * FROM cte UNION ALL SELECT * FROM cte"; |
There was a problem hiding this comment.
You can also lower the threshold (hive.optimize.cte.materialize.threshold) to 1 in the scope of this test case and use a simpler query
| @@ -0,0 +1,46 @@ | |||
| -- Test CTE materialization with both CBO enabled and disabled | |||
| -- Verifies DDLSemanticAnalyzerFactory is used for CTE materialization | |||
There was a problem hiding this comment.
How can this be verified from q test?
There was a problem hiding this comment.
I agree that this is a bit too bold a statement - the test does confirm that the NPE is fixed, but does not really validate classes used
| UNION ALL | ||
| SELECT * FROM cte; | ||
|
|
||
| -- Test the recompile-without-CBO auto-trigger path. |
There was a problem hiding this comment.
Based on the scope of this patch, this path does not need to be covered because the previous test case already covers the non-CBO path.
TBH, I’m not sure it is necessary to add this .q file, since we already have many tests for CTE materialization (cte_mat_*.q). Those tests cover the CBO path, which should always be the default. In general, the non-CBO path is no longer supported.
https://issues.apache.org/jira/browse/HIVE-27830
https://issues.apache.org/jira/browse/HIVE-28741
|
❌ The last analysis has failed. |
|
@deniskuzZ, thank you for your feedback @kasakrisz, thank you very much for your prompt review. I have modified the Java test code side as suggested. I have also reduced the number of tests in the .q file to a single test that runs into an NPE on the current branch, with CBO ON by default and ALWAYS as the fallback logic. I believe this test provides value as long as the Non-CBO ReExec driver remains in the repository. The effort to remove all non-CBO execution paths is substantial and will likely take months, if not years. However, if you strongly feel that this .q file is unnecessary, I would be happy to delete it. Could you please let me know? |
What changes were proposed in this pull request?
HIVE-29559: SemanticAnalyzer.materializeCTE to query SemanticAnalyzer instance from DDLSemanticAnalyzerFactory
Why are the changes needed?
As originally intended by the changes for HIVE-28724, the PR #5973 has introduced a new CreateTableAnalyzer class and registered it with DDLSemanticAnalyzerFactory
Unfortunately, the PR has also introduced a new CalcitePlanner.The materializeCTE() method is almost an exact copy of SemanticAnalyzer.materializeCTE(), with only one difference: it explicitly uses CreateTableAnalyzer instead of SemanticAnalyzer.
Because some code was removed from SemanticAnalyzer, its own implementation of materializeCTE() broke during the merge of this PR. The new query file of this PR reproduces the NPE happening with the current master branch code.
The first, "easy" and working choice was to copy the whole materializeCTE() from CalcitePlanner to SemanticAnalyzer and remove materializeCTE() from CalcitePlanner completely. I decided to take it a bit further due to the following reasons:
Does this PR introduce any user-facing change?
No
How was this patch tested?
Unit tests, new query file, all existing tests pass