GH-49720: [C++] Optimize base64_decode validation using lookup table by Reranko05 · Pull Request #49748 · apache/arrow

Reranko05 · 2026-04-14T20:37:29Z

Rationale for this change

The current implementation of base64_decode validates characters using std::string::find for each byte, which introduces unnecessary overhead due to repeated linear searches.

This change replaces those lookups with a precomputed 256-entry lookup table, enabling constant-time validation and value lookup per character.

What changes are included in this PR?

Introduced a static lookup table (kBase64Lookup) to map base64 characters to their corresponding values
Replaced std::string::find with constant-time table lookup for character validation
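
For readers who have not opened the diff, the idea can be sketched as follows. This is a minimal illustration and not the code from the PR: `kAlphabet`, `kInvalidSextet`, `BuildLookup`, `kLookupSketch`, `SextetByFind` and `SextetByTable` are hypothetical names, and the real `kBase64Lookup` may be laid out differently.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

// Standard base64 alphabet; a character's index is its 6-bit value.
const std::string kAlphabet =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Assumed marker for "not a base64 character" (hypothetical choice).
constexpr std::uint8_t kInvalidSextet = 0xFF;

// Build the 256-entry table: one slot per possible byte value.
std::array<std::uint8_t, 256> BuildLookup() {
  std::array<std::uint8_t, 256> table;
  table.fill(kInvalidSextet);
  for (std::size_t i = 0; i < kAlphabet.size(); ++i) {
    table[static_cast<unsigned char>(kAlphabet[i])] =
        static_cast<std::uint8_t>(i);
  }
  return table;
}

const std::array<std::uint8_t, 256> kLookupSketch = BuildLookup();

// Old approach: linear search over up to 64 characters per input byte.
int SextetByFind(unsigned char c) {
  const std::size_t pos = kAlphabet.find(static_cast<char>(c));
  return pos == std::string::npos ? -1 : static_cast<int>(pos);
}

// New approach: a single indexed load per input byte.
int SextetByTable(unsigned char c) {
  const std::uint8_t value = kLookupSketch[c];
  return value == kInvalidSextet ? -1 : static_cast<int>(value);
}
```

Both functions return the same result for every byte value; only the cost per character differs.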

Are these changes tested?

Yes. Existing base64 decoding behavior remains unchanged and continues to pass all current tests. This change is a performance optimization and does not alter functional output.

Are there any user-facing changes?

No.

This change is internal and does not affect public APIs.

GitHub Issue: [C++] Optimize base64_decode validation using lookup table #49720

github-actions · 2026-04-14T20:38:53Z

⚠️ GitHub issue #49720 has been automatically assigned in GitHub to PR creator.

Reranko05 · 2026-04-15T12:05:24Z

Hi @kou @dmitry-chirkov-dremio, this implements the lookup-table optimization we discussed in PR #49660.
Would appreciate your feedback when convenient. Thanks!

Copilot

Pull request overview

Optimizes C++ base64_decode character validation by replacing per-character std::string::find lookups with a precomputed 256-entry lookup table.

Changes:

Added a static 256-entry lookup table (kBase64Lookup) mapping bytes to Base64 sextet values (or invalid marker).
Updated base64_decode to validate and decode using constant-time table lookup instead of std::string::find.


Reranko05 · 2026-04-16T00:43:19Z

Hi @kou, I've addressed the Copilot review feedback:

Added include for fixed-width types
Fixed narrowing conversion with explicit cast
Cached lookup result to avoid redundant table access
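
A rough sketch of what these three fixes look like in a decode loop, for illustration only. `BuildTable`, `kInvalid` and `DecodeUnpadded` are hypothetical names, and this simplified decoder rejects `=` padding, which the real `base64_decode` handles.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>  // fixed-width types such as std::uint8_t
#include <string>

constexpr std::uint8_t kInvalid = 0xFF;  // assumed invalid marker

std::array<std::uint8_t, 256> BuildTable() {
  const std::string alphabet =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  std::array<std::uint8_t, 256> table;
  table.fill(kInvalid);
  for (std::size_t i = 0; i < alphabet.size(); ++i) {
    // Explicit cast: std::size_t to std::uint8_t would otherwise narrow.
    table[static_cast<unsigned char>(alphabet[i])] =
        static_cast<std::uint8_t>(i);
  }
  return table;
}

// Decodes input whose length is a multiple of 4 and contains no padding.
// Returns false on the first invalid character; *out may then hold a
// partial result.
bool DecodeUnpadded(const std::string& input, std::string* out) {
  static const std::array<std::uint8_t, 256> table = BuildTable();
  if (input.size() % 4 != 0) return false;
  out->clear();
  std::uint32_t buffer = 0;
  int count = 0;
  for (const char ch : input) {
    // One table access per character; the cached value serves both
    // validation and decoding.
    const std::uint8_t value = table[static_cast<unsigned char>(ch)];
    if (value == kInvalid) return false;
    buffer = (buffer << 6) | value;
    if (++count == 4) {
      // Four sextets make three bytes.
      out->push_back(static_cast<char>((buffer >> 16) & 0xFF));
      out->push_back(static_cast<char>((buffer >> 8) & 0xFF));
      out->push_back(static_cast<char>(buffer & 0xFF));
      buffer = 0;
      count = 0;
    }
  }
  return true;
}
```

Casting the input character through `unsigned char` before indexing keeps bytes at or above 0x80 from producing a negative index on platforms where `char` is signed.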

Please take another look when convenient. Thanks!

kou

+1

Have you checked whether this optimization improves performance or not?

Reranko05 · 2026-04-17T07:01:47Z

Hi @kou, I ran a quick benchmark comparing the previous implementation and this change.

Results (200 iterations):

Small (~100B):

Before: ~0 ms
After: ~0 ms

Medium (~10KB):

Before: 58 ms
After: 12 ms

Large (~1MB):

Before: 4302 ms
After: 1126 ms

The improvement is negligible for very small inputs but becomes significant as input size grows. This aligns with expectations, since the change replaces repeated linear std::string::find lookups with constant-time table access per character.
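
For anyone who wants to reproduce this kind of comparison, here is a minimal timing sketch. It is not the harness used for the numbers above: `CountValidByFind`, `CountValidByTable` and `TotalMillis` are hypothetical helpers that time validation only, not the full decode.

```cpp
#include <array>
#include <chrono>
#include <cstddef>
#include <string>

const std::string kChars =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Validation via linear search, one std::string::find call per byte.
std::size_t CountValidByFind(const std::string& input) {
  std::size_t valid = 0;
  for (const char ch : input) {
    if (kChars.find(ch) != std::string::npos) ++valid;
  }
  return valid;
}

// Validation via a 256-entry table, one indexed load per byte.
std::size_t CountValidByTable(const std::string& input) {
  static const std::array<bool, 256> is_valid = [] {
    std::array<bool, 256> table{};  // all false
    for (const char ch : kChars) table[static_cast<unsigned char>(ch)] = true;
    return table;
  }();
  std::size_t valid = 0;
  for (const char ch : input) {
    if (is_valid[static_cast<unsigned char>(ch)]) ++valid;
  }
  return valid;
}

// Runs fn() `iterations` times and returns the total wall time in ms.
template <typename Fn>
double TotalMillis(Fn&& fn, int iterations) {
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iterations; ++i) fn();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A call such as `TotalMillis([&] { sink += CountValidByFind(input); }, 200)` with a `std::size_t sink` that is read afterwards keeps the compiler from discarding the work. Absolute timings depend on the machine, compiler and optimization flags.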

kou · 2026-04-17T07:59:59Z

Great!

@dmitry-chirkov-dremio do you want to review this before we merge this?

dmitry-chirkov-dremio · 2026-04-17T12:02:50Z

@Reranko05 do you have before before numbers?

Reranko05 · 2026-04-17T12:31:34Z

Hi @dmitry-chirkov-dremio, just to clarify, by "before before" do you mean comparing against the original implementation prior to #49660, or are you asking for something else?

Reranko05 · 2026-04-17T13:31:20Z

Hi @dmitry-chirkov-dremio, if by “before before” you meant the implementation prior to #49660, here are benchmarks across all three stages:

Before-before (original, pre-#49660):
Small (~100B): ~0 ms
Medium (~10KB): 27 ms
Large (~1MB): 2832 ms

Before (after #49660):
Small (~100B): ~0 ms
Medium (~10KB): 58 ms
Large (~1MB): 4302 ms

After (this PR):
Small (~100B): ~0 ms
Medium (~10KB): 12 ms
Large (~1MB): 1126 ms

The validation changes introduced some overhead (expected due to additional checks), and this optimization not only removes that regression but also improves performance beyond the original implementation by replacing repeated linear lookups with constant-time table access.
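
To make the ratios explicit, the figures above can be plugged into a small helper. The timings are copied from this comment; `Timing`, `Speedup` and the constant names are hypothetical and exist only for this illustration.

```cpp
// Timings in milliseconds for 200 iterations, taken from the comment above.
struct Timing {
  double medium_ms;  // ~10KB input
  double large_ms;   // ~1MB input
};

constexpr Timing kOriginal{27.0, 2832.0};    // before #49660
constexpr Timing kAfter49660{58.0, 4302.0};  // after #49660
constexpr Timing kThisPr{12.0, 1126.0};      // this PR

// How many times faster `after_ms` is than `before_ms`.
constexpr double Speedup(double before_ms, double after_ms) {
  return before_ms / after_ms;
}

// Versus the post-#49660 code: about 4.8x (medium) and 3.8x (large).
constexpr double kMediumVsBefore =
    Speedup(kAfter49660.medium_ms, kThisPr.medium_ms);
constexpr double kLargeVsBefore =
    Speedup(kAfter49660.large_ms, kThisPr.large_ms);

// Versus the original code: 2.25x (medium) and about 2.5x (large).
constexpr double kMediumVsOriginal =
    Speedup(kOriginal.medium_ms, kThisPr.medium_ms);
constexpr double kLargeVsOriginal =
    Speedup(kOriginal.large_ms, kThisPr.large_ms);
```

These are single-run numbers, so the ratios are indicative rather than precise.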

github-actions bot added the awaiting review label Apr 14, 2026

github-actions bot added the Component: C++ label Apr 14, 2026

Reranko05 force-pushed the optimize-base64-lookup branch from 9c55ff0 to e48fd5c April 14, 2026 20:41

kou requested a review from Copilot April 15, 2026 21:10

Copilot started reviewing on behalf of kou April 15, 2026 21:10

Copilot AI reviewed Apr 15, 2026


Comment thread cpp/src/arrow/vendored/base64.cpp

Comment thread cpp/src/arrow/vendored/base64.cpp Outdated

Comment thread cpp/src/arrow/vendored/base64.cpp Outdated

Optimize base64_decode validation using lookup table

9735d72

Reranko05 force-pushed the optimize-base64-lookup branch from e48fd5c to 9735d72 April 16, 2026 00:41

kou approved these changes Apr 17, 2026


github-actions bot added the awaiting merge label and removed the awaiting review label Apr 17, 2026

Reranko05 wants to merge 1 commit into apache:main from Reranko05:optimize-base64-lookup


4 participants
