Skip to content

Enable 128bit atomics#803

Draft
sleeepyjack wants to merge 5 commits intoNVIDIA:devfrom
sleeepyjack:worktree-128bit-atomics
Draft

Enable 128bit atomics#803
sleeepyjack wants to merge 5 commits intoNVIDIA:devfrom
sleeepyjack:worktree-128bit-atomics

Conversation

@sleeepyjack
Copy link
Copy Markdown
Collaborator

No description provided.

@sleeepyjack sleeepyjack self-assigned this Apr 17, 2026
@sleeepyjack sleeepyjack added P1: Should have Necessary but not critical In Progress Currently a work in progress type: improvement Improvement / enhancement to an existing function labels Apr 17, 2026
Comment on lines +72 to +76
static_assert(sizeof(Key) <= 8, "Container does not support key types larger than 8 bytes.");
static_assert(sizeof(Key) <= cuco::detail::max_key_size,
"Key size exceeds the maximum supported size (8 bytes, or 16 with sm_90+).");

static_assert(sizeof(Value) <= 16, "Container does not support slot types larger than 16 bytes.");
static_assert(sizeof(Value) <= cuco::detail::max_slot_size,
"Slot size exceeds the maximum supported size (16 bytes, or 32 with sm_90+).");
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should move this check to the _ref_impl class so we unify the checks in one spot

Comment on lines +1851 to +1853
static_assert(
has_payload,
"16-byte key types in key-only containers require sm_90+ for 128-bit atomic CAS.");
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this check redundant to the one at the beginning of the class?

Comment on lines +1893 to +1895
static_assert(
has_payload,
"16-byte key types in key-only containers require sm_90+ for 128-bit atomic CAS.");
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

Comment on lines 24 to 35
template <typename First, typename Second>
__host__ __device__ constexpr pair<First, Second>::pair(First const& f, Second const& s)
: first{f}, second{s}
: first(f), second(s)
{
}

template <typename First, typename Second>
template <typename F, typename S>
__host__ __device__ constexpr pair<First, Second>::pair(pair<F, S> const& p)
: first{p.first}, second{p.second}
: first(p.first), second(p.second)
{
}
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure why this change is required

Comment on lines 55 to 90
probing_scheme,
alloc,
stream)},
empty_value_sentinel_{empty_value_sentinel}
empty_value_sentinel_(empty_value_sentinel)
{
}

template <class Key,
class T,
class Extent,
cuda::thread_scope Scope,
class KeyEqual,
class ProbingScheme,
class Allocator,
class Storage>
constexpr static_map<Key, T, Extent, Scope, KeyEqual, ProbingScheme, Allocator, Storage>::
static_map(Extent n,
double desired_load_factor,
empty_key<Key> empty_key_sentinel,
empty_value<T> empty_value_sentinel,
KeyEqual const& pred,
ProbingScheme const& probing_scheme,
cuda_thread_scope<Scope>,
Storage,
Allocator const& alloc,
cuda::stream_ref stream)
: impl_{std::make_unique<impl_type>(n,
desired_load_factor,
cuco::pair{empty_key_sentinel, empty_value_sentinel},
pred,
probing_scheme,
alloc,
stream)},
empty_value_sentinel_{empty_value_sentinel}
empty_value_sentinel_(empty_value_sentinel)
{
}
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here. I don't think they are required

Comment on lines -163 to +171
(int64_t, int64_t, cuco::test::probe_sequence::linear_probing, 2))
(int64_t, int64_t, cuco::test::probe_sequence::linear_probing, 2)
#if defined(CUCO_HAS_128BIT_ATOMICS)
,
(__int128_t, __int128_t, cuco::test::probe_sequence::double_hashing, 1),
(__int128_t, __int128_t, cuco::test::probe_sequence::double_hashing, 2),
(__int128_t, __int128_t, cuco::test::probe_sequence::linear_probing, 1),
(__int128_t, __int128_t, cuco::test::probe_sequence::linear_probing, 2)
#endif
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Comment on lines -95 to +104
(int64_t, int64_t, cuco::test::probe_sequence::linear_probing, 2))
(int64_t, int64_t, cuco::test::probe_sequence::linear_probing, 2)
#if defined(CUCO_HAS_128BIT_ATOMICS)
,
(__int128_t, __int128_t, cuco::test::probe_sequence::double_hashing, 1),
(__int128_t, __int128_t, cuco::test::probe_sequence::double_hashing, 2),
(__int128_t, __int128_t, cuco::test::probe_sequence::linear_probing, 1),
(__int128_t, __int128_t, cuco::test::probe_sequence::linear_probing, 2)
#endif
)
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Comment on lines +123 to +130
(int64_t, int64_t, cuco::test::probe_sequence::linear_probing, 2)
#if defined(CUCO_HAS_128BIT_ATOMICS)
,
(__int128_t, __int128_t, cuco::test::probe_sequence::double_hashing, 1),
(__int128_t, __int128_t, cuco::test::probe_sequence::double_hashing, 2),
(__int128_t, __int128_t, cuco::test::probe_sequence::linear_probing, 1),
(__int128_t, __int128_t, cuco::test::probe_sequence::linear_probing, 2)
#endif
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Comment on lines -98 to +107
(int64_t, int64_t, cuco::test::probe_sequence::linear_probing, 2))
(int64_t, int64_t, cuco::test::probe_sequence::linear_probing, 2)
#if defined(CUCO_HAS_128BIT_ATOMICS)
,
(__int128_t, __int128_t, cuco::test::probe_sequence::double_hashing, 1),
(__int128_t, __int128_t, cuco::test::probe_sequence::double_hashing, 2),
(__int128_t, __int128_t, cuco::test::probe_sequence::linear_probing, 1),
(__int128_t, __int128_t, cuco::test::probe_sequence::linear_probing, 2)
#endif
)
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Comment on lines -89 to +98
(int64_t, int64_t, cuco::test::probe_sequence::linear_probing, 2))
(int64_t, int64_t, cuco::test::probe_sequence::linear_probing, 2)
#if defined(CUCO_HAS_128BIT_ATOMICS)
,
(__int128_t, __int128_t, cuco::test::probe_sequence::double_hashing, 1),
(__int128_t, __int128_t, cuco::test::probe_sequence::double_hashing, 2),
(__int128_t, __int128_t, cuco::test::probe_sequence::linear_probing, 1),
(__int128_t, __int128_t, cuco::test::probe_sequence::linear_probing, 2)
#endif
)
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Comment on lines -28 to +29
using KEY_TYPE_RANGE = nvbench::type_list<nvbench::int32_t, nvbench::int64_t>;
using VALUE_TYPE_RANGE = nvbench::type_list<nvbench::int32_t, nvbench::int64_t>;
using KEY_TYPE_RANGE = nvbench::type_list<nvbench::int32_t, nvbench::int64_t, __int128_t>;
using VALUE_TYPE_RANGE = nvbench::type_list<nvbench::int32_t, nvbench::int64_t, __int128_t>;
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If 128bit is not supported on the benchmark GPU, this will fail, no?

@sleeepyjack sleeepyjack force-pushed the worktree-128bit-atomics branch from 1cbf8bb to 4a6ccc2 Compare April 17, 2026 23:53
@sleeepyjack sleeepyjack added the topic: performance Performance related issue label Apr 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

In Progress Currently a work in progress P1: Should have Necessary but not critical topic: performance Performance related issue type: improvement Improvement / enhancement to an existing function

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant