Skip to content

cp.async ergonomics: missing address-space-aware intrinsic forces inline PTX #378

@Y-jiji

Description

@Y-jiji

Summary

Using cp.async from Rust-CUDA currently requires handwritten inline PTX plus manual shared-address conversion in user kernels.

In practice, the missing piece is an address-space-aware API/intrinsic for cp.async.cg.shared.global.

Problem

For an async copy path like:

  • global source pointer
  • shared-memory destination
  • 16-byte copy (4 x f32)

users currently end up writing something like:

unsafe fn shared_addr(ptr: *mut f32) -> u32 {
    let mut addr: u64;
    asm!(
        "cvta.to.shared.u64 {dst}, {src};",
        dst = out(reg64) addr,
        src = in(reg64) ptr,
    );
    addr as u32
}

unsafe fn cp_async4(dst_shared: u32, src_global: *const f32) {
    asm!(
        "cp.async.cg.shared.global [{dst}], [{src}], 16;",
        dst = in(reg32) dst_shared,
        src = in(reg64) src_global,
    );
}

and then precompute shared base addresses in the kernel and pass integer offsets into the helper.

If we instead pass generic/raw pointers through a helper, PTX tends to contain extra address-space conversion glue around the cp.async sites.

Why this matters

This is exactly the kind of operation where users want:

  • a small, explicit intrinsic
  • correct shared/global address-space handling
  • no repeated manual inline PTX in every project

Right now, getting good code requires low-level PTX knowledge and manual control over shared address conversion.

Requested improvement

A Rust-CUDA intrinsic/helper for cp.async.cg.shared.global (or equivalent family) that:

  • represents the destination as shared-memory address space explicitly
  • avoids forcing users to manually convert *mut T into a u32 shared address
  • maps cleanly to the PTX async-copy instructions used in modern CUDA kernels

Even a low-level unsafe API would be useful if it preserves the right address-space semantics and avoids generic-pointer friction.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions