@abouteiller
Last active February 19, 2026 18:20
Mockup shmem-multi-heaps

Goal

Grandfather existing implementation behavior for applications coming from both GPU-centric and CPU-centric implementations.

  1. If an implementation has traditionally been CPU-centric, existing code should continue to work (and use the CPU heap)
  2. If an implementation has traditionally been GPU-centric, existing code should continue to work (and use the GPU heap)
  3. A (new) mechanism should be available for advanced users to control what heap they malloc from
  4. A (new) mechanism should be available for advanced users to query what type of heap they allocated from

Idea

  1. shmem_malloc(size_t) returns a pointer to a symmetric heap object; whether the pointer is a CPU or GPU buffer is implementation dependent
  2. additional envvars to control heap allocation behavior:
    1. SHMEM_SYMMETRIC_SIZE (size of the CPU heap)
    2. SHMEM_DEVICE_SYMMETRIC_SIZE (size of the Device heap)
    3. SHMEM_SYMMETRIC_HEAP_DEFAULT (type of the default heap: values HOST, DEVICE, unset)
  3. shmem_malloc_with_hints(size_t, long hints) is expanded with the additional hints
    1. SHMEM_MALLOC_DEVICE_HEAP, SHMEM_MALLOC_HOST_HEAP
    2. TBD: what happens if the hint requests DEVICE_HEAP but no device heap is available in the implementation? Error? Return a CPU heap pointer?
  4. query API (see other proposal for unified query API here: https://github.com/ROCm/rocm-systems/commit/cbe38c7ec6be01bb60091834b494be0d41e51ace#diff-5810830a650305dcb8a598db550257b8147af1428cba56ff84623f8a8dbce203R93)
    1. report in the query result if shmem_malloc (no hint) provides CPU or GPU memory (details TBD, not hard)

Mockup user code

#include <shmem.h>

int main(int argc, char* argv[]) {
//...
  shmem_init();

  // explicit heap selection through the proposed hints
  long *cpubuf = shmem_malloc_with_hints(sizeof(long), SHMEM_MALLOC_HOST_HEAP);
  long *gpubuf = shmem_malloc_with_hints(sizeof(long), SHMEM_MALLOC_DEVICE_HEAP);

  // backward compatible call pattern
  // use default allocation policy, may be a CPU or GPU ptr depending on implementation/envvars
  long *buf = shmem_malloc(sizeof(long));

  // do something useful with the buffers, either
  //  1. CPU-initiated with cpubuf, buf (assuming non-GPU-aware impl.)
  //  2. CPU-initiated with all allocation patterns (assuming GPU-aware impl.)
  //  3. GPU-initiated with gpubuf, or buf (when default to GPU is enabled) (assuming GPU-initiated-capable impl.)

  shmem_free(cpubuf);
  shmem_free(gpubuf);
  shmem_free(buf);

  shmem_finalize();
  return 0;
}

Alternatives

The core of this proposal is to repurpose the existing shmem_malloc_with_hints to do the bidding for us. An alternative is to introduce a new shmem_malloc_device to capture the same capability (explicit device heap allocation).
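To make the relationship between the two options concrete, the sketch below expresses the alternative entry point as a thin wrapper over the hint-based call. It is a minimal illustration, not a proposed implementation: stub_malloc_with_hints is a hypothetical stand-in for shmem_malloc_with_hints so the snippet compiles without an OpenSHMEM library, it allocates ordinary host memory, and the value of SHMEM_MALLOC_DEVICE_HEAP is a placeholder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Placeholder value: the proposal does not assign one. */
#define SHMEM_MALLOC_DEVICE_HEAP (1L << 9)

/* Hypothetical stand-in for shmem_malloc_with_hints. It records the
 * hints it received and returns plain host memory from malloc. */
static long last_hints_seen = 0;

static void *stub_malloc_with_hints(size_t size, long hints)
{
    last_hints_seen = hints;
    return malloc(size);
}

/* The alternative API: explicit device heap allocation, written here
 * as a one-line wrapper that forwards the device heap hint. */
static void *shmem_malloc_device(size_t size)
{
    return stub_malloc_with_hints(size, SHMEM_MALLOC_DEVICE_HEAP);
}
```

Seen this way, the two options offer the same capability; the difference is whether the heap choice appears in the function name or in the hints argument.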
