grandfather existing implementation behavior for applications coming from both GPU-centric and CPU-centric implementations.
- If an implementation has traditionally been CPU-centric, existing code should continue to work (and use the CPU heap)
- If an implementation has traditionally been GPU-centric, existing code should continue to work (and use the GPU heap)
- A (new) mechanism should be available for advanced users to control what heap they malloc from
- A (new) mechanism should be available for advanced users to query what type of heap they allocated from
- shmem_malloc(size_t) returns a pointer to a symmetric heap object; whether that pointer refers to a CPU or GPU buffer is implementation dependent
- additional envvars to control heap allocation behavior:
  - SHMEM_SYMMETRIC_SIZE (size of the CPU heap)
  - SHMEM_DEVICE_SYMMETRIC_SIZE (size of the device heap)
  - SHMEM_SYMMETRIC_HEAP_DEFAULT (type of the default heap: values HOST, DEVICE, unset)
- shmem_malloc_with_hints(size_t, long hints) is expanded with additional hints:
  - SHMEM_MALLOC_DEVICE_HEAP, SHMEM_MALLOC_HOST_HEAP
  - TBD: what happens if the hint requests DEVICE_HEAP but a device heap is not available in the implementation? Error? Return a CPU heap pointer?
- query API (see other proposal for unified query API here: https://github.com/ROCm/rocm-systems/commit/cbe38c7ec6be01bb60091834b494be0d41e51ace#diff-5810830a650305dcb8a598db550257b8147af1428cba56ff84623f8a8dbce203R93)
  - report in the query result whether shmem_malloc (no hint) provides CPU or GPU memory (details TBD, not hard)
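
The selection rules in the list above can be sketched as a small decision function. This is a minimal sketch under stated assumptions, not the proposal's definitive semantics: the `DEMO_*` hint macros and their bit values, `heap_kind_t`, `default_heap`, and `select_heap` are hypothetical names introduced only for illustration. It assumes that an explicit hint overrides the default, that an unset SHMEM_SYMMETRIC_HEAP_DEFAULT falls back to the implementation's traditional heap, and that conflicting hints or unrecognized values are reported as errors. The open question of a DEVICE hint on an implementation without a device heap is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for the proposed hints; names and bit values are hypothetical. */
#define DEMO_MALLOC_HOST_HEAP   (1L << 0)
#define DEMO_MALLOC_DEVICE_HEAP (1L << 1)

typedef enum { HEAP_HOST, HEAP_DEVICE, HEAP_ERROR } heap_kind_t;

/* Heap used by an unhinted allocation.
 * env_value:    value of SHMEM_SYMMETRIC_HEAP_DEFAULT, or NULL when unset.
 * impl_default: the implementation's traditional heap (HEAP_HOST or HEAP_DEVICE). */
static heap_kind_t default_heap(const char *env_value, heap_kind_t impl_default)
{
    assert(impl_default == HEAP_HOST || impl_default == HEAP_DEVICE);
    if (env_value == NULL || env_value[0] == '\0')
        return impl_default;          /* unset: keep the grandfathered behavior */
    if (strcmp(env_value, "HOST") == 0)
        return HEAP_HOST;
    if (strcmp(env_value, "DEVICE") == 0)
        return HEAP_DEVICE;
    return HEAP_ERROR;                /* unrecognized value (assumed to be an error) */
}

/* Heap used by a hinted allocation: an explicit hint wins over the default. */
static heap_kind_t select_heap(long hints, const char *env_value,
                               heap_kind_t impl_default)
{
    int want_host   = (hints & DEMO_MALLOC_HOST_HEAP) != 0;
    int want_device = (hints & DEMO_MALLOC_DEVICE_HEAP) != 0;

    if (want_host && want_device)
        return HEAP_ERROR;            /* conflicting hints; not defined by the proposal */
    if (want_host)
        return HEAP_HOST;
    if (want_device)
        return HEAP_DEVICE;
    return default_heap(env_value, impl_default);
}
```

Under these assumptions, `select_heap(0, "DEVICE", HEAP_HOST)` yields `HEAP_DEVICE` on a traditionally CPU-centric implementation, while `select_heap(DEMO_MALLOC_HOST_HEAP, "DEVICE", HEAP_HOST)` yields `HEAP_HOST` because the hint takes precedence.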

#include <shmem.h>

int main(int argc, char *argv[]) {
    //...
    shmem_init();
    // SHMEM_MALLOC_HOST_HEAP and SHMEM_MALLOC_DEVICE_HEAP are the hints proposed here
    long *cpubuf = shmem_malloc_with_hints(sizeof(long), SHMEM_MALLOC_HOST_HEAP);
    long *gpubuf = shmem_malloc_with_hints(sizeof(long), SHMEM_MALLOC_DEVICE_HEAP);
    // backward compatible call pattern
    // use default allocation policy, may be a CPU or GPU ptr depending on implementation/envvars
    long *buf = shmem_malloc(sizeof(long));
    // do something useful with the buffers, either
    // 1. CPU initiated with cpubuf, buf (assuming non-GPU-aware impl.)
    // 2. CPU initiated with all allocation patterns (assuming GPU-aware impl.)
    // 3. GPU initiated with gpubuf, or buf (when default to GPU is enabled) (assuming GPU-initiated-capable impl.)
    shmem_free(cpubuf);
    shmem_free(gpubuf);
    shmem_free(buf);
    shmem_finalize();
    return 0;
}

The core of this proposal is to repurpose the existing shmem_malloc_with_hints to do the bidding for us. An alternative is to introduce a new shmem_malloc_device to capture the same capability (explicit device heap allocation).
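
To illustrate why the two options capture the same capability, the following sketch writes a dedicated device allocator as a thin wrapper over a hinted allocator. Everything here is an assumption made for illustration: `demo_malloc_with_hints` is a stand-in for shmem_malloc_with_hints that is backed by the process heap so the sketch compiles without a SHMEM library, `demo_malloc_device` is one possible shape for shmem_malloc_device, and the `DEMO_*` macros and their bit values are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-ins for the proposed hints; names and bit values are hypothetical. */
#define DEMO_MALLOC_HOST_HEAP   (1L << 0)
#define DEMO_MALLOC_DEVICE_HEAP (1L << 1)

/* Records the hints of the most recent call so the forwarding can be observed. */
static long demo_last_hints = 0;

/* Stand-in for shmem_malloc_with_hints(size_t, long). It allocates from the
 * ordinary process heap and only remembers which hints were requested. */
static void *demo_malloc_with_hints(size_t size, long hints)
{
    const long both = DEMO_MALLOC_HOST_HEAP | DEMO_MALLOC_DEVICE_HEAP;
    /* This stand-in rejects a request that names both heaps at once. */
    assert((hints & both) != both);
    demo_last_hints = hints;
    return malloc(size);
}

/* Hypothetical dedicated entry point: forwards to the hinted allocator
 * with the device-heap hint set, so both options share one code path. */
static void *demo_malloc_device(size_t size)
{
    return demo_malloc_with_hints(size, DEMO_MALLOC_DEVICE_HEAP);
}
```

In this shape the choice between the two options is mostly about API surface: the hinted call keeps a single allocation entry point, while the dedicated function makes the device request visible in the function name.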