@b1tg
Last active November 18, 2025 15:43
mi350 RuntimeError: Wait timeout
[2771223.012041] amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:3 pasid:32779)
[2771223.023037] amdgpu 0000:05:00.0: amdgpu: for process python3 pid 3895854 thread python3 pid 3895854)
[2771223.033665] amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x0000ffffffbfe000 from IH client 0x1b (UTCL2)
[2771223.045852] amdgpu 0000:05:00.0: amdgpu: cookie node_id 2 fault from die AID0.XCD1
[2771223.054826] amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x003012B1
[2771223.063501] amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: SQC (inst) (0x9)
[2771223.072377] amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
[2771223.078908] amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
[2771223.085538] amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0xb
[2771223.092651] amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
[2771223.099372] amdgpu 0000:05:00.0: amdgpu: RW: 0x0
[2771227.011513] amdgpu 0000:05:00.0: amdgpu: XCC 0: Queue preemption failed for queue with doorbell_id: 80006000
[2771227.022950] amdgpu 0000:05:00.0: amdgpu: XCC 1: Queue preemption failed for queue with doorbell_id: 80006000
[2771227.034277] amdgpu 0000:05:00.0: amdgpu: XCC 2: Queue preemption failed for queue with doorbell_id: 80006000
[2771227.045610] amdgpu 0000:05:00.0: amdgpu: XCC 3: Queue preemption failed for queue with doorbell_id: 80006000
[2771227.056938] amdgpu 0000:05:00.0: amdgpu: XCC 4: Queue preemption failed for queue with doorbell_id: 80006000
[2771227.068263] amdgpu 0000:05:00.0: amdgpu: XCC 5: Queue preemption failed for queue with doorbell_id: 80006000
[2771227.079591] amdgpu 0000:05:00.0: amdgpu: XCC 6: Queue preemption failed for queue with doorbell_id: 80006000
[2771227.090913] amdgpu 0000:05:00.0: amdgpu: XCC 7: Queue preemption failed for queue with doorbell_id: 80006000
[2771227.105113] amdgpu 0000:05:00.0: amdgpu: queue id 0x2 at pasid 3895854 is reset
[2771227.114463] amdgpu 0000:05:00.0: amdgpu: queue id 0x2 at pasid 3895854 is reset
[2771227.124110] amdgpu 0000:05:00.0: amdgpu: queue id 0x2 at pasid 3895854 is reset
[2771227.133481] amdgpu 0000:05:00.0: amdgpu: queue id 0x2 at pasid 3895854 is reset
[2771227.143135] amdgpu 0000:05:00.0: amdgpu: queue id 0x2 at pasid 3895854 is reset
[2771227.152462] amdgpu 0000:05:00.0: amdgpu: queue id 0x2 at pasid 3895854 is reset
[2771227.162089] amdgpu 0000:05:00.0: amdgpu: queue id 0x2 at pasid 3895854 is reset
[2771227.171482] amdgpu 0000:05:00.0: amdgpu: queue id 0x2 at pasid 3895854 is reset
[2771227.180039] amdgpu 0000:05:00.0: amdgpu: Queues reset on process python3 tid 3895854 thread python3 pid 3895854
[2771227.191741] amdgpu 0000:05:00.0: amdgpu: Queues reset on process python3 tid 3888885 thread python3 pid 3888885
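The VM_L2_PROTECTION_FAULT_STATUS value above (0x003012B1) can be unpacked into the individual fields the driver prints. A minimal sketch, assuming the bit layout used by the upstream amdgpu GMC v9 code (the field offsets are an assumption; verify against your kernel's register headers):

```python
# Hypothetical decoder for the VM_L2_PROTECTION_FAULT_STATUS value in the
# dmesg above. Bit layout assumed from the upstream amdgpu gmc_v9 code.
STATUS = 0x003012B1  # value reported in the log

fields = {
    "MORE_FAULTS":       (0, 1),   # bit 0
    "WALKER_ERROR":      (1, 3),   # bits 1-3
    "PERMISSION_FAULTS": (4, 4),   # bits 4-7
    "MAPPING_ERROR":     (8, 1),   # bit 8
    "CID":               (9, 9),   # bits 9-17 (UTCL2 client id)
    "RW":                (18, 1),  # bit 18
}

decoded = {name: (STATUS >> shift) & ((1 << width) - 1)
           for name, (shift, width) in fields.items()}
for name, val in decoded.items():
    print(f"{name}: {val:#x}")
```

Under that layout the decoded fields match what the driver printed: MORE_FAULTS 0x1, WALKER_ERROR 0x0, PERMISSION_FAULTS 0xb, MAPPING_ERROR 0x0, RW 0x0, and CID 0x9, which is the SQC client named in the "Faulty UTCL2 client ID" line.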
(.venv) (master) ~/tinygrad$ HCQDEV_WAIT_TIMEOUT_MS=300000 DEFAULT_FLOAT=HALF AMD_LLVM=0 BASEDIR="/raid/datasets/wiki" RUNMLPERF=1 PYTHONPATH=. BENCHMARK=10 GPUS=2 BS=128 MODEL=bert python3 examples/mlperf/model_train.py
training bert
training on ['AMD:0', 'AMD:1']
Total parameters: 367.480636M
HParam: "GPUS": ['AMD:0', 'AMD:1']
HParam: "seed": 12345
HParam: "BS": 128
HParam: "GRADIENT_ACC_STEPS": 1
HParam: "GLOBAL_BATCH_SIZE": 128
HParam: "EVAL_BS": 2
HParam: "OPT_BASE_LEARNING_RATE": 0.000202072594216369
HParam: "OPT_LAMB_BETA_1": 0.9
HParam: "OPT_LAMB_BETA_2": 0.999
HParam: "TRAIN_STEPS": 28125
HParam: "NUM_WARMUP_STEPS": 1
HParam: "MAX_EVAL_STEPS": 5000
HParam: "EVAL_STEP_FREQ": 1171
HParam: "SAVE_CKPT_FREQ": 1000
HParam: "KEEP_CKPT_AMOUNT": 5
HParam: "SAVE_CKPT_DIR": ./ckpts
HParam: "INIT_CKPT_DIR": /raid/datasets/wiki
HParam: "LOSS_SCALER": 2048.0
HParam: "DECAY": 0.01
HParam: "EPSILON": 1e-06
HParam: "POLY_POWER": 1.0
HParam: "DEFAULT_FLOAT": half
HParam: "DISABLE_DROPOUT": 0
HParam: "TRAIN_BEAM": 0
HParam: "EVAL_BEAM": 0
training with global batch size 128 for one epoch with 28125 steps
0 42625.96 ms run, 42609.51 ms python, 1.00 ms fetch data, 15.45 ms AMD * 2, 6.78 loss, 0.000202 LR, 10.76 GB used, 5747.66 GFLOPS
1 36893.95 ms run, 36871.53 ms python, 1.23 ms fetch data, 21.18 ms AMD * 2, 6.33 loss, 0.000202 LR, 172.72 GB used, 6640.64 GFLOPS
Traceback (most recent call last):
File "/home/b1tg/tinygrad/tinygrad/runtime/support/hcq.py", line 390, in synchronize
try: self.timeline_signal.wait(self.timeline_value - 1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/b1tg/tinygrad/tinygrad/runtime/support/hcq.py", line 262, in wait
if not_passed and self.value < value: raise RuntimeError(f"Wait timeout: {timeout} ms! (the signal is not set to {value}, but {self.value})")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Wait timeout: 300000 ms! (the signal is not set to 16518, but 16515)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/b1tg/tinygrad/examples/mlperf/model_train.py", line 1648, in <module>
with Profiling(enabled=getenv("PYPROFILE")): globals()[nm]()
^^^^^^^^^^^^^^^
File "/home/b1tg/tinygrad/examples/mlperf/model_train.py", line 1160, in train_bert
loss = loss.item()
^^^^^^^^^^^
File "/home/b1tg/tinygrad/tinygrad/tensor.py", line 4227, in _wrapper
ret = fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/b1tg/tinygrad/tinygrad/tensor.py", line 339, in item
return self.data()[(0,) * len(self.shape)]
^^^^^^^^^^^
File "/home/b1tg/tinygrad/tinygrad/tensor.py", line 4202, in _wrapper
if _METADATA.get() is not None: return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/b1tg/tinygrad/tinygrad/tensor.py", line 327, in data
return self._buffer().as_typed_buffer(self.shape)
^^^^^^^^^^^^^^
File "/home/b1tg/tinygrad/tinygrad/tensor.py", line 4202, in _wrapper
if _METADATA.get() is not None: return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/b1tg/tinygrad/tinygrad/tensor.py", line 313, in _buffer
return cast(Buffer, x.realize().uop.base.buffer).ensure_allocated()
^^^^^^^^^^^
File "/home/b1tg/tinygrad/tinygrad/tensor.py", line 4202, in _wrapper
if _METADATA.get() is not None: return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/b1tg/tinygrad/tinygrad/tensor.py", line 276, in realize
run_schedule(*Tensor.schedule_with_vars(*to_realize), do_update_stats=do_update_stats)
File "/home/b1tg/tinygrad/tinygrad/engine/realize.py", line 257, in run_schedule
ei.run(var_vals, do_update_stats=do_update_stats)
File "/home/b1tg/tinygrad/tinygrad/engine/realize.py", line 192, in run
et = self.prg(bufs, var_vals, wait=wait or DEBUG >= 2)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/b1tg/tinygrad/tinygrad/engine/realize.py", line 153, in __call__
self.copy(dest, src)
File "/home/b1tg/tinygrad/tinygrad/engine/realize.py", line 148, in copy
dest.copyin(src.as_buffer(allow_zero_copy=True)) # may allocate a CPU buffer depending on allow_zero_copy
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/b1tg/tinygrad/tinygrad/device.py", line 179, in as_buffer
return self.copyout(memoryview(bytearray(self.nbytes)))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/b1tg/tinygrad/tinygrad/device.py", line 198, in copyout
self.allocator._copyout(mv, self._buf)
File "/home/b1tg/tinygrad/tinygrad/runtime/support/hcq.py", line 531, in _copyout
self.dev.synchronize()
File "/home/b1tg/tinygrad/tinygrad/runtime/support/hcq.py", line 393, in synchronize
if hasattr(self, 'on_device_hang'): self.on_device_hang()
^^^^^^^^^^^^^^^^^^^^^
File "/home/b1tg/tinygrad/tinygrad/runtime/ops_amd.py", line 1000, in on_device_hang
def on_device_hang(self): self.iface.on_device_hang()
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/b1tg/tinygrad/tinygrad/runtime/ops_amd.py", line 779, in on_device_hang
raise RuntimeError("\n".join(report))
RuntimeError: MMU fault: 0xFFFFFFBFE000 | NotPresent=0 ReadOnly=0 NoExecute=0 imprecise=0
HW fault: reset_type=0 reset_cause=0 memory_lost=1 gpu_id=54552
^CException ignored in atexit callback: <bound method finalize._exitfunc of <class 'weakref.finalize'>>
Traceback (most recent call last):
File "/usr/lib/python3.12/weakref.py", line 666, in _exitfunc
f()
File "/usr/lib/python3.12/weakref.py", line 590, in __call__
return info.func(*info.args, **(info.kwargs or {}))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/b1tg/tinygrad/tinygrad/runtime/support/hcq.py", line 304, in _fini
def _fini(dev, buf, spec): dev.allocator.free(buf, buf.size, spec)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/b1tg/tinygrad/tinygrad/device.py", line 262, in free
else: super().free(opaque, size, options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/b1tg/tinygrad/tinygrad/device.py", line 231, in free
self._free(opaque, options if options is not None else self.default_buffer_spec)
File "/home/b1tg/tinygrad/tinygrad/helpers.py", line 112, in wrapper
try: return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/b1tg/tinygrad/tinygrad/runtime/ops_amd.py", line 622, in _free
self.dev.synchronize()
File "/home/b1tg/tinygrad/tinygrad/runtime/support/hcq.py", line 390, in synchronize
try: self.timeline_signal.wait(self.timeline_value - 1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/b1tg/tinygrad/tinygrad/runtime/support/hcq.py", line 260, in wait
self._sleep(time_spent)
File "/home/b1tg/tinygrad/tinygrad/runtime/ops_amd.py", line 44, in _sleep
if time_spent_waiting_ms > 2000 and self.is_timeline and self.owner is not None: self.owner.iface.sleep(200)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/b1tg/tinygrad/tinygrad/runtime/ops_amd.py", line 766, in sleep
def sleep(self, tm:int): kfd.AMDKFD_IOC_WAIT_EVENTS(KFDIface.kfd, events_ptr=self.queue_event_arr_ptr, num_events=1, wait_for_all=1, timeout=tm)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/b1tg/tinygrad/tinygrad/runtime/autogen/kfd.py", line 17, in _do_ioctl
ret = __fd.ioctl((__idir<<30) | (ctypes.sizeof(made := __user_struct(**kwargs))<<16) | (__base<<8) | __nr, made)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/b1tg/tinygrad/tinygrad/runtime/support/hcq.py", line 30, in ioctl
def ioctl(self, request, arg): return fcntl.ioctl(self.fd, request, arg)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt:
^CException ignored in atexit callback: <function <lambda> at 0x702c27b98360>
Traceback (most recent call last):
File "/home/b1tg/tinygrad/tinygrad/device.py", line 51, in <lambda>
atexit.register(lambda: [Device[dn].finalize() for dn in Device._opened_devices])
^^^^^^^^^^^^^^^^^^^^^
File "/home/b1tg/tinygrad/tinygrad/runtime/support/hcq.py", line 448, in finalize
try: self.synchronize() # Try to finalize device in any case.
^^^^^^^^^^^^^^^^^^
File "/home/b1tg/tinygrad/tinygrad/runtime/support/hcq.py", line 390, in synchronize
try: self.timeline_signal.wait(self.timeline_value - 1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/b1tg/tinygrad/tinygrad/runtime/support/hcq.py", line 259, in wait
while (not_passed:=(prev_value:=self.value) < value) and (time_spent:=int(time.perf_counter() * 1000) - start_time) < timeout:
^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt:
^C
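For context on the "Wait timeout" itself: the HCQ runtime blocks on a monotonically increasing timeline signal that the GPU bumps as work completes, and the error above means the GPU stopped at 16515 while the host was waiting for 16518. A minimal sketch of that wait pattern (not tinygrad's actual implementation; the class name and timings are illustrative):

```python
import time

class TimelineSignal:
    """Toy model of a timeline signal: the GPU writes ever-increasing values."""
    def __init__(self): self.value = 0  # written by the device on completion

    def wait(self, value: int, timeout_ms: int = 300_000):
        # Spin until the device passes `value`, or give up after timeout_ms
        # (tinygrad's timeout is controlled by HCQDEV_WAIT_TIMEOUT_MS).
        start = time.perf_counter()
        while self.value < value:
            if (time.perf_counter() - start) * 1000 >= timeout_ms:
                raise RuntimeError(f"Wait timeout: {timeout_ms} ms! "
                                   f"(the signal is not set to {value}, but {self.value})")
            time.sleep(0.001)

sig = TimelineSignal()
sig.value = 16515               # last value the hung GPU managed to write
err = None
try:
    sig.wait(16518, timeout_ms=10)  # host expects 16518, as in the log
except RuntimeError as e:
    err = e
print(err)
```

When the device hangs mid-kernel (as after the MMU fault here), the signal never advances past the last completed packet, so every later synchronize (including the atexit finalizers in the traceback) hits the same timeout.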
b1tg commented Nov 11, 2025

reproduce command on tinyamd4:

  • RUNMLPERF needs to be 1
  • AMD_LLVM=1 also triggers the bug, but AMD_LLVM=0 makes it easier to trigger

HCQDEV_WAIT_TIMEOUT_MS=300000 DEFAULT_FLOAT=HALF AMD_LLVM=0 BASEDIR="/raid/datasets/wiki" RUNMLPERF=1 PYTHONPATH=. BENCHMARK=10 GPUS=2 BS=128 MODEL=bert python3 examples/mlperf/model_train.py
