Oscar Barenys oscarbg

## README.md

      
ItzLevvie / README.md · 1 file · 4 forks · 0 comments · 20 stars · Last active February 25, 2026 22:18

All of the flashable factory images for Google Pixel devices running Android Canary builds.

    Android Canary (202602) — ZP11.260123.011 (14822050)


| Device | Flash | Download | SHA-256 Checksum |
| --- | --- | --- | --- |
| Pixel 6 (oriole) | Link | Link | e0a63e51f9cca5307d515852039628f4ecb394d61f9a29978e7043804027627e |
| Pixel 6 Pro (raven) | Link | Link | 54aa99e72bf855311e54d0758aaea8ccd456ff8fb72d7817bde7f4f355274a6c |
| Pixel 6a (bluejay) | Link | Link | 964af292ca1977a089a87d327fb2b0a4d44a3e0e274987622 |
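
The checksums listed for each image can be compared against a downloaded file before flashing. The following is a minimal sketch using Python's standard `hashlib`; the helper names `sha256_of_file` and `matches_checksum` are illustrative and not part of the original gist.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare a file's digest with a published checksum, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_checksum("oriole-factory.zip", "e0a63e51...")` (with a hypothetical local file name and the full 64-character checksum from the table) would return `True` only if the download is intact. A full SHA-256 digest is 64 hex characters, so a shorter listed value cannot match as-is.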


## TinyGrad-notes.md

      
fxkamd / TinyGrad-notes.md · 1 file · 0 forks · 39 comments · 15 stars · Last active December 4, 2025 06:01

Observations about HSA and KFD backends in TinyGrad

This is Felix Kuehling, long-time KFD driver architect. I started looking into the TinyGrad source code yesterday, focusing on ops_kfd.py, ops_hsa.py and driver/hsa.py, to understand how TinyGrad talks to our HW and help with the ongoing debugging effort from the top down. This analysis is based on this commit: https://github.com/tinygrad/tinygrad/tree/3de855ea50d72238deac14fc05cda2a611497778
I'm intrigued by the use of Python for low-level programming. I think I can learn something from your use of ctypes and clang2py for fast prototyping and test development. I want to share some observations based on my initial review.
ops_kfd looks pretty new, and I see many problems with it based on my long experience working on KFD. I think it's interesting, but probably not relevant for the most pressing problems at hand, so I'll cover that last.
ops_hsa uses ROCr APIs to manage GPU memory, create a user-mode AQL queue for GPU kernel dispatch, async SDMA copies, and signal-based synchronization with barrier packets.

  
## benchmark_7900XTX_10142023.txt
(tf) root@rocm:~/tmp# python benchmark.py
2023-10-14 15:02:22.116047: E external/local_xla/xla/stream_executor/plugin_registry.cc:93] Invalid plugin kind specified: DNN
2023-10-14 15:02:22.348480: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 AVX512F AVX512_VNNI AVX512_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-10-14 15:02:23.756833: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:02:23.982269: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:787] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-14 15:02:23.9823

## simd_partition.metal
//  Created by Timothy Davison on 2023-06-21.
//
// This is a Metal implementation of subgroupPartitionNV. You use it to find a mask of
// the other threads in a simd-group with the same value (a partition of the simd-group about
// a set of values).
//
// Feel free to use this in your code. Please share any fixes or ideas to make it faster.
//
// Khronos docs on subgroup partitioning:
// - https://github.com/KhronosGroup/GLSL/blob/master/extensions/nv/GL_NV_shader_subgroup_partitioned.txt
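
The partitioning described in the comment above can be emulated on the CPU to make the semantics concrete. This is a sketch in Python rather than Metal, and it is not the gist's implementation: the function name `simd_partition` is hypothetical, and it assumes each lane's own bit is set in its mask, which is how the Khronos extension describes the result.

```python
def simd_partition(values):
    """For each lane, return a bitmask of all lanes holding the same value.

    `values[i]` is the value held by lane i; bit j of the result for lane i
    is set when values[j] == values[i] (so bit i is always set).
    """
    masks_by_value = {}
    for lane, value in enumerate(values):
        # Accumulate one mask per distinct value.
        masks_by_value[value] = masks_by_value.get(value, 0) | (1 << lane)
    # Every lane reads back the mask for its own value.
    return [masks_by_value[value] for value in values]


masks = simd_partition([7, 3, 7, 3, 5])
print([bin(m) for m in masks])
```

If only the other lanes are wanted, each lane can clear its own bit with `mask & ~(1 << lane)`. A GPU version would build the same masks with simd-group operations instead of a dictionary.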

## sve2.md

      
zingaburga / sve2.md · 1 file · 14 forks · 44 comments · 93 stars · Last active February 7, 2026 19:51

ARM’s Scalable Vector Extensions: A Critical Look at SVE2 For Integer Workloads

Scalable Vector Extensions (SVE) is ARM’s latest SIMD extension to their instruction set, which was announced back in 2016.  A follow-up SVE2 extension was announced in 2019, designed to incorporate all functionality from ARM’s current primary SIMD extension, NEON (aka ASIMD).
Despite being announced 5 years ago, there is currently no generally available CPU which supports any form of SVE (which excludes the [Fugaku supercomputer](https://www.fujitsu.com/global/about/innovation/

  
## clpeak.txt
% ./clpeak
[mvk-info] MoltenVK version 1.1.5, supporting Vulkan version 1.1.189.
	The following 72 Vulkan extensions are supported:
		VK_KHR_16bit_storage v1
		VK_KHR_8bit_storage v1
		VK_KHR_bind_memory2 v1
		VK_KHR_create_renderpass2 v1
		VK_KHR_dedicated_allocation v3
		VK_KHR_depth_stencil_resolve v1
		VK_KHR_descriptor_update_template v1

## aarch64_amx.py
# IDA (disassembler) and Hex-Rays (decompiler) plugin for Apple AMX
#
# WIP research. (This was edited to add more info after someone posted it to
# Hacker News. Click "Revisions" to see full changes.)
#
# Copyright (c) 2020 dougallj


# Based on Python port of VMX intrinsics plugin:
# Copyright (c) 2019 w4kfu - Synacktiv

## QEMU_ON_M1.md

      
citruz / QEMU_ON_M1.md · 2 files · 34 forks · 66 comments · 206 stars · Last active May 3, 2025 17:10

Create Ubuntu and Windows VMs with QEMU on Apple Silicon

    Running Linux and Windows on M1 with QEMU


30.11.2020: Updated with the new patchseries and instructions for Windows


02.12.2020: Added tweaks


08.12.2020: Updated with patchseries v4


31.01.2020: Updated with patchseries v6


## README.en.md

      
niw / README.en.md · 2 files · 28 forks · 116 comments · 254 stars · Last active August 28, 2025 18:24

How to run Windows 10 on ARM or Ubuntu for ARM64 in QEMU on Apple Silicon Mac

Here are easy steps to try Windows 10 on ARM or Ubuntu for ARM64
on your Apple Silicon Mac. Enjoy!

NOTE: This reflects the state as of 10/1/2021.

Running Windows 10 on ARM


Install Xcode from App Store or install Command Line Tools on your Mac


## isa.txt
platform: 7.5
ext: 7p5
name: HSW
1 add add 0x40 Addition
    0xfc0 u8 i8 u16 i16 u32 i32 , 0xfc0 u8 i8 u16 i16 u32 i32
    0x20000 f32 , 0xfc0 u8 i8 u16 i16 u32 i32
    0x20000 f32 , 0x20000 f32
    0x40000 f64 , 0x40000 f64
3 addc addc 0x4e Addition with Carry
    0x400 u32 , 0x400 u32
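
In the excerpt above, each hex value appears to be a bitmask over operand types: `0xfc0` lists six integer types, `0x400` lists only `u32`, and `0x20000` and `0x40000` list `f32` and `f64`. The sketch below decodes such masks in Python. The bit positions in `TYPE_BITS` are inferred from this excerpt alone and are an assumption, not a documented format; unknown bits are reported by position.

```python
# Bit position -> type name, inferred only from the lines shown in this excerpt.
TYPE_BITS = {
    6: "u8", 7: "i8", 8: "u16", 9: "i16", 10: "u32", 11: "i32",
    17: "f32", 18: "f64",
}


def decode_type_mask(mask):
    """Return the type names for each set bit, lowest bit first."""
    names = []
    bit = 0
    while mask >> bit:
        if (mask >> bit) & 1:
            # Fall back to a positional label for bits not seen in the excerpt.
            names.append(TYPE_BITS.get(bit, f"bit{bit}"))
        bit += 1
    return names


for mask in (0xFC0, 0x20000, 0x40000, 0x400):
    print(hex(mask), " ".join(decode_type_mask(mask)))
```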