## vkperf_RADV_gfx90c
vkperf (0.99.5) tests various performance characteristics of Vulkan devices.

Devices in the system:
   AMD Radeon Graphics (RADV RENOIR)
   NVIDIA GeForce RTX 4070 Ti SUPER
   llvmpipe (LLVM 19.1.7, 256 bits)

Selected device:
   AMD Radeon Graphics (RADV RENOIR)

VendorID:  0x1002 (AMD/ATI)
DeviceID:  0x1638
Vulkan version:  1.4.305
Driver version:  25.0.5 (104857605, 0x6400005)
   Driver name:  radv
   Driver info:  Mesa 25.0.5
   DriverID:     MesaRadv
   Driver conformance version:  1.4.0.0
GPU memory:  10GiB  (10718MiB)
Max memory allocations:  4294967295
Standard (non-sparse) buffer alignment:  16
Number of triangles for tests:  100000
Sparse mode for tests:  None
Timestamp number of bits:  64
Timestamp period:  10ns
Vulkan Instance version:  1.4.328
Operating system:  < unknown, non-Windows >
Processor:  AMD Ryzen 7 5700G with Radeon Graphics
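
The driver version line above prints the same value three ways: `25.0.5`, `104857605` and `0x6400005`. The following is a minimal sketch of how the packed integer maps back to the dotted form. It assumes the standard Vulkan version packing (10 bits major, 10 bits minor, 12 bits patch), which is consistent with the numbers in this log; other vendors may encode `driverVersion` differently. The function name `decode_vk_version` is hypothetical.

```python
def decode_vk_version(packed: int) -> tuple[int, int, int]:
    """Split a packed version into (major, minor, patch).

    Assumes the standard Vulkan layout: major in bits 22-31,
    minor in bits 12-21, patch in bits 0-11.
    """
    major = packed >> 22
    minor = (packed >> 12) & 0x3FF
    patch = packed & 0xFFF
    return major, minor, patch


# Both spellings from the log refer to the same integer.
print(decode_vk_version(104857605))
print(decode_vk_version(0x6400005))
```

Both calls should yield the Mesa release reported under "Driver info".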

Triangle throughput:
   Triangle list (triangle list primitive type,
      single per-scene vkCmdDraw() call, attributeless,
      constant VS output):                     759.6 mega-triangles/s
   Indexed triangle list (triangle list primitive type, single
      per-scene vkCmdDrawIndexed() call, no vertices shared between triangles,
      attributeless, constant VS output):      758.4 mega-triangles/s
   Indexed triangle list that reuses two indices of the previous triangle
      (triangle list primitive type, single per-scene vkCmdDrawIndexed() call,
      attributeless, constant VS output):      1.262 giga-triangles/s
   Triangle strips of various lengths
      (per-strip vkCmdDraw() call, 1 to 1000 triangles per strip,
      attributeless, constant VS output):
         strip length 1:    70.63 mega-triangles/s
         strip length 2:    139.7 mega-triangles/s
         strip length 5:    345.9 mega-triangles/s
         strip length 8:    541.2 mega-triangles/s
         strip length 10:   666.6 mega-triangles/s
         strip length 20:   1.322 giga-triangles/s
         strip length 25:   1.495 giga-triangles/s
         strip length 40:   1.798 giga-triangles/s
         strip length 50:   1.872 giga-triangles/s
         strip length 100:  2.039 giga-triangles/s
         strip length 125:  2.076 giga-triangles/s
         strip length 1000: 2.216 giga-triangles/s
   Indexed triangle strips of various lengths
      (per-strip vkCmdDrawIndexed() call, 1-1000 triangles per strip,
      no vertices shared between strips, each index used just once,
      attributeless, constant VS output):
         strip length 1:    70.78 mega-triangles/s
         strip length 2:    140.1 mega-triangles/s
         strip length 5:    346.6 mega-triangles/s
         strip length 8:    543.1 mega-triangles/s
         strip length 10:   668.9 mega-triangles/s
         strip length 20:   1.326 giga-triangles/s
         strip length 25:   1.626 giga-triangles/s
         strip length 40:   2.027 giga-triangles/s
         strip length 50:   2.140 giga-triangles/s
         strip length 100:  2.140 giga-triangles/s
         strip length 125:  2.173 giga-triangles/s
         strip length 1000: 2.214 giga-triangles/s
   Primitive restart indexed triangle strips of various lengths
      (single per-scene vkCmdDrawIndexed() call, 1-1000 triangles per strip,
      no vertices shared between strips, each index used just once,
      attributeless, constant VS output):
         strip length 1:    957.4 mega-triangles/s
         strip length 2:    1.508 giga-triangles/s
         strip length 5:    2.200 giga-triangles/s
         strip length 8:    2.202 giga-triangles/s
         strip length 1000: 2.202 giga-triangles/s
   Primitive restart, each triangle is replaced by one -1
      (single per-scene vkCmdDrawIndexed() call,
      no fragments produced):                  3.654 giga-triangles/s
   Primitive restart, only zeros in the index buffer
      (single per-scene vkCmdDrawIndexed() call,
      no fragments produced):                  756.2 mega-triangles/s
   Instancing throughput of vkCmdDraw()
      (one triangle per instance, constant VS output, one draw call,
      attributeless):                          759.4 mega-triangles/s
   Instancing throughput of vkCmdDrawIndexed()
      (one triangle per instance, constant VS output, one draw call,
      attributeless):                          758.4 mega-triangles/s
   Instancing throughput of vkCmdDrawIndirect()
      (one triangle per instance, one indirect draw call,
      one indirect record, attributeless):     755.5 mega-triangles/s
   Instancing throughput of vkCmdDrawIndexedIndirect()
      (one triangle per instance, one indirect draw call,
      one indirect record, attributeless):     754.8 mega-triangles/s
   vkCmdDraw() throughput
      (per-triangle vkCmdDraw() in command buffer,
      attributeless, constant VS output):      70.64 mega-triangles/s
   vkCmdDrawIndexed() throughput
      (per-triangle vkCmdDrawIndexed() in command buffer,
      attributeless, constant VS output):      70.75 mega-triangles/s
   VkDrawIndirectCommand processing throughput
      (per-triangle VkDrawIndirectCommand, one vkCmdDrawIndirect() call,
      attributeless):                          24.43 mega-indirectRecords/s
   VkDrawIndirectCommand processing throughput with stride 32
      (per-triangle VkDrawIndirectCommand, one vkCmdDrawIndirect() call,
      attributeless):                          24.43 mega-indirectRecords/s
   VkDrawIndexedIndirectCommand processing throughput
      (per-triangle VkDrawIndexedIndirectCommand,
      1x vkCmdDrawIndexedIndirect() call,
      attributeless):                          23.43 mega-indirectRecords/s
   VkDrawIndexedIndirectCommand processing throughput with stride 32
      (per-triangle VkDrawIndexedIndirectCommand,
      1x vkCmdDrawIndexedIndirect() call,
      attributeless):                          18.49 mega-indirectRecords/s
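
One way to read the per-strip results above is to convert them into draw calls per second, since each strip is issued with its own `vkCmdDraw()` call. The sketch below divides the reported triangle rate by the strip length. The sample values are copied from the non-indexed strip table; the helper name `mega_draws_per_second` is hypothetical, and the interpretation (short strips being limited by draw-call rate) is an assumption, not something vkperf reports.

```python
# (strip length, mega-triangles/s) copied from the non-indexed strip results
strip_results = [(1, 70.63), (2, 139.7), (10, 666.6), (100, 2039.0), (1000, 2216.0)]


def mega_draws_per_second(strip_length: int, mega_triangles_per_s: float) -> float:
    """One draw call per strip, so draws/s = triangles/s divided by strip length."""
    return mega_triangles_per_s / strip_length


for length, mtris in strip_results:
    rate = mega_draws_per_second(length, mtris)
    print(f"strip length {length:>4}: {rate:8.3f} mega-draws/s")
```

For short strips the derived rate stays close to the per-triangle `vkCmdDraw()` throughput listed above, which suggests those cases are bound by draw-call processing rather than by triangle setup.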

Vertex and geometry shader throughput:
   VS throughput using vkCmdDraw() - minimal VS that just writes
      constant output position (per-scene vkCmdDraw() call,
      no attributes, no fragments produced):   2.278 giga-vertices/s
   VS throughput using vkCmdDrawIndexed() - minimal VS that just writes
      constant output position (per-scene vkCmdDrawIndexed() call,
      no attributes, no fragments produced):   2.275 giga-vertices/s
   VS producing output position from VertexIndex and InstanceIndex
      using vkCmdDraw() (single per-scene vkCmdDraw() call,
      attributeless, no fragments produced):   2.278 giga-vertices/s
   VS producing output position from VertexIndex and InstanceIndex
      using vkCmdDrawIndexed() (single per-scene vkCmdDrawIndexed() call,
      attributeless, no fragments produced):   2.274 giga-vertices/s
   GS one triangle in and no triangle out
      (empty VS, attributeless):               759.4 mega-invocations/s
   GS one triangle in and single constant triangle out
      (empty VS, attributeless):               438.4 mega-invocations/s
   GS one triangle in and two constant triangles out
      (empty VS, attributeless):               316.9 mega-invocations/s

Attributes and buffers:
   One attribute performance - 1x vec4 attribute
      (attribute used, per-scene draw call):   2.239 giga-vertices/s
   One buffer performance - 1x vec4 buffer
      (1x read in VS, per-scene draw call):    2.237 giga-vertices/s
   One buffer performance - 1x vec3 buffer
      (1x read in VS, one draw call):          2.262 giga-vertices/s
   Two attributes performance - 2x vec4 attribute
      (both attributes used):                  1.518 giga-vertices/s
   Two buffers performance - 2x vec4 buffer
      (both buffers read in VS):               1.384 giga-vertices/s
   Two buffers performance - 2x vec3 buffer
      (both buffers read in VS):               2.024 giga-vertices/s
   Two interleaved attributes performance - 2x vec4
      (2x vec4 attribute fetched from the single buffer in VS
      from consecutive buffer locations):      1.507 giga-vertices/s
   Two interleaved buffers performance - 2x vec4
      (2x vec4 fetched from the single buffer in VS
      from consecutive buffer locations):      1.508 giga-vertices/s
   Packed buffer performance - 1x buffer using 32-byte struct unpacked
      into position+normal+color+texCoord:     1.502 giga-vertices/s
   Packed attribute performance - 2x uvec4 attribute unpacked
      into position+normal+color+texCoord:     1.524 giga-vertices/s
   Packed buffer performance - 2x uvec4 buffers unpacked
      into position+normal+color+texCoord:     1.521 giga-vertices/s
   Packed buffer performance - 2x buffer using 16-byte struct unpacked
      into position+normal+color+texCoord:     1.541 giga-vertices/s
   Packed buffer performance - 2x buffer using 16-byte struct
      read multiple times and unpacked
      into position+normal+color+texCoord:     1.529 giga-vertices/s
   Four attributes performance - 4x vec4 attribute
      (all attributes used):                   790.1 mega-vertices/s
   Four buffers performance - 4x vec4 buffer
      (all buffers read in VS):                788.8 mega-vertices/s
   Four buffers performance - 4x vec3 buffer
      (all buffers read in VS):                1.041 giga-vertices/s
   Four interleaved attributes performance - 4x vec4
      (4x vec4 fetched from the single buffer
      on consecutive locations):               787.8 mega-vertices/s
   Four interleaved buffers performance - 4x vec4
      (4x vec4 fetched from the single buffer
      on consecutive locations):               790.8 mega-vertices/s
   Four attributes performance - 2x vec4 and 2x uint attribute
      (2x vec4f32 + 2x vec4u8, 2x conversion from vec4u8
      to vec4):                                1.250 giga-vertices/s

Transformations:
   Matrix performance - one matrix as uniform for all triangles
      (matrix read in VS,
      coordinates in vec4 attribute):          2.236 giga-vertices/s
   Matrix performance - per-triangle matrix in buffer
      (different matrix read for each triangle in VS,
      coordinates in vec4 attribute):          1.255 giga-vertices/s
   Matrix performance - per-triangle matrix in attribute
      (triangles are instanced and each triangle receives a different matrix,
      coordinates in vec4 attribute):          2.068 giga-vertices/s
   Matrix performance - one matrix in buffer for all triangles and 2x uvec4
      packed attributes (each triangle reads matrix from the same place in
      the buffer, attributes unpacked):        1.384 giga-vertices/s
   Matrix performance - per-triangle matrix in the buffer and 2x uvec4 packed
      attributes (each triangle reads a different matrix from a buffer,
      attributes unpacked):                    888.4 mega-vertices/s
   Matrix performance - per-triangle matrix in buffer and 2x uvec4 packed
      buffers (each triangle reads a different matrix from a buffer,
      packed buffers unpacked):                930.4 mega-vertices/s
   Matrix performance - GS reads per-triangle matrix from buffer and 2x uvec4
      packed buffers (each triangle reads a different matrix from a buffer,
      packed buffers unpacked in GS):          625.5 mega-vertices/s
   Matrix performance - per-triangle matrix in buffer and four attributes
      (each triangle reads a different matrix from a buffer,
      4x vec4 attribute):                      593.7 mega-vertices/s
   Matrix performance - 1x per-triangle matrix in buffer, 2x uniform matrix and
      2x uvec4 packed attributes (uniform view and projection matrices
      multiplied with per-triangle model matrix and with unpacked attributes of
      position, normal, color and texCoord):   880.3 mega-vertices/s
   Matrix performance - 2x per-triangle matrix (mat4+mat3) in buffer,
      3x uniform matrix (mat4+mat4+mat3) and 2x uvec4 packed attributes
      (full position and normal computation with MVP and normal matrices,
      all matrices and attributes multiplied): 678.5 mega-vertices/s
   Matrix performance - 2x per-triangle matrix (mat4+mat3) in buffer,
      2x non-changing matrix (mat4+mat4) in push constants,
      1x constant matrix (mat3) and 2x uvec4 packed attributes (all
      matrices and attributes multiplied):     668.5 mega-vertices/s
   Matrix performance - 2x per-triangle matrix (mat4+mat3) in buffer, 2x
      non-changing matrix (mat4+mat4) in specialization constants, 1x constant
      matrix (mat3) defined by VS code and 2x uvec4 packed attributes (all
      matrices and attributes multiplied):     693.0 mega-vertices/s
   Matrix performance - 2x per-triangle matrix (mat4+mat3) in buffer,
      3x constant matrix (mat4+mat4+mat3) defined by VS code and
      2x uvec4 packed attributes (all matrices and attributes
      multiplied):                             697.4 mega-vertices/s
   Matrix performance - GS five matrices processing, 2x per-triangle matrix
      (mat4+mat3) in buffer, 3x uniform matrix (mat4+mat4+mat3) and
      2x uvec4 packed attributes passed through VS (all matrices and
      attributes multiplied):                  509.2 mega-vertices/s
   Matrix performance - GS five matrices processing, 2x per-triangle matrix
      (mat4+mat3) in buffer, 3x uniform matrix (mat4+mat4+mat3) and
      2x uvec4 packed data read from buffer in GS (all matrices and attributes
      multiplied):                             521.1 mega-vertices/s
   Textured Phong and Matrix performance - 2x per-triangle matrix
      in buffer (mat4+mat3), 3x uniform matrix (mat4+mat4+mat3) and
      four attributes (vec4f32+vec3f32+vec4u8+vec2f32),
      no fragments produced:                   618.3 mega-vertices/s
   Textured Phong and Matrix performance - 1x per-triangle matrix
      in buffer (mat4), 2x uniform matrix (mat4+mat4) and
      four attributes (vec4f32+vec3f32+vec4u8+vec2f32),
      no fragments produced:                   807.6 mega-vertices/s
   Textured Phong and Matrix performance - 1x per-triangle matrix
      in buffer (mat4), 2x uniform matrix (mat4+mat4) and 2x uvec4 packed
      attribute, no fragments produced:        866.7 mega-vertices/s
   Textured Phong and Matrix performance - 1x per-triangle row-major matrix
      in buffer (mat4), 2x uniform not-row-major matrix (mat4+mat4),
      2x uvec4 packed attributes,
      no fragments produced:                   949.0 mega-vertices/s
   Textured Phong and Matrix performance - 1x per-triangle mat4x3 matrix
      in buffer, 2x uniform matrix (mat4+mat4) and 2x uvec4 packed attributes,
      no fragments produced:                   1.045 giga-vertices/s
   Textured Phong and Matrix performance - 1x per-triangle row-major mat4x3
      matrix in buffer, 2x uniform matrix (mat4+mat4), 2x uvec4 packed
      attribute, no fragments produced:        1.045 giga-vertices/s
   Textured Phong and PAT performance - PAT v1 (Position-Attitude-Transform,
      performing translation (vec3) and rotation (quaternion as vec4) using
      implementation 1), PAT is per-triangle 2x vec4 in buffer,
      2x uniform matrix (mat4+mat4), 2x uvec4 packed attributes,
      no fragments produced:                   1.176 giga-vertices/s
   Textured Phong and PAT performance - PAT v2 (Position-Attitude-Transform,
      performing translation (vec3) and rotation (quaternion as vec4) using
      implementation 2), PAT is per-triangle 2x vec4 in buffer,
      2x uniform matrix (mat4+mat4), 2x uvec4 packed attributes,
      no fragments produced:                   1.177 giga-vertices/s
   Textured Phong and PAT performance - PAT v3 (Position-Attitude-Transform,
      performing translation (vec3) and rotation (quaternion as vec4) using
      implementation 3), PAT is per-triangle 2x vec4 in buffer,
      2x uniform matrix (mat4+mat4), 2x uvec4 packed attributes,
      no fragments produced:                   1.174 giga-vertices/s
   Textured Phong and PAT performance - constant single PAT v2 sourced from
      the same index in buffer (vec3+vec4), 2x uniform matrix (mat4+mat4),
      2x uvec4 packed attributes,
      no fragments produced:                   1.404 giga-vertices/s
   Textured Phong and PAT performance - indexed draw call, per-triangle PAT v2
      in buffer (vec3+vec4), 2x uniform matrix (mat4+mat4), 2x uvec4 packed
      attribute, no fragments produced:        1.089 giga-vertices/s
   Textured Phong and PAT performance - indexed draw call, constant single
      PAT v2 sourced from the same index in buffer (vec3+vec4),
      2x uniform matrix (mat4+mat4), 2x uvec4 packed attributes,
      no fragments produced:                   1.296 giga-vertices/s
   Textured Phong and PAT performance - primitive restart, indexed draw call,
      per-triangle PAT v2 in buffer (vec3+vec4), 2x uniform matrix (mat4+mat4),
      2x uvec4 packed attributes,
      no fragments produced:                   1.119 giga-vertices/s
   Textured Phong and PAT performance - primitive restart, indexed draw call,
      constant single PAT v2 sourced from the same index in buffer (vec3+vec4),
      2x uniform matrix (mat4+mat4), 2x uvec4 packed attributes,
      no fragments produced:                   1.317 giga-vertices/s
   Textured Phong and double precision matrix performance - double precision
      per-triangle matrix in buffer (dmat4), double precision per-scene view
      matrix in uniform (dmat4), both matrices converted to single precision
      before computations, single precision per-scene perspective matrix in
      uniform (mat4), single precision vertex positions, packed attributes
      (2x uvec4), no fragments produced:       676.3 mega-vertices/s
   Textured Phong and double precision matrix performance - double precision
      per-triangle matrix in buffer (dmat4), double precision per-scene view
      matrix in uniform (dmat4), both matrices multiplied in double precision,
      single precision vertex positions, single precision per-scene
      perspective matrix in uniform (mat4), packed attributes (2x uvec4),
      no fragments produced:                   581.8 mega-vertices/s
   Textured Phong and double precision matrix performance - double precision
      per-triangle matrix in buffer (dmat4), double precision per-scene view
      matrix in uniform (dmat4), both matrices multiplied in double precision,
      double precision vertex positions (dvec3), single precision per-scene
      perspective matrix in uniform (mat4), packed attributes (3x uvec4),
      no fragments produced:                   542.5 mega-vertices/s
   Textured Phong and double precision matrix performance using GS - double
      precision per-triangle matrix in buffer (dmat4), double precision
      per-scene view matrix in uniform (dmat4), both matrices multiplied in
      double precision, double precision vertex positions (dvec3), single
      precision per-scene perspective matrix in uniform (mat4), packed
      attributes (3x uvec4),
      no fragments produced:                   222.0 mega-vertices/s

Fragment throughput:
   Single full-framebuffer quad,
      constant color FS:                       25.51 giga-fragments/s
   10x full-framebuffer quad,
      constant color FS:                       35.45 giga-fragments/s
   Four smooth interpolators (4x vec4),
      10x fullscreen quad:                     35.44 giga-fragments/s
   Four flat interpolators (4x vec4),
      10x fullscreen quad:                     35.47 giga-fragments/s
   Four textured Phong interpolators (vec3+vec3+vec4+vec2),
      10x fullscreen quad:                     35.33 giga-fragments/s
   Textured Phong, packed uniforms (four smooth interpolators
      (vec3+vec3+vec4+vec2), 4x uniform (material (56 byte) +
      globalAmbientLight (12 byte) + light (64 byte) + sampler2D),
      10x fullscreen quad):                    8.501 giga-fragments/s
   Textured Phong, not packed uniforms (four smooth interpolators
      (vec3+vec3+vec4+vec2), 4x uniform (material (72 byte) +
      globalAmbientLight (12 byte) + light (80 byte) + sampler2D),
      10x fullscreen quad):                    8.500 giga-fragments/s
   Simplified Phong, no texture, no specular (2x smooth interpolator
      (vec3+vec3), 3x uniform (material (vec4+vec4) + globalAmbientLight
      (vec3) + light (48 bytes: position+attenuation+ambient+diffuse)),
      10x fullscreen quad):                    15.65 giga-fragments/s
   Simplified Phong, no texture, no specular, single uniform
      (2x smooth interpolator (vec3+vec3), 1x uniform
      (material+globalAmbientLight+light (vec4+vec4+vec4 + 3x vec4)),
      10x fullscreen quad):                    15.64 giga-fragments/s
   Constant color from uniform, 1x uniform (vec4) in FS,
      10x fullscreen quad:                     35.38 giga-fragments/s
   Constant color from uniform, 1x uniform (uint) in FS,
      10x fullscreen quad:                     35.41 giga-fragments/s

Transfer throughput:
   Transfer of consecutive blocks:
      4 bytes: 101.212ns per transfer (0.0368068 GiB/s)
      4 bytes: 96.904ns per transfer (0.0384431 GiB/s)
      8 bytes: 108.368ns per transfer (0.0687526 GiB/s)
      16 bytes: 122.692ns per transfer (0.121452 GiB/s)
      32 bytes: 154.088ns per transfer (0.193411 GiB/s)
      64 bytes: 182.656ns per transfer (0.326322 GiB/s)
      128 bytes: 187.24ns per transfer (0.636666 GiB/s)
      256 bytes: 163.691ns per transfer (1.45651 GiB/s)
      512 bytes: 169.15ns per transfer (2.81901 GiB/s)
      1024 bytes: 181.133ns per transfer (5.26506 GiB/s)
      2048 bytes: 205.391ns per transfer (9.28644 GiB/s)
      4096 bytes: 158.984ns per transfer (23.9942 GiB/s)
      8192 bytes: 313.594ns per transfer (24.3289 GiB/s)
      16384 bytes: 621.875ns per transfer (24.5367 GiB/s)
      32768 bytes: 1243.12ns per transfer (24.5491 GiB/s)
      65536 bytes: 2473.75ns per transfer (24.6731 GiB/s)
      131072 bytes: 4970ns per transfer (24.5614 GiB/s)
      262144 bytes: 9900ns per transfer (24.6607 GiB/s)
      524288 bytes: 19830ns per transfer (24.6234 GiB/s)
      1048576 bytes: 39620ns per transfer (24.6482 GiB/s)
      2097152 bytes: 79280ns per transfer (24.6358 GiB/s)
   Transfer of spaced blocks:
      4 bytes: 101.916ns per transfer (0.0365526 GiB/s)
      4 bytes: 102.104ns per transfer (0.0364853 GiB/s)
      8 bytes: 115.428ns per transfer (0.0645474 GiB/s)
      16 bytes: 143.096ns per transfer (0.104134 GiB/s)
      32 bytes: 143.728ns per transfer (0.207352 GiB/s)
      64 bytes: 145.828ns per transfer (0.408733 GiB/s)
      128 bytes: 157.236ns per transfer (0.758155 GiB/s)
      256 bytes: 161.997ns per transfer (1.47175 GiB/s)
      512 bytes: 167.139ns per transfer (2.85294 GiB/s)
      1024 bytes: 185.586ns per transfer (5.13872 GiB/s)
      2048 bytes: 218.594ns per transfer (8.72554 GiB/s)
      4096 bytes: 161.797ns per transfer (23.5771 GiB/s)
      8192 bytes: 312.812ns per transfer (24.3897 GiB/s)
      16384 bytes: 625.625ns per transfer (24.3897 GiB/s)
      32768 bytes: 1251.88ns per transfer (24.3775 GiB/s)
      65536 bytes: 2508.75ns per transfer (24.3289 GiB/s)
      131072 bytes: 5012.5ns per transfer (24.3532 GiB/s)
      262144 bytes: 9935ns per transfer (24.5738 GiB/s)
      524288 bytes: 19740ns per transfer (24.7356 GiB/s)
      1048576 bytes: 39580ns per transfer (24.6731 GiB/s)
      2097152 bytes: 78960ns per transfer (24.7356 GiB/s)
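
Each transfer line above pairs a time per transfer with a bandwidth in GiB/s. The sketch below shows how the two relate, assuming 1 GiB = 1024^3 bytes (which matches the figures in this log). The sample rows are copied from the consecutive-blocks table; the function name `gib_per_second` is hypothetical.

```python
GIB = 1024 ** 3  # bytes per GiB


def gib_per_second(block_bytes: int, ns_per_transfer: float) -> float:
    """Bandwidth = bytes moved per transfer / seconds per transfer, in GiB/s."""
    seconds = ns_per_transfer * 1e-9
    return block_bytes / seconds / GIB


# (block size in bytes, ns per transfer) copied from the consecutive-blocks table
samples = [(4, 101.212), (4096, 158.984), (2097152, 79280.0)]

for size, ns in samples:
    print(f"{size} bytes: {gib_per_second(size, ns):.4f} GiB/s")
```

Seen this way, time per transfer stays roughly flat up to 4096 bytes, so bandwidth grows with block size until it levels off near 24.6 GiB/s.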

Measurement statistics:
   Triangle throughput measurement time:  3.05 seconds using 288 test rounds.
   Vertex throughput measurement time:    0.349 seconds using 288 test rounds.
   Attribute and Buffer measurement time: 1.27 seconds using 288 test rounds.
   Transformation measurement time:       3.65 seconds using 288 test rounds.
   Fragment throughput measurement time:  3.21 seconds using 288 test rounds.
   Transfer throughput measurement time:  7.32 seconds using 288 test rounds.
   Total device time: 18.5 seconds.
   Total real time:   20 seconds.