The matrix multiplication sample project ships with the VTune Profiler application. The walkthrough covers these steps:
- Open VTune Profiler and create a project
- Initial measurements: Performance Snapshot
- Hotspots: inspect the bottom-up view, the source view, and the flame graph
- Memory Access issues
- Update the algorithm to `multiplication2`, check the optimization level (`-O1`), and run `make`
- Check Performance Snapshot
- Increase the optimization level to `-O3` (enables auto-vectorization for `g++`) and run `make`
- Check Performance Snapshot and HPC Performance Characterization
- Enable AVX-512 vectorization: in my case, for the Skylake architecture, add the CXXFLAG `-march=skylake-avx512` (choose the value appropriate for your CPU) and run `make`
- Check Performance Snapshot and Microarchitecture Analysis
- Compare results
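
To make the "update the algorithm" step concrete, here is a minimal sketch of the kind of change that typically resolves memory access issues in a matrix multiplication kernel: interchanging the two inner loops so that the innermost loop reads memory with unit stride. It assumes that `multiplication2` applies a loop interchange of this sort; the sample's actual kernel may differ. The names `Matrix`, `multiply_naive`, and `multiply_interchanged` are hypothetical and do not come from the sample project.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical type: a square n x n matrix stored row-major in one buffer.
using Matrix = std::vector<double>;

// Naive i-j-k order. The innermost loop reads b[k * n + j] with a stride of
// n elements, so consecutive iterations touch different cache lines.
void multiply_naive(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}

// Interchanged i-k-j order. The innermost loop now walks one row of b and
// one row of c with unit stride, which is cache friendly and gives the
// compiler a straightforward loop to auto-vectorize.
void multiply_interchanged(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
    for (std::size_t i = 0; i < n; ++i) {
        // Clear the output row first, because the loop below accumulates into it.
        for (std::size_t j = 0; j < n; ++j) {
            c[i * n + j] = 0.0;
        }
        for (std::size_t k = 0; k < n; ++k) {
            const double a_ik = a[i * n + k];
            for (std::size_t j = 0; j < n; ++j) {
                c[i * n + j] += a_ik * b[k * n + j];
            }
        }
    }
}
```

Both functions compute the same product; only the memory access pattern changes. Calling each one on large matrices from a small driver and profiling the two builds should make the difference visible in the Memory Access and Performance Snapshot results.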