Motherboard: Asus Pro WS WRX80E-SAGE SE WIFI
Card: Asus HYPER M.2 X16 GEN 4 CARD
NVMe: 4x Samsung SSD 980 PRO 1TB
OS: Linux fedora 5.16.12-200.fc35.x86_64
AER (Advanced Error Reporting) logs excessively:
/*
 * SPDX-FileCopyrightText: Copyright (c) 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 * SPDX-License-Identifier: Apache-2.0
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
# Apply this config conditionally to all C files
If:
  PathMatch: .*\.(c|h)$
CompileFlags:
  Compiler: /usr/bin/gcc
---
# Apply this config conditionally to all C++ files
If:
# Store interactive Python shell history in ~/.cache/python_history
# instead of ~/.python_history.
#
# Create the following .config/pythonstartup.py file
# and export its path using PYTHONSTARTUP environment variable:
#
# export PYTHONSTARTUP="${XDG_CONFIG_HOME:-$HOME/.config}/pythonstartup.py"
import atexit
import os
import atexit
import ctypes
import os
import shlex
import sys
import tempfile
CMD_C_TO_SO = '{compiler} -shared -o {output} {input} {libraries}'
# /------------------------------------------\
# | don't forget to download the .tp file    |
# | and place it in the user's directory :›  |
# |                                          |
# | also install lolcat:                     |
# | https://github.com/busyloop/lolcat       |
# \------------------------------------------/
alias test-passed='if [ "$?" -eq "0" ]; then lolcat ~/.tp -a -s 40 -d 2; fi;'
/* Copyright (c) 2018 Arvid Gerstmann. */
/* This code is licensed under MIT license. */
#ifndef AG_RANDOM_H
#define AG_RANDOM_H
class splitmix
{
public:
    using result_type = uint32_t;
    static constexpr result_type (min)() { return 0; }
This is a short post that explains how to write a high-performance matrix multiplication program on modern processors. In this tutorial I will use a single core of a Skylake-client CPU with AVX2, but the principles in this post also apply to processors with other instruction sets (such as AVX512).
Matrix multiplication is a mathematical operation that defines the product of
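For orientation, the operation the post sets out to optimize can be sketched as a plain triple loop. This is only a reference baseline in Python, not the post's AVX2 implementation; the function name `matmul_naive` and the list-of-rows representation are assumptions made for illustration.

```python
def matmul_naive(a, b):
    """Multiply an m x k matrix by a k x n matrix, both given as lists of rows.

    Hypothetical helper for illustration; a tuned kernel would block and
    vectorize these loops instead of running them element by element.
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for p in range(k):
            a_ip = a[i][p]  # reused across the whole inner loop
            for j in range(n):
                c[i][j] += a_ip * b[p][j]
    return c
```

The `i, p, j` loop order walks the rows of `b` and `c` contiguously, which is the access pattern that cache-aware and SIMD versions usually start from.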
#!/usr/bin/env bash
# --slave /usr/bin/$1 $1 /usr/bin/$1-\${version} \\
function register_clang_version {
    local version=$1
    local priority=$2
    update-alternatives \
        --install /usr/bin/llvm-config llvm-config /usr/bin/llvm-config-${version} ${priority} \
from timeit import default_timer as time
import numpy as np
from numba import cuda
import os
os.environ['NUMBAPRO_LIBDEVICE']='/usr/lib/nvidia-cuda-toolkit/libdevice/'
os.environ['NUMBAPRO_NVVM']='/usr/lib/x86_64-linux-gnu/libnvvm.so.3.1.0'
import numpy
import torch
import ctypes