@sin3point14
Created June 25, 2025 07:22
Eigen cuda inverse test
#include <Eigen/Dense>
#include <cstdio>
#include <iostream>

using mat = Eigen::Matrix<float, 5, 5>;

// Print a matrix row by row from device code (std::cout is unavailable there).
template <typename MatrixType>
__device__ void printCudaMatrix(const MatrixType& matrix) {
    for (int row = 0; row < matrix.rows(); ++row)
    {
        printf(" Row %3d: [", row);
        for (int col = 0; col < matrix.cols(); ++col)
        {
            printf("%15.8e ", (double)matrix(row, col));
        }
        printf("]\n");
    }
}
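
// Invert *A in place on the device, printing the matrix at each step.
__global__ void matrixInv(mat* A) {
    printf("kernel A:\n");
    printCudaMatrix(*A);

    // A 5x5 matrix takes Eigen's general-size inverse() path.
    mat B = (*A).inverse();
    printf("kernel B:\n");
    printCudaMatrix(B);

    *A = B;
    printf("kernel Ainv:\n");
    printCudaMatrix(*A);
}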

int main() {
    srand(0);  // fixed seed so mat::Random() is reproducible

    mat* d_A;
    mat h_A = mat::Random();
    std::cout << "CPU:\n" << h_A << std::endl;

    // Copy the matrix to the device and invert it with a single thread.
    cudaMalloc((void**)&d_A, sizeof(mat));
    cudaMemcpy(d_A, &h_A, sizeof(mat), cudaMemcpyHostToDevice);
    matrixInv<<<1, 1>>>(d_A);
    cudaDeviceSynchronize();

    // Copy the device result back and print it next to the host inverse.
    mat h_Ainv;
    cudaMemcpy(&h_Ainv, d_A, sizeof(mat), cudaMemcpyDeviceToHost);
    std::cout << "CPU final inv:\n" << h_A.inverse() << std::endl;
    std::cout << "GPU final inv:\n" << h_Ainv << std::endl;

    cudaFree(d_A);
    return 0;
}
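
The test above compares the host and device inverses by eye. A numeric check can make the comparison less error-prone. The following is a minimal sketch, not part of the original gist: `inverseResidual` is a hypothetical helper that returns the largest absolute entry of `A * Ainv - I` for two matrices passed as raw float arrays.

```cpp
#include <cmath>

// Hypothetical helper (not in the original gist).
// Returns max |(A * Ainv - I)(i, j)| for two N x N matrices stored
// contiguously in the same storage order. The indexing below treats the
// data as row-major; for column-major data (Eigen's default) the same
// loops compute the transpose of Ainv * A, which is also the identity
// when Ainv is a correct inverse, so the check is still meaningful.
template <int N>
float inverseResidual(const float* a, const float* ainv) {
    float worst = 0.0f;
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < N; ++k) {
                sum += a[i * N + k] * ainv[k * N + j];
            }
            const float expected = (i == j) ? 1.0f : 0.0f;
            worst = std::fmax(worst, std::fabs(sum - expected));
        }
    }
    return worst;
}
```

In `main`, this could be called as `inverseResidual<5>(h_A.data(), h_Ainv.data())` once the device result has been copied back. A value far above float rounding error would suggest the device inverse is wrong; what counts as "far above" depends on how well conditioned the random matrix is.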
@sin3point14
Author

Compiling with clang++, however, fails with an error, as expected. I used the trunk sm_100a CUDA-12.8.1 compiler with -O2. Adding --cuda-gpu-arch=sm_86 caused an error of its own, so I left that flag out.

In file included from <source>:1:
In file included from /opt/compiler-explorer/libs/eigen/v3.4.0/Eigen/Dense:2:
In file included from /opt/compiler-explorer/libs/eigen/v3.4.0/Eigen/LU:39:
/opt/compiler-explorer/libs/eigen/v3.4.0/Eigen/src/LU/InverseImpl.h:28:21: error: reference to __host__ function 'partialPivLu' in __host__ __device__ function
   28 |     result = matrix.partialPivLu().inverse();
      |                     ^
...

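As expected, clang++ refuses to compile this for the GPU. Judging by the error, inverse() on a 5x5 matrix goes through partialPivLu(), which Eigen 3.4.0 treats as a __host__-only function, so it cannot be called from device code.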
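
One possible workaround, if a device-side inverse of a small fixed-size matrix is really needed, is to write the elimination by hand so that no host-only Eigen routine is involved. The following is a minimal sketch under that assumption. `HD_FN`, `absValue`, and `invertGaussJordan` are hypothetical names, the pivot threshold is arbitrary, and the algorithm is plain Gauss-Jordan elimination with partial pivoting, not what Eigen does internally.

```cpp
// Expands to CUDA qualifiers under a CUDA compiler and to nothing otherwise,
// so the same code can also be checked on the host alone.
#if defined(__CUDACC__) || defined(__CUDA__)
#define HD_FN __host__ __device__
#else
#define HD_FN
#endif

// Hypothetical helper: absolute value without any library call.
HD_FN inline float absValue(float x) { return x < 0.0f ? -x : x; }

// Hypothetical helper: inverts the N x N matrix `a` into `inv` using
// Gauss-Jordan elimination with partial pivoting. Returns false when a
// pivot is (near) zero. `a` is copied first, so `a` and `inv` may alias.
// The transpose of an inverse is the inverse of the transpose, so
// column-major input simply yields a column-major result.
template <int N>
HD_FN bool invertGaussJordan(const float* a, float* inv) {
    float m[N][2 * N];  // augmented matrix [A | I]
    for (int r = 0; r < N; ++r) {
        for (int c = 0; c < N; ++c) {
            m[r][c] = a[r * N + c];
            m[r][N + c] = (r == c) ? 1.0f : 0.0f;
        }
    }
    for (int col = 0; col < N; ++col) {
        // Partial pivoting: pick the row with the largest entry in this column.
        int pivot = col;
        for (int r = col + 1; r < N; ++r) {
            if (absValue(m[r][col]) > absValue(m[pivot][col])) pivot = r;
        }
        if (absValue(m[pivot][col]) < 1e-12f) return false;  // arbitrary threshold
        if (pivot != col) {
            for (int c = 0; c < 2 * N; ++c) {
                const float tmp = m[col][c];
                m[col][c] = m[pivot][c];
                m[pivot][c] = tmp;
            }
        }
        // Scale the pivot row so the pivot becomes 1.
        const float scale = 1.0f / m[col][col];
        for (int c = 0; c < 2 * N; ++c) m[col][c] *= scale;
        // Eliminate this column from every other row.
        for (int r = 0; r < N; ++r) {
            if (r == col) continue;
            const float factor = m[r][col];
            for (int c = 0; c < 2 * N; ++c) m[r][c] -= factor * m[col][c];
        }
    }
    // The right half of the augmented matrix now holds the inverse.
    for (int r = 0; r < N; ++r) {
        for (int c = 0; c < N; ++c) inv[r * N + c] = m[r][N + c];
    }
    return true;
}
```

Inside the kernel, the call might look like `invertGaussJordan<5>(A->data(), B.data())`, assuming `data()` is callable from device code in the Eigen version in use. I have not verified this against the toolchain above.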
