Last active August 19, 2025 07:04
parallel 512-bit full adder in avx512. pseudocode. probably works
// we want to compute r,c := x + y + i
// c carry-out
// r sum
// x y addends
// i carry-in
// carries are kept inverted (hence the #): -1 means no carry, 0 means carry
// carry in/out is zmm with most significant qword -1/0
// vector constants are written least significant element first
// vperm2b/vperm2q are the two-table permutes (vpermi2b/vpermt2b, vpermi2q/vpermt2q)
t := vpaddq(x, y)
ct# := vpmovm2q(ct#m := vpcmpneqq(t, -1)) // carry-through (inverted): 0 where t = -1, i.e. where an incoming carry passes straight through
c# := vpmovm2q(c#m := vpcmpnltuq(t, vpmaxuq(x,y))) // approximate carry (inverted): 0 where x+y wrapped
// now, to element t_k, we want to add 1+c#_k-1 (1 if lane k-1 carried, else 0),
// unless lane k-1 is a carry-through, in which case we take c#_k-2, etc.
// (that means t_0 + 1+c#_-1, which we can arrange to be the carry-in with perm2q)
// so for each element, we want to count the number of carry-throughs immediately preceding it
// preceding mask: top byte of qword k comes from ct#_k-1, the next byte from ct#_k-2, etc.
// index 64 picks a -1 byte from the second table, which ends the run of zero bytes
pm := vperm2b(ct#, -1 0 0 0 0 0 0 0, 64 64 64 64 64 64 64 64
                                     64 64 64 64 64 64 64 0
                                     64 64 64 64 64 64 0 8
                                     64 64 64 64 64 0 8 16
                                     64 64 64 64 0 8 16 24
                                     64 64 64 0 8 16 24 32
                                     64 64 0 8 16 24 32 40
                                     64 0 8 16 24 32 40 48)
// pm could probably also be built by moving ct#m to a gpr, broadcast to zmm, shift each lane
// preceding count: leading zero bits / 8 = leading zero bytes = carry-throughs immediately below
pc := vpsrlq(vplzcntq(pm), 3)
// lane k takes the inverted carry of lane k-1-pc; index -1 wraps to 15, the most significant qword of i
// the selected value is -1/0, so t + 1 + it adds 0 or 1
r := vpaddq(vpaddq(t, 1), vperm2q(c#, i, vpsubq(-1 0 1 2 3 4 5 6, pc)))
// carry-out needs masked to handle the case x[7]=y[7]=-1 and carry-in
// we only care about the most significant qword but this is the easiest way to get it
// the {c#m} mask zeroes every lane that already wrapped; c#m is reused as dst only because it is dead here
// i think this is right
c := vpmovm2q(vpcmpnltuq{c#m}(c#m, r, vpmaxuq(x,y)))
// is there a way to skip >>3 in pc?
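A scalar model can help check the lane logic without AVX-512 hardware. The sketch below is not the vector code. It is a plain Python re-statement of the same idea under a few assumptions: lanes are listed least significant first, the carry-in is passed as a bool, and a carry selected in inverted form (-1 for no carry, 0 for carry) is applied as `t + 1 + inv`. The names `add512_model`, `to_lanes` and `from_lanes` are made up for this example.

```python
MASK = (1 << 64) - 1  # all-ones value of one 64-bit lane


def to_lanes(n):
    """Split a non-negative integer below 2**512 into 8 lanes, least significant first."""
    return [(n >> (64 * k)) & MASK for k in range(8)]


def from_lanes(lanes):
    """Inverse of to_lanes."""
    return sum(v << (64 * k) for k, v in enumerate(lanes))


def add512_model(x, y, carry_in):
    """Scalar model of the lane-parallel add. Returns (sum lanes, carry-out bool)."""
    t = [(a + b) & MASK for a, b in zip(x, y)]                 # per-lane wrapped sum
    through = [v == MASK for v in t]                           # lane forwards an incoming carry
    no_carry = [v >= max(a, b) for v, a, b in zip(t, x, y)]    # lane did not wrap on its own
    r = []
    for k in range(8):
        # count the carry-through lanes immediately below lane k
        pc = 0
        while pc < k and through[k - 1 - pc]:
            pc += 1
        src = k - 1 - pc                                       # -1 selects the carry-in
        inv = (not carry_in) if src < 0 else no_carry[src]
        # inverted carry as a lane value is -1 (no carry) or 0 (carry)
        r.append((t[k] + 1 + (MASK if inv else 0)) & MASK)
    # no carry-out only if lane 7 did not wrap and adding the incoming carry did not wrap it either
    no_carry_out = no_carry[7] and r[7] >= max(x[7], y[7])
    return r, not no_carry_out
```

For example, `add512_model(to_lanes(2**512 - 1), to_lanes(0), True)` exercises the longest possible run: every lane is a carry-through, so every lane takes the carry-in and the carry-out is set.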