Created
October 12, 2025 19:17
-
-
Save akpoff/059939a7aedafd899c65e8d6aeac5e68 to your computer and use it in GitHub Desktop.
Using SQLite to Generate Uniformly-Distributed, Random Values
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| -- gen_data.sql | |
| -- Test SQLite PRNG random value generator and analyze with dieharder | |
| -- https://webhome.phy.duke.edu/~rgb/General/dieharder.php | |
| -- Execute this file with `cat gen_data| sqlite3 | tr -d '\n' | xxd -r -p > data.bin` | |
| -- It's easier to get SQLite to send the data to ASCII first line by line, then | |
| -- convert to binary for use with dieharder | |
| -- Analyze with `dieharder -g 201 -f data.bin -a` | |
| -- Set mode, disable headers and separator between columns | |
| .mode list | |
| .headers off | |
| .separator "" | |
| -- Set the number of 8-byte values you want with power(2, <count>) | |
| -- Warning: data.bin will be 8 x power(2, <value>) since we're generating 8-byte values | |
| -- We prefix with hex notation so xxd(1) recognizes we're using 64-bit values | |
| SELECT '0x' || hex(randomblob(8)) FROM generate_series(1, power(2,32)); |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| -- Generate 256 random, 8-byte values in 3 columns | |
| -- Requires generate_series extension which is in the cli | |
| -- .output <filename> | |
| SELECT | |
| IIF(value < 86, lower(hex(randomblob(8))) || ',', lower(hex(randomblob(8)))) AS col1, | |
| IIF(value < 86, lower(hex(randomblob(8))) || ',', NULL) AS col2, | |
| IIF(value < 86, lower(hex(randomblob(8))) || ',', NULL) AS col3 | |
| FROM generate_series(0, 86); |
Author
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
I needed 256 random, uniformly distributed, 8-byte values for a project. I'm an incorrigible sql head so I used SQLite to generate them, but are they truly random and uniformly distributed? Can we trust SQLite's random() and randomblob(n) functions?
Yes we can, if we're not using the data for cryptography. SQLite's PRNG uses ChaCha20 for generating pseudomrandom numbers. It's seeded once per session with 44 bytes of data from /dev/urandom. As seeded PRNG, the values are predictable if you know the seed, but again, we're not using these value for cryptography. Still, let's see how random and uniform they are.
To test the data for uniformity of distribution, I generated 2^32 8-byte values using SQLite in one session, yielding 32 GiB of data, and then tested them with dieharder, the random-number test suite. (Grok found this for me.)
Why so much data, because dieharder is meant for big datasets. If you don't feed it enough values, it will rewind the file and reuse existing values. Running it over my dinky 256 values doesn't give it enough input to analyze. 32 GiB also lets us test whether it's possible to run it to exhaustion. 32 GiB isn't enough to exhaust it, but if it did fail, that would be a bad sign.
Dieharder gave the SQLite PRNG data mostly passing marks. Of 114 tests, 109 passed with 5 marked as weak, and those were often one variant of a test it ran with multiple criteria, and that was from one PRNG session.
On the whole, I'd say it's good.
Note: SQLite's PRNG suppresses the value -9223372036854775808 because if you use abs() on it "throws an integer overflow error since there is no equivalent positive 64-bit two complement value."
https://x.com/akpoff/status/1977468564262293630