All tests were performed in a docker container, using the image postgres:15.4.
All tests used the same test data, but with scrambled hash values.
Only deviation from postgresql default startup parameters were:
-c work_mem=2048MB
-c maintenance_work_mem=2048MB
-c max_wal_size=8GB
First test was with straight up 60M records, split into 3 different reciever's, using index lookups:
CREATE TABLE IF NOT EXISTS messages (
id BIGSERIAL,
parent VARCHAR(128),
sent INT NOT NULL,
sender VARCHAR(128) NOT NULL,
reciever VARCHAR(128) NOT NULL,
hash VARCHAR(128) NOT NULL
);
CREATE INDEX IF NOT EXISTS messages_reciever_idx ON messages ((reciever));Using a crude COUNT(*) query:
SELECT COUNT(*) FROM messages WHERE reciever=%sThis took 2.49 seconds, and used up 27 GB of disk space.
This test utilized the table partitioning.
CREATE TABLE IF NOT EXISTS messages (
id BIGSERIAL,
parent VARCHAR(128),
sent INT NOT NULL,
sender VARCHAR(128) NOT NULL,
reciever VARCHAR(128) NOT NULL,
hash VARCHAR(128) NOT NULL
) PARTITION BY list(reciever);And each type of reciever got their own partition:
CREATE TABLE IF NOT EXISTS {reciever} PARTITION OF messages FOR VALUES IN ('reciever')Same query getting COUNT(*) for one particular reciever.
This took 0.95 seconds, and used up 27 GB of space.
Last run was using the inheritance capabilities of postgresql tables. This one gets a little muddier as you'll need a TRIGGER on insert, that routes the incoming data to the appropriate table.
CREATE TABLE IF NOT EXISTS messages (
id BIGSERIAL,
parent VARCHAR(128),
sent INT NOT NULL,
pubkey VARCHAR(128) NOT NULL,
reciever VARCHAR(128) NOT NULL,
hash VARCHAR(128) NOT NULL
);
CREATE TABLE IF NOT EXISTS messages_zzzzzzz (
CHECK ( reciever = 'zzzzzzz' )
) INHERITS (messages);
CREATE INDEX IF NOT EXISTS messages_reciever_zzzzzzz ON messages_zzzzzzz (reciever);
CREATE TABLE IF NOT EXISTS messages_yyyyyyy (
CHECK ( reciever = 'yyyyyyy' )
) INHERITS (messages);
CREATE INDEX IF NOT EXISTS messages_reciever_yyyyyyy ON messages_yyyyyyy (reciever);
CREATE TABLE IF NOT EXISTS messages_ggggggg (
CHECK ( reciever = 'ggggggg' )
) INHERITS (messages);
CREATE INDEX IF NOT EXISTS messages_reciever_ggggggg ON messages_ggggggg (reciever);
CREATE OR REPLACE FUNCTION messages_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN
IF ( NEW.reciever = 'zzzzzzz' ) THEN INSERT INTO messages_zzzzzzz VALUES (NEW.*);
ELSIF ( NEW.reciever = 'yyyyyyy' ) THEN INSERT INTO messages_yyyyyyy VALUES (NEW.*);
ELSIF ( NEW.reciever = 'ggggggg' ) THEN INSERT INTO messages_ggggggg VALUES (NEW.*);
ELSE
RAISE EXCEPTION 'No room with this name. Fix the messages_insert_trigger() function!';
END IF;
RETURN NULL;
END;
$$
LANGUAGE plpgsql;
CREATE OR REPLACE TRIGGER insert_messages_trigger
BEFORE INSERT ON messages
FOR EACH ROW EXECUTE FUNCTION messages_insert_trigger();All it does is really create 4 tables:
messagesmessages_zzzzzzzmessages_yyyyyyymessages_ggggggg
With a hook that calls messages_insert_trigger() on each insert into messages.
Then we do our COUNT(*) call again.
This took 0.45 seconds and used up 27 GB of disk space.
| Method | Query speed | Space used |
|---|---|---|
| Index | 2.49 sec | 27 GB |
| Partition | 0.9 sec | 27 GB |
| Inheritance | 0.45 sec | 27 GB |