Skip to content

Instantly share code, notes, and snippets.

@greggirwin
Last active November 4, 2025 23:48
Show Gist options
  • Select an option

  • Save greggirwin/d990fff8cf55d2aaa53bd23a0e47acde to your computer and use it in GitHub Desktop.

Select an option

Save greggirwin/d990fff8cf55d2aaa53bd23a0e47acde to your computer and use it in GitHub Desktop.
Red [
title: "A simple flat-block db model."
author: "Gregg Irwin"
url: https://gist.github.com/greggirwin/d990fff8cf55d2aaa53bd23a0e47acde
notes: {
Mocked up in response to chat that came up based on `remove/key`
being case sensitive, and how Reducers can/should use flat blocks.
https://github.com/red/red/issues/5652
A key question is how much native funcs like `remove` and `put`
should be extended to support use with flat blocks, or whether
we should roll mezzanine wrappers which add more value, like
enforcing checks on the record size and storing the record size
in a way that it can be found, rather than using magic numbers
in code, even if behind var names.
A related issue is the number of refinements that may need to be
added to natives or actions, but are very easily propagated with
mezzanine code today, thanks to dynamic refinements.
This is about as basic as it gets, but adding specs for the schema
would be very easy to do, and that metadata would also be stored
in the DB object.
Flat blocks are more efficient, as every record doesn't have its
own container (block/map/object) that add overhead. They are simply
values. The big problem is that, without metadata, they are opaque,
harder to maintain over time, and more prone to error. This mockup
does NOT address *field* naming access, but that could also be
schema driven. Another use case is loading flat CSV data.
It also is NOT specifically a key-value store, where one (compound)
value is associated with each key.
With a simple metadata/spec addition, much can be driven by that,
and that metata can...should...MUST be included with the flat data
itself.
Update note: I added a hash to the db object, which holds only the
keys. For datasets under a few thousand records, using
only a block is perfectly functional, but as the number
of records grows, performance goes down dramatically,
almost entirely due to the lookup speed of FIND, which
is linear. Adding a hash for keys means a little extra
work, but the payoff is huge. For any real-world use,
the small amount of extra code and logic is well worth
it. For this gist, I left checks in place, so you can
make a DB object, then set the KEYS hash to none, or
just don't initialize it, so you can see the performance
difference. The next obvious performance booster would
be a batch insert for bulk records.
}
]
;-------------------------------------------------------------------------------
flat-block-rec-support-ctx: context [
make-rec-func: function [
"Make a function using a standard [db-object key] spec."
name [word! block!] "Function name(s) (will be set)"
doc-str [string!] "Function doc string"
body [block!] "Function body"
/with-rec "Include a rec arg in the spec, for add/put"
][
set name function compose [ ; set the name in the global context
(doc-str) ; compose the doc string into the spec
db [object!]
key
(either with-rec [[rec [block!] "Append as flat values."]][])
/case "Perform a case-sensitive search."
/same {Use "same?" as comparator.}
] body ; use the body directly
name ; return the name
]
make-rec-funcs: func [
"Make a group of [series key] functions; returns func names set"
spec [block!] "[name doc-string body] triples"
/with-rec "Include a rec arg in the spec, for add/put"
][
collect [
foreach [name doc-str body] spec [
;keep make-rec-func name doc-str body
keep make-rec-func/:with-rec name doc-str body
]
]
]
; The idea here is that these all have the same spec and short bodies,
; so we can eliminate a lot of boilerplate.
; Read access funcs have a spec of [db key /case /same]
specs: [
find-rec
{Returns the series where the key is found, or NONE if the key doesn't exist.}
[
either hash? db/keys [
pos: find/only/:case/:same db/keys :key
;if pos [at db/data db/rec-size * (subtract index? pos 1) + 1]
if pos [at db/data rec-pos db index? pos]
][
find/only/skip/:case/:same db/data :key db/rec-size
]
]
has-rec?
{Returns true if key exists; false otherwise.}
[not none? find-rec/:case/:same db :key]
get-rec
{Returns the record associated with a key, exlcuding the key itself, or NONE.}
[if pos: find-rec/:case/:same db :key [copy/part next pos db/rec-size - 1]]
remove-rec
{Removes the record for the given key; returns true if a record was removed, false if the key didn't exist.}
[
; We need the key to find it, so remove the record first, then the hashed key.
res: remove/part find-rec/:case/:same db :key db/rec-size
if hash? db/keys [remove find db/keys :key]
to logic! res
]
take-rec
{Removes and returns the record for the given key, including the key, or NONE if the key doesn't exist.}
[
; We need the key to find it, so remove the record first, then the hashed key.
if pos: find-rec/:case/:same db :key [res: take/part pos db/rec-size]
if hash? db/keys [remove find db/keys :key]
res
]
]
make-rec-funcs specs
; ;?? Is using the same method as other func-makers better here?
; ; The bodies are bigger, but it does ensure specs are consistent.
; Write access funcs have a spec of [db key rec /case /same]
specs: [
put-rec
{Add or update a record.}
[
; Account for key being part of the record
if db/rec-size <> (add 1 length? rec) [
do make error! rejoin [
"Number of fields doesn't match record size. Expected " db/rec-size - 1
" values, but got " length? rec
". Rec: " mold rec
]
]
either pos: find-rec/:case/:same db :key [
; change/only is not used here, as the record values are not in a sub-block.
head change next pos :rec
][
if hash? db/keys [append/only db/keys :key]
append/only db/data :key
append db/data :rec
]
; new-line is a performance killer in benchmarking
;new-line/skip db/data on db/rec-size ; just for visual sanity when testing
db/data
]
[add-rec insert-rec] ; you can set more than one name
{Adds a new record; error if key already exists.}
[
if find-rec/:case/:same db :key [
do make error! rejoin ["A record with that key already exists. Key: " key]
]
put-rec/:case/:same db :key rec
]
[change-rec update-rec] ; you can set more than one name
{Update a record; error if key doesn't exist.}
[
if not find-rec/:case/:same db :key [
do make error! rejoin ["A record with that key does not exist. Key: " key]
]
put-rec/:case/:same db :key rec
]
]
make-rec-funcs/with-rec specs
; ;?? Is using the same method as other func-makers better here?
; ; The bodies are bigger, but it does ensure specs are consistent.
; make-rec-func/with-rec
; 'put-rec
; {Add or update a record.}
; [
; ; Account for key being part of the record
; if db/rec-size <> (1 + length? rec) [
; do make error! "Number of fields doesn't match record size."
; ]
; either pos: find-rec/:case/:same db :key [
; head change next pos :rec
; ][
; append db/data :key
; append db/data :rec
; ]
; new-line/skip db/data on db/rec-size ; just for visual sanity when testing
; ]
;
; make-rec-func/with-rec
; [add-rec insert-rec]
; {Adds a new record; error if key already exists.}
; [
; if find-rec/:case/:same db :key [
; do make error! rejoin ["A record with that key already exists. key: " key]
; ]
; put-rec/:case/:same db :key rec
; ]
;
; make-rec-func/with-rec
; [change-rec update-rec]
; {Update a record; error if key doesn't exist.}
; [
; if not find-rec/:case/:same db :key [
; do make error! rejoin ["A record with that key does not exist. key: " key]
; ]
; put-rec/:case/:same db :key rec
; ]
; ; put-rec has an extra arg (rec), so has a different spec.
; set [put-rec change-rec] function [
; {Add or update a record}
; db [object!]
; key
; rec [block!] "Append as flat values."
; /case "Perform a case-sensitive search."
; /same {Use "same?" as comparator.}
; ][
; ; Account for key being part of the record
; if db/rec-size <> (1 + length? rec) [
; do make error! "Number of fields doesn't match record size."
; ]
; either pos: find-rec/:case/:same db :key [
; head change next pos :rec
; ][
; append db/data :key
; append db/data :rec
; ]
; new-line/skip db/data on db/rec-size ; just for visual sanity when testing
; ]
; ; add-rec has an extra arg (rec), so has a different spec.
; set [add-rec insert-rec] function [ ; new append
; {Adds a new record.}
; db [object!]
; key
; rec [block!] "Append as flat values."
; /case "Perform a case-sensitive search."
; /same {Use "same?" as comparator.}
; ][
; if find-rec/:case/:same db :key [
; do make error! rejoin ["A record with that key already exists. key: " key]
; ]
; put-rec/:case/:same db :key rec
; ]
; at-rec doesn't take a key, but a record number
set 'at-rec function [
db [object!]
rec-num [integer!]
][
;at db/data (rec-num - 1) * db/rec-size + 1
at db/data rec-pos db rec-num
]
set 'rec-pos function [
db [object!]
rec-num [integer!]
][
(rec-num - 1) * db/rec-size + 1
]
]
;-------------------------------------------------------------------------------
flat-recs-db: context [
rec-size: none
keys: none ; hash! containing only keys
data: none ; flat block of record data
]
make-flat-recs-storage: func [
spec [map!] "rec-size: Record size (number of fields/values); alloc: number of records to allocate initially."
][
; TBD ensure rec-size and alloc are > 0
make flat-recs-db [
rec-size: spec/rec-size
keys: make hash! any [spec/alloc 100]
data: make block! spec/rec-size * any [spec/alloc 100]
]
]
;-------------------------------------------------------------------------------
expect-err: func [blk [block!] /local err][
if error? set/any 'err try blk [
print ["EXPECTED ERR:" mold err/arg1]
]
]
;-------------------------------------------------------------------------------
;!! Record key counts as part of the record size
db: make-flat-recs-storage #[rec-size: 4]
print ""
put-rec db 'rec-1 ['Alice #senior 1]
put-rec db 'rec-2 ['Bob #median 2]
put-rec db 'rec-3 ['Carol #junior 3]
print ["DB:" mold db]
print ['Has 'rec-1 tab has-rec? db 'rec-1]
print ['Has 'rec-2 tab has-rec? db 'rec-2]
print ['Has 'rec-3 tab has-rec? db 'rec-3]
print ['Has 'rec-4 tab has-rec? db 'rec-4]
print ""
print ['Get 'rec-1 tab mold get-rec db 'rec-1]
print ['Get 'rec-2 tab mold get-rec db 'rec-2]
print ['Get 'rec-3 tab mold get-rec db 'rec-3]
print ['Get 'rec-4 tab mold get-rec db 'rec-4]
print ""
print ['At 2 tab mold at-rec db 2]
print ""
print ['Take 'rec-2 tab mold take-rec db 'rec-2]
print mold db/keys
print ['Take 'rec-2 tab mold take-rec db 'rec-2]
print ["DB:" mold db]
print ['Add 'rec-2 mold add-rec db 'rec-2 ['Bob #median 2]]
expect-err [add-rec db 'rec-2 ['Bob #median 2]] ; dup key error
expect-err ['Change 'rec-4 mold change-rec db 'rec-4 ['Dan #doh! 4]] ; key doesn't exist error
print ['Remove 'rec-2 tab mold remove-rec db 'rec-2]
expect-err [add-rec db 'rec-2 ['Bob #median 2 xxx]] ; rec length wrong error
print ['Remove 'rec-2 tab mold remove-rec db 'rec-2]
print ['Add mold [rec x] mold add-rec db [rec x] ['Prof-X #super 2E10]]
print ["DB:" mold db]
print ['Get mold [rec x] tab mold get-rec db [rec x]]
;halt
; This is for Gregg's testing
do %/d/red/mezz/profile.red
n: 10'000
print ["N:" n]
db: make-flat-recs-storage #[rec-size: 4]
add-perf: [repeat i n [insert-rec db append copy "rec-" i compose ['Alice #senior (i)]]]
put-perf: [loop n [put-rec db append copy "put-rec-" n ['Alice #senior 1]]]
has-perf: [repeat i n [has-rec? db append copy "rec-" i]]
get-perf: [repeat i n [get-rec db append copy "rec-" i]]
find-perf: [repeat i n [find-rec db append copy "rec-" i]]
find-perf-x: [repeat i n [find-rec db "rec-1"]]
find-perf-xx: [repeat i n [find-rec db "rec-1000"]]
find-perf-xxx: [repeat i n [find-rec db "rec-10000"]]
load-mold-perf: [load mold db/data]
profile/show [
add-perf put-perf has-perf get-perf find-perf
find-perf-x find-perf-xx find-perf-xxx
load-mold-perf
]
halt
; With a hash of keys in place.
N: 10'000
Time | Time (Per) | Memory | Code
0:00:00.015 | 0:00:00.015 | 1708008 | load-mold-perf
0:00:00.026 | 0:00:00.026 | 284 | find-perf-x
0:00:00.027 | 0:00:00.027 | 284 | find-perf-xx
0:00:00.028 | 0:00:00.028 | 284 | find-perf-xxx
0:00:00.037 | 0:00:00.037 | 880284 | find-perf
0:00:00.042 | 0:00:00.042 | 880284 | has-perf
0:00:00.05 | 0:00:00.05 | 1800284 | get-perf
0:00:00.06 | 0:00:00.06 | 880440 | put-perf
0:00:00.102 | 0:00:00.102 | 1898080 | add-perf
N: 25'000
Time | Time (Per) | Memory | Code
0:00:00.049 | 0:00:00.049 | 3954936 | load-mold-perf
0:00:00.065 | 0:00:00.065 | 284 | find-perf-x
0:00:00.068 | 0:00:00.068 | 284 | find-perf-xx
0:00:00.072 | 0:00:00.072 | 284 | find-perf-xxx
0:00:00.092 | 0:00:00.092 | 1172444 | find-perf
0:00:00.132 | 0:00:00.132 | 250620 | get-perf
0:00:00.132 | 0:00:00.132 | 1171828 | has-perf
0:00:00.135 | 0:00:00.135 | 1171764 | put-perf
0:00:00.229 | 0:00:00.229 | 4548640 | add-perf
N: 100'000
Time | Time (Per) | Memory | Code
0:00:00.218 | 0:00:00.218 | 15817704 | load-mold-perf
0:00:00.263 | 0:00:00.263 | 284 | find-perf-x
0:00:00.269 | 0:00:00.269 | 284 | find-perf-xx
0:00:00.269 | 0:00:00.269 | 284 | find-perf-xxx
0:00:00.375 | 0:00:00.375 | 4516004 | find-perf
0:00:00.44 | 0:00:00.44 | 4515388 | has-perf
0:00:00.528 | 0:00:00.528 | 986460 | get-perf
0:00:00.581 | 0:00:00.581 | 2536820 | put-perf
0:00:00.922 | 0:00:00.922 | 15978140 | add-perf
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment