Last active
November 4, 2025 23:48
-
-
Save greggirwin/d990fff8cf55d2aaa53bd23a0e47acde to your computer and use it in GitHub Desktop.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Red [ | |
| title: "A simple flat-block db model." | |
| author: "Gregg Irwin" | |
| url: https://gist.github.com/greggirwin/d990fff8cf55d2aaa53bd23a0e47acde | |
| notes: { | |
| Mocked up in response to chat that came up based on `remove/key` | |
| being case sensitive, and how Reducers can/should use flat blocks. | |
| https://github.com/red/red/issues/5652 | |
| A key question is how much native funcs like `remove` and `put` | |
| should be extended to support use with flat blocks, or whether | |
| we should roll mezzanine wrappers which add more value, like | |
| enforcing checks on the record size and storing the record size | |
| in a way that it can be found, rather than using magic numbers | |
| in code, even if behind var names. | |
| A related issue is the number of refinements that may need to be | |
| added to natives or actions, but are very easily propagated with | |
| mezzanine code today, thanks to dynamic refinements. | |
| This is about as basic as it gets, but adding specs for the schema | |
| would be very easy to do, and that metadata would also be stored | |
| in the DB object. | |
| Flat blocks are more efficient, as every record doesn't have its | |
| own container (block/map/object) that add overhead. They are simply | |
| values. The big problem is that, without metadata, they are opaque, | |
| harder to maintain over time, and more prone to error. This mockup | |
| does NOT address *field* naming access, but that could also be | |
| schema driven. Another use case is loading flat CSV data. | |
| It also is NOT specifically a key-value store, where one (compound) | |
| value is associated with each key. | |
| With a simple metadata/spec addition, much can be driven by that, | |
| and that metata can...should...MUST be included with the flat data | |
| itself. | |
| Update note: I added a hash to the db object, which holds only the | |
| keys. For datasets under a few thousand records, using | |
| only a block is perfectly functional, but as the number | |
| of records grows, performance goes down dramatically, | |
| almost entirely due to the lookup speed of FIND, which | |
| is linear. Adding a hash for keys means a little extra | |
| work, but the payoff is huge. For any real-world use, | |
| the small amount of extra code and logic is well worth | |
| it. For this gist, I left checks in place, so you can | |
| make a DB object, then set the KEYS hash to none, or | |
| just don't initialize it, so you can see the performance | |
| difference. The next obvious performance booster would | |
| be a batch insert for bulk records. | |
| } | |
| ] | |
| ;------------------------------------------------------------------------------- | |
| flat-block-rec-support-ctx: context [ | |
| make-rec-func: function [ | |
| "Make a function using a standard [db-object key] spec." | |
| name [word! block!] "Function name(s) (will be set)" | |
| doc-str [string!] "Function doc string" | |
| body [block!] "Function body" | |
| /with-rec "Include a rec arg in the spec, for add/put" | |
| ][ | |
| set name function compose [ ; set the name in the global context | |
| (doc-str) ; compose the doc string into the spec | |
| db [object!] | |
| key | |
| (either with-rec [[rec [block!] "Append as flat values."]][]) | |
| /case "Perform a case-sensitive search." | |
| /same {Use "same?" as comparator.} | |
| ] body ; use the body directly | |
| name ; return the name | |
| ] | |
| make-rec-funcs: func [ | |
| "Make a group of [series key] functions; returns func names set" | |
| spec [block!] "[name doc-string body] triples" | |
| /with-rec "Include a rec arg in the spec, for add/put" | |
| ][ | |
| collect [ | |
| foreach [name doc-str body] spec [ | |
| ;keep make-rec-func name doc-str body | |
| keep make-rec-func/:with-rec name doc-str body | |
| ] | |
| ] | |
| ] | |
| ; The idea here is that these all have the same spec and short bodies, | |
| ; so we can eliminate a lot of boilerplate. | |
| ; Read access funcs have a spec of [db key /case /same] | |
| specs: [ | |
| find-rec | |
| {Returns the series where the key is found, or NONE if the key doesn't exist.} | |
| [ | |
| either hash? db/keys [ | |
| pos: find/only/:case/:same db/keys :key | |
| ;if pos [at db/data db/rec-size * (subtract index? pos 1) + 1] | |
| if pos [at db/data rec-pos db index? pos] | |
| ][ | |
| find/only/skip/:case/:same db/data :key db/rec-size | |
| ] | |
| ] | |
| has-rec? | |
| {Returns true if key exists; false otherwise.} | |
| [not none? find-rec/:case/:same db :key] | |
| get-rec | |
| {Returns the record associated with a key, exlcuding the key itself, or NONE.} | |
| [if pos: find-rec/:case/:same db :key [copy/part next pos db/rec-size - 1]] | |
| remove-rec | |
| {Removes the record for the given key; returns true if a record was removed, false if the key didn't exist.} | |
| [ | |
| ; We need the key to find it, so remove the record first, then the hashed key. | |
| res: remove/part find-rec/:case/:same db :key db/rec-size | |
| if hash? db/keys [remove find db/keys :key] | |
| to logic! res | |
| ] | |
| take-rec | |
| {Removes and returns the record for the given key, including the key, or NONE if the key doesn't exist.} | |
| [ | |
| ; We need the key to find it, so remove the record first, then the hashed key. | |
| if pos: find-rec/:case/:same db :key [res: take/part pos db/rec-size] | |
| if hash? db/keys [remove find db/keys :key] | |
| res | |
| ] | |
| ] | |
| make-rec-funcs specs | |
| ; ;?? Is using the same method as other func-makers better here? | |
| ; ; The bodies are bigger, but it does ensure specs are consistent. | |
| ; Write access funcs have a spec of [db key rec /case /same] | |
| specs: [ | |
| put-rec | |
| {Add or update a record.} | |
| [ | |
| ; Account for key being part of the record | |
| if db/rec-size <> (add 1 length? rec) [ | |
| do make error! rejoin [ | |
| "Number of fields doesn't match record size. Expected " db/rec-size - 1 | |
| " values, but got " length? rec | |
| ". Rec: " mold rec | |
| ] | |
| ] | |
| either pos: find-rec/:case/:same db :key [ | |
| ; change/only is not used here, as the record values are not in a sub-block. | |
| head change next pos :rec | |
| ][ | |
| if hash? db/keys [append/only db/keys :key] | |
| append/only db/data :key | |
| append db/data :rec | |
| ] | |
| ; new-line is a performance killer in benchmarking | |
| ;new-line/skip db/data on db/rec-size ; just for visual sanity when testing | |
| db/data | |
| ] | |
| [add-rec insert-rec] ; you can set more than one name | |
| {Adds a new record; error if key already exists.} | |
| [ | |
| if find-rec/:case/:same db :key [ | |
| do make error! rejoin ["A record with that key already exists. Key: " key] | |
| ] | |
| put-rec/:case/:same db :key rec | |
| ] | |
| [change-rec update-rec] ; you can set more than one name | |
| {Update a record; error if key doesn't exist.} | |
| [ | |
| if not find-rec/:case/:same db :key [ | |
| do make error! rejoin ["A record with that key does not exist. Key: " key] | |
| ] | |
| put-rec/:case/:same db :key rec | |
| ] | |
| ] | |
| make-rec-funcs/with-rec specs | |
| ; ;?? Is using the same method as other func-makers better here? | |
| ; ; The bodies are bigger, but it does ensure specs are consistent. | |
| ; make-rec-func/with-rec | |
| ; 'put-rec | |
| ; {Add or update a record.} | |
| ; [ | |
| ; ; Account for key being part of the record | |
| ; if db/rec-size <> (1 + length? rec) [ | |
| ; do make error! "Number of fields doesn't match record size." | |
| ; ] | |
| ; either pos: find-rec/:case/:same db :key [ | |
| ; head change next pos :rec | |
| ; ][ | |
| ; append db/data :key | |
| ; append db/data :rec | |
| ; ] | |
| ; new-line/skip db/data on db/rec-size ; just for visual sanity when testing | |
| ; ] | |
| ; | |
| ; make-rec-func/with-rec | |
| ; [add-rec insert-rec] | |
| ; {Adds a new record; error if key already exists.} | |
| ; [ | |
| ; if find-rec/:case/:same db :key [ | |
| ; do make error! rejoin ["A record with that key already exists. key: " key] | |
| ; ] | |
| ; put-rec/:case/:same db :key rec | |
| ; ] | |
| ; | |
| ; make-rec-func/with-rec | |
| ; [change-rec update-rec] | |
| ; {Update a record; error if key doesn't exist.} | |
| ; [ | |
| ; if not find-rec/:case/:same db :key [ | |
| ; do make error! rejoin ["A record with that key does not exist. key: " key] | |
| ; ] | |
| ; put-rec/:case/:same db :key rec | |
| ; ] | |
| ; ; put-rec has an extra arg (rec), so has a different spec. | |
| ; set [put-rec change-rec] function [ | |
| ; {Add or update a record} | |
| ; db [object!] | |
| ; key | |
| ; rec [block!] "Append as flat values." | |
| ; /case "Perform a case-sensitive search." | |
| ; /same {Use "same?" as comparator.} | |
| ; ][ | |
| ; ; Account for key being part of the record | |
| ; if db/rec-size <> (1 + length? rec) [ | |
| ; do make error! "Number of fields doesn't match record size." | |
| ; ] | |
| ; either pos: find-rec/:case/:same db :key [ | |
| ; head change next pos :rec | |
| ; ][ | |
| ; append db/data :key | |
| ; append db/data :rec | |
| ; ] | |
| ; new-line/skip db/data on db/rec-size ; just for visual sanity when testing | |
| ; ] | |
| ; ; add-rec has an extra arg (rec), so has a different spec. | |
| ; set [add-rec insert-rec] function [ ; new append | |
| ; {Adds a new record.} | |
| ; db [object!] | |
| ; key | |
| ; rec [block!] "Append as flat values." | |
| ; /case "Perform a case-sensitive search." | |
| ; /same {Use "same?" as comparator.} | |
| ; ][ | |
| ; if find-rec/:case/:same db :key [ | |
| ; do make error! rejoin ["A record with that key already exists. key: " key] | |
| ; ] | |
| ; put-rec/:case/:same db :key rec | |
| ; ] | |
| ; at-rec doesn't take a key, but a record number | |
| set 'at-rec function [ | |
| db [object!] | |
| rec-num [integer!] | |
| ][ | |
| ;at db/data (rec-num - 1) * db/rec-size + 1 | |
| at db/data rec-pos db rec-num | |
| ] | |
| set 'rec-pos function [ | |
| db [object!] | |
| rec-num [integer!] | |
| ][ | |
| (rec-num - 1) * db/rec-size + 1 | |
| ] | |
| ] | |
| ;------------------------------------------------------------------------------- | |
| flat-recs-db: context [ | |
| rec-size: none | |
| keys: none ; hash! containing only keys | |
| data: none ; flat block of record data | |
| ] | |
| make-flat-recs-storage: func [ | |
| spec [map!] "rec-size: Record size (number of fields/values); alloc: number of records to allocate initially." | |
| ][ | |
| ; TBD ensure rec-size and alloc are > 0 | |
| make flat-recs-db [ | |
| rec-size: spec/rec-size | |
| keys: make hash! any [spec/alloc 100] | |
| data: make block! spec/rec-size * any [spec/alloc 100] | |
| ] | |
| ] | |
| ;------------------------------------------------------------------------------- | |
| expect-err: func [blk [block!] /local err][ | |
| if error? set/any 'err try blk [ | |
| print ["EXPECTED ERR:" mold err/arg1] | |
| ] | |
| ] | |
| ;------------------------------------------------------------------------------- | |
| ;!! Record key counts as part of the record size | |
| db: make-flat-recs-storage #[rec-size: 4] | |
| print "" | |
| put-rec db 'rec-1 ['Alice #senior 1] | |
| put-rec db 'rec-2 ['Bob #median 2] | |
| put-rec db 'rec-3 ['Carol #junior 3] | |
| print ["DB:" mold db] | |
| print ['Has 'rec-1 tab has-rec? db 'rec-1] | |
| print ['Has 'rec-2 tab has-rec? db 'rec-2] | |
| print ['Has 'rec-3 tab has-rec? db 'rec-3] | |
| print ['Has 'rec-4 tab has-rec? db 'rec-4] | |
| print "" | |
| print ['Get 'rec-1 tab mold get-rec db 'rec-1] | |
| print ['Get 'rec-2 tab mold get-rec db 'rec-2] | |
| print ['Get 'rec-3 tab mold get-rec db 'rec-3] | |
| print ['Get 'rec-4 tab mold get-rec db 'rec-4] | |
| print "" | |
| print ['At 2 tab mold at-rec db 2] | |
| print "" | |
| print ['Take 'rec-2 tab mold take-rec db 'rec-2] | |
| print mold db/keys | |
| print ['Take 'rec-2 tab mold take-rec db 'rec-2] | |
| print ["DB:" mold db] | |
| print ['Add 'rec-2 mold add-rec db 'rec-2 ['Bob #median 2]] | |
| expect-err [add-rec db 'rec-2 ['Bob #median 2]] ; dup key error | |
| expect-err ['Change 'rec-4 mold change-rec db 'rec-4 ['Dan #doh! 4]] ; key doesn't exist error | |
| print ['Remove 'rec-2 tab mold remove-rec db 'rec-2] | |
| expect-err [add-rec db 'rec-2 ['Bob #median 2 xxx]] ; rec length wrong error | |
| print ['Remove 'rec-2 tab mold remove-rec db 'rec-2] | |
| print ['Add mold [rec x] mold add-rec db [rec x] ['Prof-X #super 2E10]] | |
| print ["DB:" mold db] | |
| print ['Get mold [rec x] tab mold get-rec db [rec x]] | |
| ;halt | |
| ; This is for Gregg's testing | |
| do %/d/red/mezz/profile.red | |
| n: 10'000 | |
| print ["N:" n] | |
| db: make-flat-recs-storage #[rec-size: 4] | |
| add-perf: [repeat i n [insert-rec db append copy "rec-" i compose ['Alice #senior (i)]]] | |
| put-perf: [loop n [put-rec db append copy "put-rec-" n ['Alice #senior 1]]] | |
| has-perf: [repeat i n [has-rec? db append copy "rec-" i]] | |
| get-perf: [repeat i n [get-rec db append copy "rec-" i]] | |
| find-perf: [repeat i n [find-rec db append copy "rec-" i]] | |
| find-perf-x: [repeat i n [find-rec db "rec-1"]] | |
| find-perf-xx: [repeat i n [find-rec db "rec-1000"]] | |
| find-perf-xxx: [repeat i n [find-rec db "rec-10000"]] | |
| load-mold-perf: [load mold db/data] | |
| profile/show [ | |
| add-perf put-perf has-perf get-perf find-perf | |
| find-perf-x find-perf-xx find-perf-xxx | |
| load-mold-perf | |
| ] | |
| halt | |
| ; With a hash of keys in place. | |
| N: 10'000 | |
| Time | Time (Per) | Memory | Code | |
| 0:00:00.015 | 0:00:00.015 | 1708008 | load-mold-perf | |
| 0:00:00.026 | 0:00:00.026 | 284 | find-perf-x | |
| 0:00:00.027 | 0:00:00.027 | 284 | find-perf-xx | |
| 0:00:00.028 | 0:00:00.028 | 284 | find-perf-xxx | |
| 0:00:00.037 | 0:00:00.037 | 880284 | find-perf | |
| 0:00:00.042 | 0:00:00.042 | 880284 | has-perf | |
| 0:00:00.05 | 0:00:00.05 | 1800284 | get-perf | |
| 0:00:00.06 | 0:00:00.06 | 880440 | put-perf | |
| 0:00:00.102 | 0:00:00.102 | 1898080 | add-perf | |
| N: 25'000 | |
| Time | Time (Per) | Memory | Code | |
| 0:00:00.049 | 0:00:00.049 | 3954936 | load-mold-perf | |
| 0:00:00.065 | 0:00:00.065 | 284 | find-perf-x | |
| 0:00:00.068 | 0:00:00.068 | 284 | find-perf-xx | |
| 0:00:00.072 | 0:00:00.072 | 284 | find-perf-xxx | |
| 0:00:00.092 | 0:00:00.092 | 1172444 | find-perf | |
| 0:00:00.132 | 0:00:00.132 | 250620 | get-perf | |
| 0:00:00.132 | 0:00:00.132 | 1171828 | has-perf | |
| 0:00:00.135 | 0:00:00.135 | 1171764 | put-perf | |
| 0:00:00.229 | 0:00:00.229 | 4548640 | add-perf | |
| N: 100'000 | |
| Time | Time (Per) | Memory | Code | |
| 0:00:00.218 | 0:00:00.218 | 15817704 | load-mold-perf | |
| 0:00:00.263 | 0:00:00.263 | 284 | find-perf-x | |
| 0:00:00.269 | 0:00:00.269 | 284 | find-perf-xx | |
| 0:00:00.269 | 0:00:00.269 | 284 | find-perf-xxx | |
| 0:00:00.375 | 0:00:00.375 | 4516004 | find-perf | |
| 0:00:00.44 | 0:00:00.44 | 4515388 | has-perf | |
| 0:00:00.528 | 0:00:00.528 | 986460 | get-perf | |
| 0:00:00.581 | 0:00:00.581 | 2536820 | put-perf | |
| 0:00:00.922 | 0:00:00.922 | 15978140 | add-perf |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment