(the assertions indicate it's a scheme interpreter)
Trying to reproduce/get to the bottom of this error: NixOS/nix#4119
this is a restriction I ran into while trying to create a sandbox scheme file that trips the "pattern serialization length" error
there seems to be a limit on the number of atoms a single top-level subexpression can contain, and this limit seems to be 38834
going just past the limit (38835 to 38840 atoms, inclusive) hits an assertion: Assertion failed: (sc->gc_protect_lst == initial_gc_protect_lst), function opexe_0, file scheme.c, line 2933.
past this point you just get profile compilation failed
whitespace, indentation, comments seem irrelevant
there does not seem to be a limit on top-level atoms, only top-level subexpression atoms
the length of each individual atom seems irrelevant
- though, as an aside, strings seem to be limited to 1023 bytes (unicode is accepted); past this you get:
sbpl1:35:4: Error reading string (list 'deprecated (lambda args (if (= 0 (length args)) (disable-full-symbolication) (error "unexpected argument"))))
the kind of atom seems irrelevant, but different kinds of atoms seem to count towards this limit differently:
- bare words (i.e. `test`) and parens (i.e. `()`) count as 1
- numbers and strings count as two (perhaps because these desugar to pairs or `()`s?)
  - i.e. `(0 0 0 ...)` will error after 19417 zeros and `("a" "a" "a" ...)` will error after 19417 `"a"`s
    - (since `(19417 * 2 + 1)` is when you first exceed `38834`)
- i.e. `()` counts as one towards the limit, `(())` counts as two (as does `(test)`)
  - `("a")` counts as three, `("a" "a")` counts as five
  - this is why the above has an extra `+ 1`; the top-level `(...)` counts as 1
NOTE this restriction only seems to exist if you use (version 1); (version 2) and (version 3) happily accept more atoms in subexpressions
- not sure what macOS version started accepting version 2/3 though
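
a minimal Python sketch of the counting rules above, for predicting whether a top-level expression will fit before handing it to `sandbox-exec`. `atom_cost`, `probe`, and `LIMIT` are names made up for this note. The tokenizer is an assumption: it ignores `;` comments and only recognizes plain decimal numbers. The costs are just the observed ones, not anything taken from the interpreter's source:

```python
import re

# Tokens: "(", ")", a double-quoted string (with backslash escapes),
# or a run of any other non-space characters (bare words, numbers).
TOKEN = re.compile(r'\(|\)|"(?:\\.|[^"\\])*"|[^\s()"]+')
NUMBER = re.compile(r'[+-]?\d+(\.\d+)?$')

# Observed limit for (version 1) profiles; an empirical value, not a documented one.
LIMIT = 38834


def atom_cost(expr: str) -> int:
    """Cost of one top-level expression under the observed counting rules."""
    cost = 0
    for tok in TOKEN.findall(expr):
        if tok == ")":
            continue      # a paren pair is counted once, at its "("
        if tok == "(":
            cost += 1     # every list, including the top-level one, costs 1
        elif tok.startswith('"') or NUMBER.match(tok):
            cost += 2     # strings and numbers count double
        else:
            cost += 1     # bare words
    return cost


def probe(atom: str, n: int) -> str:
    """Build a top-level list holding n copies of atom, e.g. (0 0 0)."""
    return "(" + " ".join([atom] * n) + ")"
```

under these rules `probe("0", 19416)` costs 38833 and still fits, while `probe("0", 19417)` costs 38835 and is the first all-zeros list over `LIMIT`.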
using the repo linked here to reproduce this error
nix build .#devShell.aarch64-darwin --option sandbox true -vvvv
happens regardless of version (1, 2, and 3)
the length that the error refers to seems to be the length of some processed artifact:
- for example, there's definitely deduplicating going on; copying the same `subpath` multiple times does not change the length in the error message
- as others have noted, this processing/serialization step seems to operate on the full list; splitting up the paths into multiple top-level `allow`s or duplicating paths in separate top-level `allow`s has no effect on the length
- the post processing is more sophisticated than just deduping; it also seems to eliminate implied subpaths:
  - i.e. `(subpath "/tmp")` and `(subpath "/tmp") (subpath "/tmp/blue")` have the same length
  - same with literals: `(subpath "/tmp")` and `(subpath "/tmp") (literal "/tmp")` and `(literal "/tmp") (subpath "/tmp")` and `(subpath "/tmp") (literal "/tmp/foo")` all have the same length
- looking at how the length reacts to paths like `(literal "/a")` and `(literal "/u")` being added in the presence of other paths, it definitely seems like there's some kind of regex minimization going on here
- it also seems to eliminate `literal`s and `subpath`s against `regex`s that imply them; not sure how sophisticated this analysis is but I have not been able to flummox it yet
- naive "optimizations" like turning `(subpath "/nix/store/foo")` and `(subpath "/nix/store/bar")` into `(regex #"^/nix/store/(foo|bar)")` yield no change in length
- I think this is an indication that `sandbox-simplify` disappeared because it was subsumed into `sandbox-exec`...
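
a toy Python model of the dedup and implied-path observations above, to make explicit which inputs are expected to collapse to the same thing. This is only a guess that reproduces the observed "same length" cases; it is not how `sandbox-exec` actually does it, it does not model `regex` filters or the regex minimization at all, and `implies` and `normalize` are hypothetical names:

```python
def implies(a, b):
    """True if filter a already matches every path that filter b matches.

    Filters are (kind, path) tuples with kind "subpath" or "literal".
    """
    kind_a, path_a = a
    kind_b, path_b = b
    if kind_a == "literal":
        # a literal only covers the identical literal
        return kind_b == "literal" and path_a == path_b
    # a subpath covers the path itself and everything beneath it
    prefix = path_a.rstrip("/") + "/"
    return path_b == path_a or path_b.startswith(prefix)


def normalize(filters):
    """Drop duplicates, then drop any filter implied by another one."""
    unique = list(dict.fromkeys(filters))  # order-preserving dedup
    return {
        f for f in unique
        if not any(g != f and implies(g, f) for g in unique)
    }
```

in this model `normalize([("subpath", "/tmp"), ("literal", "/tmp/foo")])` and `normalize([("subpath", "/tmp")])` come out equal, matching the pairs that reported the same length.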
Also, some regexes do indeed produce this error
nesting sandbox-exec calls?
- even if this is permitted, unless we switch to allowing all of `/nix/store` and `deny`ing files (i.e. inverting the filter) in these cases, I don't think this is possible
  - and, I don't think ^ is a good idea
reverse-engineering the bytecode format that the kernel accepts for these filter programs
- it seems like what's going on is that `sandbox-exec` takes this scheme program that it runs to get a filter program that contains regular expressions and such
- it then compiles down this file of regexes, etc. into bytecode that the kernel executes
- the "serialization length" limit seems to be a restriction on the length of this bytecode
  - it's not clear but I assume this is a limit imposed by the kernel and not just `sandbox-exec`
- if we think we can do better in terms of cramming things into binary size (or adding escape hatches to permit a kind of nested `sandbox` thing as described above) then we could do the bytecode generation ourselves
actually, this paper describes some findings on the mechanics of the sandbox bytecode: https://www.ise.io/wp-content/uploads/2017/07/apple-sandbox.pdf