Ascii85, also known as base85, is an encoding similar in concept to base64. Where base64 uses four ASCII characters to represent three source bytes (thereby inflating the data size by 33%), ascii85 uses five ASCII characters to represent four source bytes (thereby inflating the data size by 25%).
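To make the four-bytes-to-five-characters ratio concrete, here is a minimal sketch of encoding a single group. It assumes the conventional Ascii85 alphabet (`!` through `u`); the function name `encodeGroup` is illustrative only and is not part of Ascii85Codec.

```javascript
// Assumption: the conventional Ascii85 alphabet, "!" (0x21) through "u" (0x75).
const BASE_CODE = 0x21;

// Encode the four bytes at `offset` as five base-85 digits.
function encodeGroup(view, offset) {
  let value = view.getUint32(offset); // big-endian by default
  const chars = ["", "", "", "", ""]; // five slots, filled from the last digit backward
  for (let i = 4; i >= 0; --i) {
    chars[i] = String.fromCharCode(BASE_CODE + (value % 85));
    value = Math.floor(value / 85);
  }
  return chars.join("");
}

// The bytes of "Man " (0x4D 0x61 0x6E 0x20).
const view = new DataView(new Uint8Array([0x4d, 0x61, 0x6e, 0x20]).buffer);
console.log(encodeGroup(view, 0)); // "9jqo^"
```

Four input bytes become five output characters, which is where the 25% figure comes from.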
This script can be used to encode and decode a `DataView` as ascii85. Create an `Ascii85Codec` with the desired configuration, and then call its member functions as appropriate. In particular, this class should be appropriate for storing a `DataView` inside of Local Storage with the smallest possible size, provided you use the offered `STORAGE_CHARSET`.
Note that we do not perform any error-checking during the decode step. Ascii85Codec offers a validate member function that can be run on an encoded string to verify that it contains no illegal characters; this can be run prior to decoding in any situation where the input data is untrusted.
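As an illustration of validating untrusted input before decoding, the sketch below builds a checker from a set of legal characters. It is a stand-in written for this note, not the actual `validate` implementation: `makeValidator` is a hypothetical name, the alphabet is assumed to be the conventional `!` through `u`, and treating the empty string as valid is also an assumption.

```javascript
// Hypothetical helper: returns a function that reports whether a string
// consists only of characters found in `legalChars`.
function makeValidator(legalChars) {
  // Escape the characters that are special inside a regex character class.
  const escaped = legalChars.replace(/[\\\]^-]/g, "\\$&");
  const regex = new RegExp("^[" + escaped + "]+$"); // compiled once, reused per call
  // Assumption: an empty string counts as valid; it is handled outside the regex.
  return (encoded) => encoded === "" || regex.test(encoded);
}

// Assumed alphabet: "!" (0x21) through "u" (0x75), plus the "z" abbreviation.
let alphabet = "z";
for (let code = 0x21; code <= 0x75; ++code) {
  alphabet += String.fromCharCode(code);
}

const isValid = makeValidator(alphabet);
console.log(isValid("9jqo^")); // true
console.log(isValid("9jq ~")); // false: space and "~" are outside the alphabet
```

A check like this only confirms that every character is legal; it does not detect malformed group lengths or out-of-range group values.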
- The `STORAGE_CHARSET` is offered as a convenience, to aid with JavaScript that needs to store binary data via the Local Storage API with minimal overhead. Browsers store all such data as a JSON string, so the string representation of the data is what counts toward storage size limits. Chromium uses a null-terminated JSON string (and counts the terminating null), and within stored string values, Chromium escapes double-quotes and left angle brackets. `STORAGE_CHARSET` substitutes `"`, `<`, and `\` out in order to avoid the additional overhead of string escape sequences in the serialized JSON (as these would be represented in storage as `\"`, `\u003C`, and `\\`).
- In the encoding step, we pre-create `chars` as an array of five string values and then replace individual elements. This ensures that the array is packed and initially allocated with the desired size.
- Validation of an encoded string works by pre-compiling a regular expression to test the input. We assume here that native code will do the job faster than we would. To keep as much complexity out of the regex as possible, empty strings are treated as a special case and not handled by the regex.
- Decoding always allocates an `ArrayBuffer` with a length that is a multiple of 4, and we just return a truncated `DataView` into that buffer. This allows us to write the final chunk with a single `setUint32` call, instead of having to rebuild a padded DWORD and then decompose it into bytes by hand. Here I am (micro)optimizing for speed over space, wasting no more than three bytes of memory per operation.
- Decoding uses `String.prototype.indexOf` to count the number of occurrences of the two abbreviated token types (`z` for `0x00000000` and `y` for `0x20202020`). This requires us to crawl the input string twice, but should still be faster than manually looping over the characters just once, as we can take advantage of optimized native code (which I presume will use things like SIMD, possibly within standard library functions like `memchr`, to very rapidly scan the string for a single character).
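The storage overhead described in the `STORAGE_CHARSET` note can be estimated with a small model. The sketch below simply applies the escape costs stated above; it does not reproduce Chromium's serializer, and the `storageLike` alphabet is a hypothetical substitute set, since the real `STORAGE_CHARSET` may choose different replacement characters.

```javascript
// Escape costs taken from the note above: a model, not Chromium's own code.
const ESCAPE_COST = new Map([['"', 2], ["\\", 2], ["<", 6]]);

// Number of characters `text` would occupy once the escapes are applied.
function escapedLength(text) {
  let total = 0;
  for (let i = 0; i < text.length; ++i) {
    total += ESCAPE_COST.get(text[i]) ?? 1;
  }
  return total;
}

// Build an 85-character alphabet from consecutive printable ASCII codes,
// skipping any characters listed in `excluded`.
function buildCharset(excluded = "") {
  let charset = "";
  for (let code = 0x21; charset.length < 85; ++code) {
    const ch = String.fromCharCode(code);
    if (!excluded.includes(ch)) charset += ch;
  }
  return charset;
}

const standard = buildCharset();          // "!" through "u"
const storageLike = buildCharset('"<\\'); // hypothetical; the real STORAGE_CHARSET may differ

console.log(escapedLength(standard));    // 92
console.log(escapedLength(storageLike)); // 85
```

Under this model, each occurrence of `<` in stored data costs six characters instead of one, which is the overhead the substituted alphabet avoids.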
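The two decoding notes above can be sketched together as follows. This is an illustration under stated assumptions rather than the codec's actual source: it assumes the conventional alphabet (`!` through `u`, so `z` and `y` never appear as digits), assumes the input is already valid, and uses the hypothetical names `countChar` and `decodeSketch`.

```javascript
const BASE_CODE = 0x21; // assumed alphabet start: "!"

// Count occurrences of a single character using native indexOf scans.
function countChar(text, ch) {
  let count = 0;
  for (let pos = text.indexOf(ch); pos !== -1; pos = text.indexOf(ch, pos + 1)) {
    ++count;
  }
  return count;
}

function decodeSketch(encoded) {
  const short = countChar(encoded, "z") + countChar(encoded, "y");
  const full = encoded.length - short; // characters in five-character groups
  const tail = full % 5;               // length of a final partial group, if any
  const byteLength = (short + Math.floor(full / 5)) * 4 + (tail ? tail - 1 : 0);

  // One 4-byte slot per group, so the final group can be written whole.
  const buffer = new ArrayBuffer((short + Math.ceil(full / 5)) * 4);
  const view = new DataView(buffer);

  let out = 0;
  let i = 0;
  while (i < encoded.length) {
    const ch = encoded[i];
    if (ch === "z" || ch === "y") {
      view.setUint32(out, ch === "z" ? 0x00000000 : 0x20202020);
      i += 1;
    } else {
      let value = 0;
      for (let j = 0; j < 5; ++j) {
        // A short final group is padded with the highest digit (84).
        const digit = i + j < encoded.length ? encoded.charCodeAt(i + j) - BASE_CODE : 84;
        value = value * 85 + digit;
      }
      view.setUint32(out, value);
      i += 5;
    }
    out += 4;
  }
  // Truncated view: hides the padding bytes of a partial final group.
  return new DataView(buffer, 0, byteLength);
}

const result = decodeSketch("9jqo^z");
console.log(result.byteLength);                // 8
console.log(result.getUint32(0).toString(16)); // "4d616e20"
```

Because the buffer is sized in whole groups, the last write never needs special handling, and at most three bytes of the allocation go unused.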