You know HTML entities - like how &gt; is the greater-than sign, and how &copy; is ©? Most JS libraries to encode/decode them are very bloated. I'm making a more lightweight one.
and now that the core work is done, i worked a little on the documentation: updated the package.json to include some relevant fields and added example usage to the readme. the example for the tryReads uses a TransformStream which i think is neat.
when you make something in a day, the last parts usually end up a bit unfinished. this happened with tinyentities, where stream decoding took 2.5x the time of entities. but note the past tense: i fixed that by not using regex (as much) and now it's neck and neck!
i've added something that would be useful if you need to stream in entities - it basically lets you keep trying to read something that looks like an entity until you figure out whether it is (and you can emit it as its encoding) or it isn't (and you can emit it as text)
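a minimal sketch of that "keep trying" idea, assuming chunks arrive as strings. everything here (`StreamDecoderSketch`, `KNOWN`, `MAX_PENDING`, the three-entity table) is hypothetical and only for illustration, not the tinyentities API:

```typescript
// hypothetical tiny table; a real decoder would use the full entity map.
const KNOWN = new Map<string, string>([
  ["amp", "&"],
  ["lt", "<"],
  ["gt", ">"],
]);

// decode only complete, semicolon-terminated named entities.
function decodeComplete(text: string): string {
  return text.replace(/&([a-zA-Z0-9]+);/g, (m, name: string) => KNOWN.get(name) ?? m);
}

// assumed upper bound on how long an undecided "&..." tail may get.
const MAX_PENDING = 32;

class StreamDecoderSketch {
  private pending = "";

  // returns the text that is safe to emit after seeing this chunk.
  push(chunk: string): string {
    const buffer = this.pending + chunk;
    const amp = buffer.lastIndexOf("&");
    const tail = amp === -1 ? "" : buffer.slice(amp);
    // the tail is undecided while it still looks like the start of an entity.
    const undecided = tail.length <= MAX_PENDING && /^&[a-zA-Z0-9]*$/.test(tail);
    this.pending = undecided ? tail : "";
    return decodeComplete(undecided ? buffer.slice(0, amp) : buffer);
  }

  // at end of input, whatever is still pending was never an entity: emit as text.
  flush(): string {
    const rest = this.pending;
    this.pending = "";
    return rest;
  }
}

const decoder = new StreamDecoderSketch();
const pieces = ["a &l", "t; b &", " c"].map((chunk) => decoder.push(chunk));
pieces.push(decoder.flush());
console.log(pieces.join("")); // a < b & c
```

`push` and `flush` line up with the `transform` and `flush` callbacks of a TransformStream, which is one way a holder like this could be wired into a stream.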
i updated the benchmarks to separate init and runtime and from there i just got optimizing. i was able to speed up data unpacking, switch how the map used for encoding purposes is stored for speed, convert xml encoding into one call for speed, switch to regexes more optimized for my purposes (more importantly faster), and fix multiple bugs multiple times... streaming parser next i guess
i set up some benchmarks (well it was a group effort with ai). now we can group functions into:
so room for growth
and now decoding. decoding can be more complicated once i add streaming, but i've gotten a REALLY simple implementation that works going. (also optimized/restructured map.ts a little to add support for decoding and better tree shaking)
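for a sense of how small a non-streaming decoder can be, here's a sketch under stated assumptions: `NAMED` is a hypothetical five-entry stand-in for the real map, `decodeSketch` is a made-up name, and only semicolon-terminated entities are handled. it is not the code in tinyentities:

```typescript
// hypothetical stand-in for the real entity map.
const NAMED = new Map<string, string>([
  ["amp", "&"],
  ["lt", "<"],
  ["gt", ">"],
  ["quot", '"'],
  ["copy", "\u00A9"],
]);

function decodeSketch(input: string): string {
  // one pass: hex references, decimal references, or named entities.
  const pattern = /&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z][a-zA-Z0-9]*);/g;
  return input.replace(pattern, (match, body: string) => {
    if (body[0] === "#") {
      const isHex = body[1] === "x" || body[1] === "X";
      const codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // leave out-of-range references untouched instead of throwing.
      return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
    }
    // unknown names are passed through as plain text.
    return NAMED.get(body) ?? match;
  });
}

console.log(decodeSketch("&lt;p&gt; &copy; &#65;&#x42; &nope;")); // <p> © AB &nope;
```

a spec-compliant HTML decoder does more than this (for example, replacing null and some other code points), so treat it as the shape of the idea only.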
i set up the encoding functions. details:
- i read through code in entities and dom-serializer to figure out the services i need to provide at the end of the day
- i implemented the lighter escapeHTML, escapeHTMLAttribute, escapeXML, and escapeXMLAttribute, which escape just enough to not have problems
- then i implemented the more complex encodeHTML and encodeXML, which encode almost everything, with the former even encoding punctuation and multi character entities when possible.
- i also signalled to bundlers that the mapping can be dropped if unused by wrapping the process of loading it in a pure IIFE (the (() => { code })() things)
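the list above can be sketched roughly like this. all names (`escapeTextSketch`, `escapeAttributeSketch`, `lazyTable`, `encodeAllSketch`) and the exact sets of escaped characters are assumptions for illustration, not the library's real functions:

```typescript
// "escape just enough": text content only needs &, < and > handled (assumed set).
const TEXT_ESCAPES = new Map([
  ["&", "&amp;"],
  ["<", "&lt;"],
  [">", "&gt;"],
]);

function escapeTextSketch(input: string): string {
  return input.replace(/[&<>]/g, (ch) => TEXT_ESCAPES.get(ch) ?? ch);
}

// inside a double-quoted attribute, & and " are the risky characters (assumed set).
const ATTR_ESCAPES = new Map([
  ["&", "&amp;"],
  ['"', "&quot;"],
]);

function escapeAttributeSketch(input: string): string {
  return input.replace(/[&"]/g, (ch) => ATTR_ESCAPES.get(ch) ?? ch);
}

// the pure IIFE: the annotation tells bundlers the call has no side effects,
// so the whole table can be dropped when nothing that uses it is imported.
const lazyTable = /* @__PURE__ */ (() => {
  const table = new Map<number, string>();
  const entries = [["amp", "&"], ["lt", "<"], ["gt", ">"], ["copy", "\u00A9"]] as const;
  for (const [name, ch] of entries) {
    table.set(ch.codePointAt(0)!, name);
  }
  return table;
})();

// heavier encoder: replaces every character that has a name in the table.
function encodeAllSketch(input: string): string {
  let out = "";
  for (const ch of input) {
    const name = lazyTable.get(ch.codePointAt(0)!);
    out += name === undefined ? ch : `&${name};`;
  }
  return out;
}

console.log(escapeTextSketch("a < b & c")); // a &lt; b &amp; c
console.log(encodeAllSketch("\u00A9 <b>")); // &copy; &lt;b&gt;
```

the light escapes never touch the table, so an app that imports only those would let a bundler discard `lazyTable` entirely.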
i offset the bundle size increase a little: any entity that exists without a semicolon also exists with one, so i can include only the semicolonless version and let it imply the semicolon version. 7742 bytes -> 7520 bytes
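the implication rule might look like this at lookup time. this is a sketch with a hypothetical storage convention (a key ending in ";" means the semicolon is required, a key without one means both forms are valid); the real packed format is different:

```typescript
// hypothetical table: semicolonless keys imply the semicolon form too.
const STORED = new Map<string, string>([
  ["amp", "&"], // covers both "amp" and "amp;"
  ["copy", "\u00A9"], // covers both "copy" and "copy;"
  ["rarr;", "\u2192"], // only valid with the semicolon
]);

function lookupSketch(name: string): string | undefined {
  // exact hit: either a semicolonless name or a semicolon-only name.
  const direct = STORED.get(name);
  if (direct !== undefined) return direct;
  // "amp;" is not stored, but "amp" is, and that implies "amp;".
  if (name.endsWith(";")) return STORED.get(name.slice(0, -1));
  return undefined;
}

console.log(lookupSketch("amp")); // &
console.log(lookupSketch("amp;")); // &
console.log(lookupSketch("rarr")); // undefined
```

each dual-form entity then costs one stored name instead of two, which is the kind of saving described above.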
i'm now actually generating the map. the map is a bit bigger now so it can include special cases like &amp (note how there's no ;, so my script packs that as !).
i got the initial version of the mapper going. the idea is to convert the 145.8kb entities.json into a highly optimized map by making some optimizations (for example, we assume there's always an increment of one between codepoints) and restructuring (like separating each first character level with a newline and each second character level with the > character). it's only 7.1kB gzipped (half the size of a naive mapper) and should work for both encoding and decoding.
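to illustrate just the "assume an increment of one" optimization, here's a toy packer. the format (comma-separated names, with `=` plus a base-36 codepoint only when the codepoint is not previous + 1) and the names `packSketch` / `unpackSketch` are invented for this example; the real map uses its own layout with newline and > separators:

```typescript
type Entry = [name: string, codePoint: number];

// store a codepoint only when it breaks the "previous + 1" assumption.
function packSketch(entries: Entry[]): string {
  let prev = -1;
  const parts: string[] = [];
  for (const [name, codePoint] of entries) {
    parts.push(codePoint === prev + 1 ? name : `${name}=${codePoint.toString(36)}`);
    prev = codePoint;
  }
  return parts.join(",");
}

// reverse the process: a bare name means "previous codepoint plus one".
function unpackSketch(packed: string): Entry[] {
  if (packed === "") return [];
  let prev = -1;
  const entries: Entry[] = [];
  for (const part of packed.split(",")) {
    const eq = part.indexOf("=");
    const name = eq === -1 ? part : part.slice(0, eq);
    const codePoint = eq === -1 ? prev + 1 : parseInt(part.slice(eq + 1), 36);
    entries.push([name, codePoint]);
    prev = codePoint;
  }
  return entries;
}

const sample: Entry[] = [
  ["Agrave", 192],
  ["Aacute", 193],
  ["Acirc", 194],
  ["times", 215],
];
const packed = packSketch(sample);
console.log(packed); // Agrave=5c,Aacute,Acirc,times=5z
console.log(unpackSketch(packed).length); // 4
```

runs of consecutive codepoints collapse to bare names, so the explicit numbers only show up at the jumps.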
i set up the project. i'm not used to making libraries - i usually make web apps - but i'm using tsdown for compilation/transpilation, which i hope will make things easy.