Tries as hash-maps are common, but hash-maps take up a lot of memory (relatively speaking).
For example, creating a hash-map trie of the 1-, 2-, and 3-grams of a short story by Edgar Allan Poe results in a hash-map that consumes over 2 megabytes of memory. [[file:examples/markov_language_model.clj][See this Markov language model example]].
If you're dealing with much larger corpora, the memory footprint could become an issue.
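
To make the shape of such a hash-map trie concrete, here is a minimal sketch that uses only plain Clojure maps. It is not this library's API: the helper names =n-grams= and =add-n-gram=, the =:children= / =:count= node layout, and the sample sentence are all assumptions made for illustration.

#+begin_src clojure
(require '[clojure.string :as str])

;; Hypothetical helper: every n-gram of size 1..max-n in a token sequence.
(defn n-grams [max-n tokens]
  (mapcat #(partition % 1 tokens) (range 1 (inc max-n))))

;; Assumed node layout: {:count n, :children {token node}}.
;; Walks (creating as needed) one nested map per token, then bumps the count.
(defn add-n-gram [trie n-gram]
  (update-in trie
             (concat (interleave (repeat :children) n-gram) [:count])
             (fnil inc 0)))

(def tokens (str/split "the cat sat on the mat" #" "))

(def trie (reduce add-n-gram {} (n-grams 3 tokens)))

(get-in trie [:children "the" :count])                 ;; => 2
(get-in trie [:children "the" :children "cat" :count]) ;; => 1
#+end_src

Every node here is its own map with string keys, which is where the memory overhead comes from as the corpus grows.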
A tightly packed trie, on the other hand, is tiny. A tightly packed trie on the same corpus is only 37 kilobytes. That's ~4% of the original trie's size, even after the original trie's keys/values have all been condensed to numbers!
The REPL representation of a trie shows only the children's keys and values. The "root" node (not necessarily the "true" root node if you've traversed down with `lookup`) doesn't print its own data to the REPL.
So if you're looking at a node with no children, you'll see `{}` in the REPL. But you can get the value of that node with `(get node [])`.
A hash-map trie isn't very memory-efficient. All of the strings, nested maps, pointers... it all adds up to a lot of wasted memory.
A tightly packed trie provides the same functionality at an impressively small fraction of the memory footprint.
One restriction though: all keys and values must be integers. To convert them from integer identifiers back into the values that your biological self can process, you'll need to keep some type of database or in-memory map of ids to human-parseable things.
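
One way to keep that mapping is a pair of in-memory maps built before packing. The sketch below uses only =clojure.core= and =clojure.set=. The names =build-vocab=, =token->id=, and =id->token= are hypothetical and not part of this library.

#+begin_src clojure
(require '[clojure.set :as set])

;; Hypothetical helper: give each distinct token an integer id,
;; numbered in order of first appearance.
(defn build-vocab [tokens]
  (zipmap (distinct tokens) (range)))

(def tokens ["the" "cat" "sat" "on" "the" "mat"])

(def token->id (build-vocab tokens))
(def id->token (set/map-invert token->id))

;; Encode a key as integers before it goes into the trie...
(def encoded (mapv token->id ["the" "cat"])) ;; => [0 1]

;; ...and decode integers back into something readable afterwards.
(mapv id->token encoded)                     ;; => ["the" "cat"]
#+end_src

For a large vocabulary the same two maps could live in a database instead of in memory. The trie itself only ever sees the integers.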
Here's an example similar to the one above, but with values that we can tightly pack.
Thanks to Ulrich Germann, Eric Joanis, and Samuel Larkin of the National Research Council Canada for the paper [[https://www.aclweb.org/anthology/W09-1505.pdf][Tightly Packed Tries: How to Fit Large Models into Memory, and Make them Load Fast, Too]].
TODO: The below is closer to a CSCI lesson than library documentation. If it's necessary, figure out where to put it, how to word it, etc... It might not be worth cluttering documentation with so much detail.
** Autocomplete
A user types in the characters "D" "O" and you want to show all possible autocompletions.
*** Typical "List" data structure
- Iterate through each word of the (sorted) list, starting from the beginning.
- When you get to the first word that starts with the letters "D" "O", start keeping track of words.
- When you get to the next word that doesn't start with "D" "O", you have all the words you want to use for autocomplete.
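
The steps above can be sketched in a few lines of Clojure. This assumes the word list is already sorted, so that all words sharing a prefix sit next to each other. The function name =autocomplete= and the sample words are made up for illustration.

#+begin_src clojure
(require '[clojure.string :as str])

;; Linear scan over a sorted list: skip words until the first match,
;; then keep words until the first one that no longer matches.
(defn autocomplete [sorted-words prefix]
  (->> sorted-words
       (drop-while #(not (str/starts-with? % prefix)))
       (take-while #(str/starts-with? % prefix))))

(def words (sort ["DOT" "CAT" "EGG" "DOG" "DAY" "DOVE"]))

(autocomplete words "DO") ;; => ("DOG" "DOT" "DOVE")
#+end_src

In the worst case the scan has to walk every word that sorts before the prefix, so the lookup cost grows with the size of the list.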