True Story Follows
In my own personal goal to become a better programmer, I’m frequently searching for the newest and hottest techmology to be privy to, or perhaps some earth-shatteringly erudite YouTube video that would allow me to passively become a stronger developer. More and more, I’ve found that this largely does not work, at least for the purposes of becoming a better craftsman. As should have been clear all along, the most fruitful sources of knowledge come in the form of material that takes real effort to digest. One such piece was revisiting the work of one of my most hardcore college professors: Purely Functional Data Structures by Dr. Chris Okasaki.
To draw a parallel between mental prowess and weight lifting, Dr. Okasaki could easily bench press 635 lbs if the weight room were a metaphor for programming. He would literally rep out 405 metaphorical lbs routinely in class just to demonstrate a mind blowing concept.
My initial goal was to take the Standard ML code for these data structures and translate it into Python so that the code might become easier to understand. Typically these data structures achieve some operation in constant or logarithmic time, but in Python the program becomes overwhelmingly slow because the constant factors involved are enormous. Even so, I have some ideas for putting these data structures to use in a production environment for a substantial benefit, but that proof will have to come in a follow-on blog post. For now, perhaps some thought provoking concepts can be entertained.
The General Idea Behind Purely Functional Data Structures
In order to justify even using a purely functional data structure in the first place in an imperative language (such as Python), especially when the common data structures are built-in data types, we need to capitalize on the benefits a purely functional structure introduces. Most notably, every purely functional data structure is completely immutable, so any sort of update returns a new object entirely, and the original data structure still exists until its reference count drops to 0. This means that caching becomes trivially easy, and metadata about a structure, such as its length, need never be calculated more than once.
The “Hello World” of Purely Functional Data Structures
If we have a linked list of nodes and I want to add a node at the head of the list, I would simply return a new node whose next value is the current head. If I wanted to insert a node into the middle of the list, I would return a copy of the 1st node that instead points to a copy of the 2nd node, which in turn points to a copy of the 3rd node, and so on until we reach the middle of the list where the update occurs; at that point we point to a new node that points to the next element in the original list. Everything after the insertion point is shared with the original list. This doesn’t seem practical at all until we start arranging nodes so that such operations against the data structure occur in logarithmic time or better.
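As a minimal sketch of that path-copying idea (the names here are my own, not from the book):

```python
class Node:
    """An immutable singly linked list node."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def prepend(head, value):
    # O(1): the new node simply points at the existing head
    return Node(value, head)

def insert_at(head, index, value):
    # O(index): copy every node on the path, then share the rest
    if index == 0:
        return Node(value, head)
    return Node(head.value, insert_at(head.next, index - 1, value))

def to_list(head):
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out
```

Note that `insert_at` leaves the original list untouched: both versions coexist and share their tails.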
Red Black Trees
The first data structure I tried out was a red black tree, which allows insertion, deletion, and retrieval in logarithmic time. I actually wrote a separate blog post about that topic if you want to see the code.
Random Access Lists
My explanations won’t do it justice, so be sure to actually read Purely Functional Data Structures if you want more than the glossed over details. To explain my own understanding in depth would really just be repeating Dr. Okasaki’s text in a less accurate and less informative manner. But if I were in a situation kind of like that one in Swordfish where Wolverine has a gun to his head, and I was asked to explain purely functional random access lists, the general idea is that you have a two-dimensional data structure where you can traverse to the right in logarithmic time and traverse downward in logarithmic time, thereby giving you an overall logarithmic run time on operations.
This can be done by keeping a list of complete binary trees, where only the first two trees are allowed to be of equal rank (i.e. two trees that each have exactly 1 node, or exactly 3 nodes, or exactly 7 nodes). In this case, by adding an additional node, those two trees can be joined into a single larger binary tree. If each insertion corresponds to an index in a list, then the structure takes a deterministic shape independent of the values at each node. To retrieve the node that corresponds to an index, scan the roots of the trees (a linear search of those roots, but logarithmic relative to the size of the entire list) until you’ve found the correct tree. Then descend that tree using the index, which is also logarithmic time.
Inserting an item is constant time because it only requires either adding a single node as a tree of size 1, or adding a single node that becomes the new root joining two trees of equal rank. Searching for an item is logarithmic time, as the explanation above tries to outline, and updating an item is also logarithmic time: after we create a new node to replace an existing node in the tree, we backtrack up to the corresponding root and back to the front of the top-level linked list, creating new node copies as we go.
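A compressed sketch of that shape, keeping the list of trees as `(size, tree)` pairs. This follows the skew binary random access list from the book, but the names and representation are my own:

```python
class Tree:
    """An immutable complete binary tree node."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def cons(ralist, value):
    # ralist is a list of (size, tree) pairs, sizes of the form 2^k - 1.
    # If the first two trees have equal size, join them under a new root.
    if len(ralist) >= 2 and ralist[0][0] == ralist[1][0]:
        (s, t1), (_, t2) = ralist[0], ralist[1]
        return [(2 * s + 1, Tree(value, t1, t2))] + ralist[2:]
    return [(1, Tree(value))] + ralist

def lookup(ralist, i):
    # Scan the roots to find the right tree, then descend by index
    for size, tree in ralist:
        if i < size:
            return tree_lookup(size, tree, i)
        i -= size
    raise IndexError(i)

def tree_lookup(size, tree, i):
    # Index 0 is the root; the left subtree holds the next `half`
    # indices, the right subtree holds the rest
    if i == 0:
        return tree.value
    half = size // 2
    if i <= half:
        return tree_lookup(half, tree.left, i - 1)
    return tree_lookup(half, tree.right, i - 1 - half)
```

Because the tree shapes depend only on the count of elements, `cons` is constant time and `lookup` is logarithmic.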
Skew Binomial Queues
The white paper I read from to understand these concepts can be found here: Optimal Purely Functional Priority Queues by Chris “The Running Man” Okasaki and Gerth Brodal.
Now we get to the first data structure that I would predict to be practical in my own projects, where I’m not actually using a functional language. At this point this is all theory, so I won’t go too deeply into why, but consider how often the average developer / hacker uses a priority queue: almost never. The built-in data types are fast enough not to need one when working in process, and if a dataset is large enough to warrant a database, we’re probably just going to use the database to query for results.
With that in mind, there are many cases where a particular redundant database query is in fact querying “the next batch of items I need to work with”. This starts to sound like – get this – a priority queue, which if used in a SQL query’s place would surface the benefits of purely functional data structures (immutability, caching, parallelism, etc).
A skew binomial queue allows for enqueueing in constant time, peeking in constant time, merging in constant time, and dequeueing in logarithmic time.
It took me a few days to actually figure the whole thing out, and in the process it took me back to the days of Dr. Okasaki’s homework problems where the breakthroughs happened through recursive leaps of faith and the solutions seemed trivially easy after the fact.
Now the idea, also far less informative than Dr. Okasaki’s writing because I’m distilling it into a single blog post, is that we start with a binomial heap, where a tree of rank N has a root with up to N children. This means that when two trees of equal rank are combined, it can be done in a single operation, because one tree can simply be linked under the other’s root node, and this idea applies recursively. With each combination, the highest priority root stays at the top.
Now, skew binomial heaps don’t need to be complete binomial heaps. Dr. Okasaki makes the distinction between simple links, type A skew links, and type B skew links, but all links can be generically described as creating a new skew binomial heap from an input list of up to 3 skew binomial heaps. Given the input heaps, the highest priority root becomes the new root. Sort the remaining heaps by rank, lowest to highest, and add them to the left side of the new root’s list of children. Of the child heaps just sorted, the rank of the new heap is the rightmost child’s rank + 1.
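That generic link can be sketched directly from the description above (a min-heap here; the `Heap` class is my own stand-in, not Okasaki’s types):

```python
class Heap:
    """An immutable heap-ordered tree with an explicit rank."""
    def __init__(self, value, rank=0, children=()):
        self.value = value
        self.rank = rank
        self.children = tuple(children)

def link(*heaps):
    # Generic link of up to 3 heaps: the highest priority (smallest)
    # root wins; the rest, sorted ascending by rank, are prepended to
    # the winner's children
    best = min(heaps, key=lambda h: h.value)
    rest = sorted((h for h in heaps if h is not best), key=lambda h: h.rank)
    new_rank = rest[-1].rank + 1  # rightmost sorted child's rank + 1
    return Heap(best.value, new_rank, tuple(rest) + best.children)
```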
Now apply the ideas from random access lists to create a skew binomial queue. If we have a list of those skew binomial heaps, we can create a rule where the first two heaps in the list are allowed to be equal rank, but the others are not and are sorted in ascending order by rank (like a binary number where the next index corresponds to a higher power of 2). Then, adding an item to the first heap in the list might cause the first two heaps to be merged. Since only the first two heaps are allowed to be equal rank, this operation is guaranteed to not cascade. Note that the rank of the tree, not the values present in the tree, determine the order in the list.
To meld two skew binomial queues, it’s similar in nature to adding 2 binary numbers, where you might end up having to carry a “1” log N number of times. Each index in the base 2 numbering system corresponds to a heap’s rank, and we still allow for the first two heaps to be of equal rank.
To dequeue from the skew binomial queue, scan the roots for the highest priority value. Pop the root node from that heap to return the value to the caller. Now we need to reconstruct the data structure to put it back into a valid state. The heap you popped from is removed from the list of heaps. The children of the popped node can be cast as a skew binomial queue, since the list of children is itself a valid skew binomial queue. Finally, meld that queue of children back into the original queue.
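Putting the dequeue steps together with insertion and melding, here’s a compressed sketch. To keep it short I use plain two-way links — i.e. an ordinary persistent binomial queue rather than a skew one, so insertion here is only amortized constant time — and all the names are my own:

```python
from collections import namedtuple

# An immutable heap-ordered tree; children are stored highest rank first
Tree = namedtuple("Tree", "rank value children")

def link(t1, t2):
    # Combine two trees of equal rank: the smaller root wins
    small, big = (t1, t2) if t1.value <= t2.value else (t2, t1)
    return Tree(small.rank + 1, small.value, (big,) + small.children)

def ins_tree(trees, t):
    # Insert into a tuple of trees sorted ascending by rank,
    # linking (carrying) whenever two ranks collide
    if not trees or t.rank < trees[0].rank:
        return (t,) + trees
    return ins_tree(trees[1:], link(t, trees[0]))

def insert(queue, value):
    return ins_tree(queue, Tree(0, value, ()))

def meld(q1, q2):
    # Just like adding two binary numbers, carrying a link on collisions
    if not q1: return q2
    if not q2: return q1
    if q1[0].rank < q2[0].rank:
        return (q1[0],) + meld(q1[1:], q2)
    if q2[0].rank < q1[0].rank:
        return (q2[0],) + meld(q1, q2[1:])
    return ins_tree(meld(q1[1:], q2[1:]), link(q1[0], q2[0]))

def find_min(queue):
    # Scan the roots: there are O(log N) of them
    return min(t.value for t in queue)

def delete_min(queue):
    # 1. find the winning root, 2. remove its tree, 3. the popped tree's
    # children (reversed) already form a valid queue, 4. meld it back in
    best = min(queue, key=lambda t: t.value)
    rest = tuple(t for t in queue if t is not best)
    return best.value, meld(rest, tuple(reversed(best.children)))
```

Dequeueing everything yields the values in priority order, and because every operation returns a new tuple, older versions of the queue remain valid.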
At this point, you can probably stop and you have a reasonable data structure: insertion is O(1), melding is O(log N), peeking is O(log N), and dequeueing is O(log N). To make peeking take O(1), you could simply keep track of the highest priority root as you manipulate the queue and store it alongside the structure. This takes advantage of the data structure’s immutability, but as I found out while trying to follow Dr. Okasaki’s instructions, this was the wrong way to go about making that operation take constant time.
The reason this is not “correct” is because the O(1) peek concept is a building block to making meld take O(1) time as well. Instead, you should create a new data structure entirely, that when thought of as a whole, creates another heap, but you actually don’t need to even think about the structure entirely if you just take a recursive leap of faith.
Simply put, create a single node that acts as an entry point to a priority queue. This node has a single value and a priority queue (two distinctly different data types!). When you enqueue a value, if this value is higher priority than the current highest priority value, store that value in the node, and push the current highest priority value into the priority queue. If it is a lower priority, push the value into the queue and keep the currently highest priority value as is. In other words, keep the highest priority value at bay from ever entering the queue! Follow this train of thought for dequeue and meld.
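A sketch of that global-root idea, using Python’s `heapq` as a stand-in for the inner priority queue (in the real structure the inner queue would itself be a skew binomial queue; the list copy below also sacrifices the real bounds, so this only illustrates the shape):

```python
import heapq

class RootedQueue:
    """Keeps the overall minimum outside the inner queue for O(1) peek."""
    def __init__(self, top=None, rest=None):
        self.top = top              # current minimum, or None if empty
        self.rest = rest or []      # inner priority queue (here: a heapq list)

    def enqueue(self, value):
        if self.top is None:
            return RootedQueue(value, list(self.rest))
        rest = list(self.rest)
        if value < self.top:
            # The newcomer dethrones the root; the old root enters the queue
            heapq.heappush(rest, self.top)
            return RootedQueue(value, rest)
        heapq.heappush(rest, value)
        return RootedQueue(self.top, rest)

    def peek(self):
        # O(1): the minimum never enters the inner queue
        return self.top
```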
Finally, we come to bootstrapped skew binomial queues, which allows us to turn meld into an O(1) operation. Again, read Dr. Okasaki’s work for the best explanation, but I ended up having to read both Dr. Okasaki’s book and his white paper to finally “get it”. When we create a bootstrapped binomial queue, we don’t modify the existing skew binomial queue at all. Like the above concept of a global root, we create a layer on top of what we already have and trust that it works. This will also require a recursive leap of faith.
Create a bootstrapped skew binomial node that contains the highest priority value and a primitive priority queue whose elements are themselves bootstrapped skew binomial nodes. Boom, I’m done, there it is. I can’t fully articulate the data structure, but it works. Melding two of these takes O(1) time.
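For what it’s worth, the recursive shape can at least be sketched. Again `heapq` stands in for the primitive skew binomial queue, so the list copy below forfeits the real O(1) bound — the point is the structure, not the constants:

```python
import heapq

class BHeap:
    """A root value plus an inner queue of BHeap nodes (note the recursion)."""
    def __init__(self, value, inner=None):
        self.value = value
        self.inner = inner if inner is not None else []

    def __lt__(self, other):
        # Nodes compare by root value, so heapq orders whole sub-heaps
        return self.value < other.value

def meld(a, b):
    # The larger root, carrying its entire inner queue with it, is pushed
    # into the smaller root's inner queue in a single insert
    if a is None: return b
    if b is None: return a
    small, big = (a, b) if a.value <= b.value else (b, a)
    inner = list(small.inner)
    heapq.heappush(inner, big)
    return BHeap(small.value, inner)

def insert(heap, value):
    return meld(heap, BHeap(value))

def find_min(heap):
    # O(1): the minimum is always the outer root
    return heap.value
```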