Node-based data structures hold some advantages over array-based data structures, however basic node-based structures such as linked lists have one major drawback that arrays possess when both are in a sorted order; the ability to be searched in a worst case that is less than O(n) using binary search. This drawback of searching a linked list is where a Binary Search Tree can be a very powerful data structure.
Assumptions:
- Proficiency in <a href="https://www.devmaking.com/learn/data-structures/linked-list/" target="_blank" style="color:inherit;">Linked Lists</a>
- Comfortable with recursion
- Understanding of <a href="https://www.devmaking.com/learn/algorithms/big-o-notation/" target="_blank" style="color:inherit;">Big-O Notation</a>
- Knowledge of <a href="https://www.devmaking.com/learn/algorithms/binary-search/" target="_blank" style="color:inherit;">Binary Search</a> is a plus!
What is a Binary Search Tree (BST)?
A binary search tree is a node based tree-structure where each node has a left
and a right
pointer corresponding to the left and right "sub-trees". All values that exist in a given node's left subtree are comparably smaller than the node's value, and all values that exist in the right subtree are comparably larger than the node's value.
> Values can be any type of data so long as there is a deterministic way that they can be compared.
<div style="width:100%; margin:auto;text-align:center;"> <img src="https://www.devmaking.com/img/topics/datastructures/BST_01.png" alt="binary search tree pretty image" style="width:500px;max-width:95%;"> </div>
The Binary Tree Node
Each node in a BST is comprised of three elements: two pointer's to the left and right sub-trees, and a value that can be of any type. For simplicity, we'll be using integers. The node class might look something like this:
/* pseudo-code */
class Node {
// The comparable value.
int value;
// Reference to the left subtree.
Node left;
// Reference to the right subtree.
Node right;
}
In the instances that a node does not have either a right or a left child subtree, it is known as a leaf node.
Searching a BST
Given that "search" is in the data structures name, it makes sense that this type of tree specializes in finding elements within the structure. As mentioned above, the tree populated in such a way that every value in the right subtree is larger than the current node's value, and every node in the left subtree is smaller than the current node's value.
For example: imagine a BST with a root node value of 10. If 12 was inserted, the new node would go to the right subtree of 10. If 11 is then inserted, it goes to the right subtree of 10, and then compares itself to 12 to find that it goes in the left subtree of 12. This process continues as more and more elements are added into the tree. Finding a node in this tree requires the same steps, by checking the current node's value and moving to the right or left subtree.
Find
When we want to find a specific element in a BST and return the encapsulating node, we start at the root
or the tree and use the rules to search the left or the right subtree accordingly. This process is modeled in pseudo code using an iterative version:
/* pseudo-code */
// The top node of the tree.
Node root;
function find(int value){
Node currentNode = root;
// Iterate the tree
while ( currentNode is not null ){
// If the values are equal, we are done!
if( currentNode.value == value ){
return currentNode;
}
// (1).
else if ( currentNode.value > value ){
currentNode = currentNode.left;
}
// (2).
else {
currentNode = currentNode.right;
}
}
// (3).
return null;
}
Analyzing the code: 1. If the current node's value is greater than the value we are trying to find, then we know that the value can only exist in the left subtree, so we set our current node to the left item. 2. In this case, the opposite is true: we know that the current node's value is smaller than the value being searched for, so we explore the right subtree. 3. Lastly, if the iteration ends then we have reached an empty leaf node, and can confirm that the value being searched for does not exist.
> From this point onward, the remaining methods will be implemented using recursion where applicable!
Bonus: a recursive version of the find
method:
/* pseudo-code */
Node root;
function find( int value, Node currentNode ){
if(currentNode is null) return null;
if( currentNode.value == value ){
return currentNode;
} else if ( currentNode.value > value ){
return find(value, currentNode.left);
} else {
return find(value, currentNode.right);
}
}
Contains
Contains is a method that is very similar to find, except that instead of returning a node from the tree, a Boolean is returned.
/* pseudo-code */
Node root;
function contains(int value, Node currentNode){
// The node is false, so it does not contain the value.
if(currentNode is null) return false;
// The values match, so the tree contains it!
if( currentNode.value == value ){
return true;
} else if ( currentNode.value > value ){
// Iterate to the left subtree.
return contains(value, currentNode.left);
} else {
// Iterate to the right subtree.
return contains(value, currentNode.right);
}
}
Notice again that the method is nearly identical to the recursive version of the find
method. If you were feeling like a lazy programmer, you might even be able to get away with this:
/* pseudo-code */
Node root;
function contains(int value){
if (find(value, root) == null) return false;
else return true;
}
> While it's possible to get away with this, it is not recommended!
Inserting into a BST
If you have been able to follow up to this point, then you're in luck; inserting into a BST follows the same logic pattern as find
and contains
with only one additional modification!
Starting from the root
of the tree, the value that is being inserted is compared, moving to the left or right sub-tree's along the way. The additional step comes in when making the decision to jump into the subtree or not; if a subtree on the side that is being traversed to is null
, then instead of traversing to that node, a new node is created and placed as the child subtree on the corresponding side of the node.
/* pseudo-code */
Node root;
function add(value, Node currentNode){
if( currentNode.value > value ){
// (1)
if( currentNode.left != null ) add(value, currentNode.left);
else currentNode.left = new Node(value);
} else if( currentNode.value < value ) {
// (2)
if( currentNode.right != null) add(value, currentNode.right);
else currentNode.right = new Node(value);
}
// (3)
return;
}
Analysis:
If the current node's value is larger than the value, that means we are going to be exploring the left subtree. However, before jumping into that subtree, we check if the child node is null. If this is not the case, we explore that subtree. If it is the case, though, a new node is created as the left subtree of the current node.
In this case it is the same story, but in reference to the right subtree. If the right subtree is null, a new node is created and placed there.
Lastly, this implementation varies slightly from the
find
andcontains
methods by choosing to do nothing if the value is already existing in the BST. Depending on your implementation this may change though.
Deletion
When removing an element from a BST, getting to the value in question is the same as we've been handling so far, however once the value has been found, properly removing the node from the BST can get a little tricky, however it can still be done!
Step 1: finding the node.
To find the node to be removed follows a similar process as with the other methods. Starting at the root node, traverse down the sub-trees, comparing values to decide on the left or right subtree. If the node becomes null, we can back out of the operation as there is no node to be deleted. In the more likely case that the node to be removed is found, we move onto the next step. The catch with this operation, as it will be seen in the pseudo code, is that instead of simply making the recursive call to the next subtree, we are going to be actually reassigning the sub-tree's as they go. This looks like so:
currentNode.subtree = remove(value, currentNode.subtree);
This recursive operation then requires that the remove method returns a node at the end of each recursive iteration.
Step 2: deleting the node.
Depending on where the node is on the tree, the process involved in removing it varies. In this case, removing comes down to 4 cases:
- The current node has a left and a right subtree
- The current node has a left subtree
- The current node has a right subtree
- The current node has no sub-trees
Case 1: both sub-trees exist
This case is the most complex, so the good news is that it only gets easier from here. If the node to be deleted has non-null right and left sub-trees, then this case applies. This step takes some understanding of how binary trees operate to fully grasp. If there are nodes on both sides, then the node needs to be replaced with another node to keep the rules of the BST in tact. The best way to do this is by replacing the node to be deleted with the smallest node in the right subtree. The logic of this allows the rules of our BST to remain in tact while making as few changes to the overall tree as possible.
To explain this, remember the principles: if all the nodes in the right subtree are larger than the current node, and all the nodes in the left subtree are smaller than the current node, then the smallest node in the right subtree is still greater than the largest node in the left subtree.
<div style="width:100%; margin:auto;text-align:center;"> <img src="https://www.devmaking.com/img/topics/datastructures/BST_02.png" alt="deleting from a binary search tree" style="max-width:95%;"> </div>
If we swap this smallest, right-sided value with the value to be removed, we only need to remove the smallest value from the right subtree, which falls under case 3 or 4, and are very straightforward to implement.
Now that we understand the process, case 1 is summarized as:
- Get the smallest node from the right subtree.
- Replace the value from that node with the current node.
- Remove the smallest node from the right subtree.
Case 2: the current node has only a right subtree
If you've made it this far, you can let out a sigh of relief that the worst is now behind you! If the current node has only a right subtree, then that node can be removed and replaced with the right child. Since the operation to iterate to the next node happens like so
currentNode.right= remove(value, currentNode.right);
Then by setting the currentNode
to the right subtree in the next iteration, it is nearly equivalent to the following, which removes that intermediary node:
currentNode.right = currentNode.right.right;
> The node that currentNode.right points to in this case would be the node to be removed.
Case 3: the current node has only a left subtree
This case is the same as case 2, but mirrored over to the left subtree. Once again, it is nearly equivalent to removing the intermediary node in the iteration before:
currentNode.left = currentNode.left.left
Where currentNode.left is the node that is being removed.
Case 4: the current node has no sub-trees
If the current node has no sub-trees, then we only need to set the currentNode to null. When the recursive function is over, the subtree of the parent node will also be set to null, as its call was:
currentNode.subtree = remove(value, currentNode.subtree)
Putting it together
Now that all the cases have been discussed, the pseudo code implementation can come together!
/* pseudo-code */
Node root;
/* Gets the remove method started. */
function remove(int value) {
if(root == null) return;
else root = removeHelper(value, root);
}
/* Helper Method of the remove function. */
function removeHelper(int value, Node currentNode){
/* Part 1 */
if( currentNode is null ) return null;
// Traverse to the right subtree.
if( currentNode.value < value ) {
currentNode.right = removeHelper(value, currentNode.right);
}
// Traverse to the left subtree.
else if( currentNode.value > value ){
currentNode.left = removeHelper(value, currentNode.left);
}
// Matched the value, now delete!
/* Part 2 */
else {
/* Case 1 */
if( currentNode has both sub-trees ){
// See below for function.
Node leftMost = getLeftMost(currentNode.right);
// Swap the values.
currentNode.value = leftMost.value;
// Remove the smallest node from the right subtree.
currentNode.right = removeHelper(leftMost.value, currentNode.right);
}
/* Case 2 */
else if( currentNode has a left subtree ){
currentNode = currentNode.left;
}
/* Case 3 */
else if( currentNode has a right subtree ){
currentNode = currentNode.right;
}
/* Case 4 */
else currentNode = null;
}
/* Important! */
return currentNode;
}
// Helper function, gets smallest node in a subtree.
function getLeftMost(Node currentNode){
if(currentNode == null) return null;
else if (currentNode.left == null) return currentNode;
else return getLeftMost(currentNode.left);
}
Performance and Worst Cases
While on average a BST will perform well with roughly O(log(n)) time to find an element, there are worst case scenarios to look out for that can bring the worst case up to O(n) timing. The primary worst case scenario occurs when items are inserted into the BST in sorted order. Doing so would result in all of the sub-tree's being right-sub-trees, and the depth of the tree being also O(n) as opposed to the optimal case of a depth corresponding to O(log(n)).
> Notice the correlation between depth and amount of time it takes to search a BST?
<div style="width:100%; margin:auto;text-align:center;"> <img src="https://www.devmaking.com/img/topics/datastructures/BST_03.png" alt="best case performance: balanced tree." style="width:500px;max-width:95%;"> </div>
<div style="width:100%; margin:auto;text-align:center;"> <img src="https://www.devmaking.com/img/topics/datastructures/BST_04.png" alt="worst-case performance: linear tree" style="width:500px;max-width:95%;"> </div>
<div style="width:100%; margin:auto;text-align:center;"> <img src="https://www.devmaking.com/img/topics/datastructures/BST_05.png" alt="average case performance: mostly balanced tree." style="width:500px;max-width:95%;"> </div>
Resolving this possible worst case
There are BST-like data structures that provide methods of re-balancing themselves in order to always maintain as close to O(log(n)) search speeds as possible such as AVL Trees and red-black Trees.
When to use a BST
Primarily, a BST will be most efficient to implement when you expect to be searching and retrieving elements from a structure often and want an average case of O(log(n)) time to complete the operation as opposed to O(n) that linked lists offer.
Further Resources
<a href="https://www.amazon.com/Algorithms-4th-Robert-Sedgewick/dp/032157351X/ref=as_li_ss_tl?_encoding=UTF8&pd_rd_i=032157351X&pd_rd_r=37c135e7-c5c4-4ed1-94ab-c2aaec60d692&pd_rd_w=PEuac&pd_rd_wg=xSJkC&pf_rd_p=1c11b7ff-9ffb-4ba6-8036-be1b0afa79bb&pf_rd_r=H3XK6W265MFMNYZ7PV3Z&psc=1&refRID=H3XK6W265MFMNYZ7PV3Z&linkCode=ll1&tag=devmaking-20&linkId=9f9600161b287607f80038c5fd85b020&language=en_US" target="_blank" style="color:#fff;border-radius:3px;background-color:#888;padding:1px 5px">Algorithms (Sedgwick)</a> An excellent resource and reference for learning algorithms in computer science.
<a href="https://www.amazon.com/Algorithms-Sanjoy-Dasgupta-ebook-dp-B006Z0QR3I/dp/B006Z0QR3I/ref=as_li_ss_tl?_encoding=UTF8&me=&qid=1566246918&linkCode=ll1&tag=devmaking-20&linkId=26b190aacb9c02d1a0815d4066f9d80e&language=en_US" target="_blank" style="color:#fff;border-radius:3px;background-color:#888;padding:1px 5px">Algorithms (Dasgupta)</a> Concise and clear overview on various types and designs of algorithms in a digestible format.
<a href="https://www.amazon.com/Introduction-Algorithms-3rd-MIT-Press/dp/0262033844/ref=as_li_ss_tl?ie=UTF8&linkCode=ll1&tag=devmaking-20&linkId=1dd5be6066b180c12926dc03e2659215&language=en_US" target="_blank" style="color:#fff;border-radius:3px;background-color:#888;padding:1px 5px">Introduction to Algorithms (CLRS)</a> An in depth and comprehensively detailed explanation of algorithms and how to design them.