Binary Search Tree

binary-search-tree: The Ultimate Cheatsheet

1. Detailed Explanation

A Binary Search Tree (BST) is a tree data structure where each node has at most two children, referred to as the left child and the right child. The key property of a BST is that for every node, the value of its left subtree’s nodes is less than the node’s value, and the value of its right subtree’s nodes is greater than the node’s value. This ordering property allows for efficient searching, insertion, and deletion of nodes.

Why is it important? BSTs are crucial because they provide efficient logarithmic time complexity (O(log n) on average) for many operations, making them significantly faster than linear-time operations (O(n)) for large datasets. This efficiency is particularly valuable when dealing with sorted or ordered data.

Core Concepts:

Node: A fundamental unit containing a key (value) and pointers to its left and right children.
Root: The topmost node in the tree.
Leaf Node: A node with no children.
Subtree: A portion of the tree rooted at a particular node.
Height: The maximum distance from the root to a leaf node.
Balanced Tree: A tree where the height is proportional to log₂(n), where n is the number of nodes. Balanced trees are crucial for maintaining optimal performance.
Unbalanced Tree: A tree where the height can approach n, leading to degraded performance.

2. When to Use binary-search-tree (and When Not To)

Use BSTs when:

You need to perform frequent searches, insertions, and deletions on ordered data.
You need to maintain a sorted collection of data.
You need to find the minimum or maximum element efficiently.
You require operations like finding the predecessor or successor of a node.

Don’t use BSTs when:

You need guaranteed logarithmic time complexity for all operations (consider self-balancing trees like AVL or Red-Black trees).
You need to perform frequent range queries (consider other structures like segment trees or ordered maps).
You need to handle a large number of concurrent operations without locking (consider concurrent data structures).
The data is highly dynamic and frequent rebalancing is undesirable (consider other structures like hash tables).

3. Core Algorithm / Data Structure Template

Insertion:

Start at the root.
If the tree is empty, create a new node as the root.
If the value to be inserted is less than the current node’s value, recursively insert it into the left subtree.
If the value to be inserted is greater than the current node’s value, recursively insert it into the right subtree.
Handle duplicate values appropriately (either ignore or allow duplicates based on requirements).

Search:

Start at the root.
If the tree is empty, the value is not found.
If the value is equal to the current node’s value, the value is found.
If the value is less than the current node’s value, recursively search the left subtree.
If the value is greater than the current node’s value, recursively search the right subtree.

4. Code Implementations

Python

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

class BST:
    def __init__(self):
        self.root = None

    def insert(self, key):
        if self.root is None:
            self.root = Node(key)
        else:
            self._insert_recursive(self.root, key)

    def _insert_recursive(self, node, key):
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
            else:
                self._insert_recursive(node.left, key)
        else:
            if node.right is None:
                node.right = Node(key)
            else:
                self._insert_recursive(node.right, key)

    def search(self, key):
        return self._search_recursive(self.root, key)

    def _search_recursive(self, node, key):
        if node is None or node.key == key:
            return node
        if key < node.key:
            return self._search_recursive(node.left, key)
        return self._search_recursive(node.right, key)

Java

class Node {
    int key;
    Node left, right;

    public Node(int item) {
        key = item;
        left = right = null;
    }
}

class BST {
    Node root;

    BST() {
        root = null;
    }

    void insert(int key) {
        root = insertRec(root, key);
    }

    Node insertRec(Node root, int key) {
        if (root == null) {
            root = new Node(key);
            return root;
        }

        if (key < root.key)
            root.left = insertRec(root.left, key);
        else if (key > root.key)
            root.right = insertRec(root.right, key);

        return root;
    }

    boolean search(int key) {
        return searchRec(root, key);
    }

    boolean searchRec(Node root, int key) {
        if (root == null) return false;
        if (root.key == key) return true;
        if (key < root.key) return searchRec(root.left, key);
        return searchRec(root.right, key);
    }
}

C++

#include <iostream>

struct Node {
    int key;
    Node *left, *right;

    Node(int k) : key(k), left(nullptr), right(nullptr) {}
};

class BST {
public:
    Node* root;
    BST() : root(nullptr) {}

    void insert(int key) {
        root = insertRec(root, key);
    }

    Node* insertRec(Node* root, int key) {
        if (root == nullptr) {
            return new Node(key);
        }

        if (key < root->key)
            root->left = insertRec(root->left, key);
        else if (key > root->key)
            root->right = insertRec(root->right, key);
        return root;
    }

    bool search(int key) {
        return searchRec(root, key);
    }

    bool searchRec(Node* root, int key) {
        if (root == nullptr) return false;
        if (root->key == key) return true;
        if (key < root->key) return searchRec(root->left, key);
        return searchRec(root->right, key);
    }
};

5. Complexity Analysis

Operation	Best Case	Average Case	Worst Case	Space Complexity
Search	O(1)	O(log n)	O(n)	O(n)
Insertion	O(1)	O(log n)	O(n)	O(n)
Deletion	O(1)	O(log n)	O(n)	O(n)
Minimum/Maximum	O(1)	O(log n)	O(n)	O(n)

The worst-case scenarios (O(n)) occur when the tree is highly unbalanced, resembling a linked list. Average and best-case scenarios assume a reasonably balanced tree.

6. Pro Tips, Tricks, and Common Pitfalls

Pro Tips:

Self-Balancing Trees: For guaranteed logarithmic time complexity, use self-balancing BSTs like AVL trees or Red-Black trees.
In-order Traversal: In-order traversal of a BST yields a sorted sequence of its nodes.
Space Optimization: Consider using techniques like Threaded Binary Trees to reduce space overhead if pointer space is a concern.

Common Pitfalls:

Off-by-one errors: Carefully handle boundary conditions (empty tree, single-node tree) during insertion, deletion, and search operations.
Unbalanced Trees: Avoid operations that consistently lead to unbalanced trees. Consider using self-balancing techniques if unbalanced trees are a concern.
Duplicate Handling: Clearly define how duplicate keys are handled (ignore, allow, or count).
Memory Leaks: Ensure proper memory management, especially when dealing with dynamic memory allocation in C++ or Java.

7. Classic Problem Examples

Example: Unique Binary Search Trees II

Description: Given an integer n, return all the structurally unique BST’s (binary search trees), which has exactly n nodes of unique values from 1 to n. Return the answer in any order.

High-Level Approach:

This problem can be solved using dynamic programming and recursion. For each number i from 1 to n, consider it as the root. The left subtree will contain nodes from 1 to i-1, and the right subtree will contain nodes from i+1 to n. Recursively generate all possible left and right subtrees and combine them with i as the root to create all unique BSTs. Memoization can be used to avoid redundant computations. The base cases are when n is 0 or 1, returning an empty list or a single-node tree, respectively.