Ordered Set

ordered-set: The Ultimate Cheatsheet

1. Detailed Explanation

What is ordered-set? An ordered-set (also known as an ordered set, sorted set, or sometimes implicitly implemented using a self-balancing binary search tree) is a data structure that stores a collection of unique elements, like a regular set, but maintains the elements in a specific order, typically ascending or descending. Unlike regular sets, the order of insertion is irrelevant; the set always maintains its sorted order. It allows for efficient retrieval of elements based on their rank or position within the sorted set.
Why is it important and what kind of problems does it solve? Ordered sets are crucial when you need to perform operations that depend on the sorted order of unique elements. They are vital for:
- Range queries: Finding elements within a specific range of values.
- Rank-based retrieval: Finding the k-th smallest or largest element.
- Lower bound/Upper bound operations: Finding the first element greater than or equal to (lower bound) or strictly greater than (upper bound) a given value.
- Maintaining sorted lists of unique elements: When duplicates are not allowed, and sorted order is essential.
- Solving problems involving sliding windows: Identifying the k-th largest/smallest elements in a sliding window.
Core concepts, underlying principles, and key terminology.
- Uniqueness: Elements within the set must be unique.
- Sorted Order: Elements are maintained in sorted order (usually ascending).
- Lower Bound: The smallest element in the set that is greater than or equal to a given value.
- Upper Bound: The smallest element in the set that is strictly greater than a given value.
- Rank: The position of an element in the sorted set (starting from 0 or 1, depending on the implementation).
- Self-Balancing Trees: Ordered sets are often implemented using self-balancing binary search trees (e.g., AVL trees, Red-Black trees) to ensure efficient logarithmic time complexity for most operations.
- Binary Search: Crucial for performing operations like lower bound, upper bound, and finding elements in logarithmic time.

2. When to Use ordered-set (and When Not To)

Identify problem patterns that suggest ordered-set is a good fit.
- The problem requires maintaining a sorted collection of unique elements.
- Frequent range queries (e.g., finding the number of elements within a range).
- Need to find the k-th smallest/largest element efficiently.
- Requirement for lower bound or upper bound operations.
- Problems involving sliding windows where you need to track the sorted order of elements.
- When the problem involves operations that benefit from the sorted order of elements, such as finding the median or quantiles.
Discuss scenarios where a different data structure or algorithm would be more appropriate.
- Unordered elements and uniqueness not required: Use a list or array.
- Unordered elements and uniqueness required: Use a regular set (e.g., HashSet in Java, unordered_set in C++, set in Python).
- Frequent insertions/deletions at arbitrary positions, and order is not critical: Consider a linked list or a skip list.
- Limited range of element values: If the elements are integers within a small range, a boolean array or a frequency array might be more efficient.
- No need for uniqueness, but sorted order needed: Use a sorted list, potentially with duplicates.
- Extremely large dataset: Disk-based data structures or specialized indexing techniques may be more efficient than keeping the entire dataset in memory.

3. Core Algorithm / Data Structure Template

Here’s a general template for approaching problems using an ordered-set:

Initialization: Create an ordered-set data structure (e.g., using a self-balancing BST implementation or a library that provides ordered-set functionality).
Insertion: Insert the necessary elements into the ordered-set. Ensure duplicates are handled appropriately (they should typically be ignored or replaced).
Query/Operation: Perform the desired operation(s):
- Range Query: Use lower bound and upper bound to find elements within a specific range. Iterate between the lower and upper bounds.
- Rank Retrieval: Use methods to find the rank (index) of an element.
- Lower/Upper Bound: Use the data structure’s built-in lower bound/upper bound functions.
- k-th element: Iterate from the beginning of the ordered set to the k-th position.
Deletion (if needed): Remove elements from the ordered-set as required by the problem.

4. Code Implementations (Python, Java, C++)

Since Python doesn’t have a built-in ordered-set, we’ll use the sortedcontainers library which provides an SortedSet implementation. For Java and C++, we’ll simulate the ordered-set using a TreeSet and std::set respectively.

Python

from sortedcontainers import SortedSet

# Example usage of SortedSet from sortedcontainers
s = SortedSet()

# Insertion
s.add(5)
s.add(2)
s.add(8)
s.add(2)  # Duplicate, will be ignored

print(f"SortedSet: {s}")  # Output: SortedSet([2, 5, 8])

# Lower bound
lower_bound = s.bisect_left(5) # Index of the first element >= 5
print(f"Lower bound of 5: {lower_bound}") # Output: 1

# Upper bound
upper_bound = s.bisect_right(5) # Index of the first element > 5
print(f"Upper bound of 5: {upper_bound}") # Output: 2

# Find the k-th smallest element
k = 1  # Find the 1st smallest (index 0)
if k < len(s):
    kth_smallest = s[k]
    print(f"The {k}-th smallest element is: {kth_smallest}") # Output: The 1-th smallest element is: 5

# Check if an element exists
element_exists = 5 in s
print(f"Does 5 exist in the set? {element_exists}") # Output: Does 5 exist in the set? True

# Removal
s.remove(5)
print(f"SortedSet after removing 5: {s}") # Output: SortedSet([2, 8])

try:
    s.remove(9) # Removing non-existent element raises KeyError
except KeyError:
    print("KeyError: Element not found in the set.")

Java

import java.util.TreeSet;

public class OrderedSetExample {
    public static void main(String[] args) {
        TreeSet<Integer> s = new TreeSet<>();

        // Insertion
        s.add(5);
        s.add(2);
        s.add(8);
        s.add(2); // Duplicate, will be ignored

        System.out.println("TreeSet: " + s); // Output: TreeSet: [2, 5, 8]

        // Lower bound (ceiling)
        Integer lowerBound = s.ceiling(5); // First element >= 5
        System.out.println("Lower bound of 5: " + lowerBound); // Output: Lower bound of 5: 5

        // Upper bound (higher)
        Integer upperBound = s.higher(5); // First element > 5
        System.out.println("Upper bound of 5: " + upperBound); // Output: Upper bound of 5: 8

        // Finding k-th smallest (requires iteration - not ideal for frequent k-th element lookups)
        int k = 1; // Get the 1st smallest (index 0)
        if (k < s.size()) {
            int i = 0;
            for (Integer element : s) {
                if (i == k) {
                    System.out.println("The " + k + "-th smallest element is: " + element); // Output: The 1-th smallest element is: 5
                    break;
                }
                i++;
            }
        }

        // Check if an element exists
        boolean elementExists = s.contains(5);
        System.out.println("Does 5 exist in the set? " + elementExists); // Output: Does 5 exist in the set? true

        // Removal
        s.remove(5);
        System.out.println("TreeSet after removing 5: " + s); // Output: TreeSet after removing 5: [2, 8]

        // Remove non-existent element - no exception thrown
        s.remove(9);
    }
}

C++

#include <iostream>
#include <set>

int main() {
    std::set<int> s;

    // Insertion
    s.insert(5);
    s.insert(2);
    s.insert(8);
    s.insert(2); // Duplicate, will be ignored

    std::cout << "Set: ";
    for (int x : s) {
        std::cout << x << " ";
    }
    std::cout << std::endl; // Output: Set: 2 5 8

    // Lower bound
    auto lowerBoundIt = s.lower_bound(5);
    if (lowerBoundIt != s.end()) {
        std::cout << "Lower bound of 5: " << *lowerBoundIt << std::endl; // Output: Lower bound of 5: 5
    }

    // Upper bound
    auto upperBoundIt = s.upper_bound(5);
    if (upperBoundIt != s.end()) {
        std::cout << "Upper bound of 5: " << *upperBoundIt << std::endl; // Output: Upper bound of 5: 8
    }

    // Finding k-th smallest (requires iteration - not ideal for frequent k-th element lookups)
    int k = 1; // Get the 1st smallest (index 0)
    if (k < s.size()) {
        auto it = s.begin();
        std::advance(it, k);
        std::cout << "The " << k << "-th smallest element is: " << *it << std::endl; // Output: The 1-th smallest element is: 5
    }

    // Check if an element exists
    bool elementExists = s.count(5) > 0;
    std::cout << "Does 5 exist in the set? " << elementExists << std::endl; // Output: Does 5 exist in the set? true

    // Removal
    s.erase(5);
    std::cout << "Set after removing 5: ";
    for (int x : s) {
        std::cout << x << " ";
    }
    std::cout << std::endl; // Output: Set after removing 5: 2 8

    // Remove non-existent element - no exception thrown
    s.erase(9);

    return 0;
}

5. Complexity Analysis

Operation	Time Complexity (Best)	Time Complexity (Average)	Time Complexity (Worst)	Space Complexity
Insertion	O(1)	O(log n)	O(log n)	O(1)
Deletion	O(1)	O(log n)	O(log n)	O(1)
Search	O(1)	O(log n)	O(log n)	O(1)
Lower Bound	O(1)	O(log n)	O(log n)	O(1)
Upper Bound	O(1)	O(log n)	O(log n)	O(1)
k-th element	O(1)	O(log n)	O(n) in Java/C++ (linear iteration) / O(log n) with order statistics tree	O(1)
Space	O(n)	O(n)	O(n)	O(n)

Explanation:
- Most operations (insertion, deletion, search, lower bound, upper bound) have a logarithmic time complexity (O(log n)) because they typically rely on the properties of self-balancing binary search trees.
- Best-case complexity for insertion/deletion can be O(1) if the element can be added/removed at the root, which is rare.
- Finding the k-th element requires iteration in standard Java and C++ TreeSet and std::set, resulting in O(n) average case. However, if an order statistic tree is used, this can be improved to O(log n). Python’s SortedSet offers O(log n) for k-th element access.
- Space complexity is O(n) because the ordered-set stores all n unique elements.

6. Pro Tips, Tricks, and Common Pitfalls

Pro Tips and Tricks:
- Custom Comparators: Use custom comparator functions (or Comparator objects in Java) to define the ordering of elements based on specific criteria. This is useful when the natural ordering of elements is not sufficient.
- Order Statistics Tree: For frequent k-th element lookups, consider using an order statistics tree (a self-balancing BST augmented with size information in each node) to achieve O(log n) time complexity.
- Pre-allocation: If you know the approximate size of the set beforehand, pre-allocate memory (if possible in your language/implementation) to avoid reallocations during insertion.
Common Pitfalls:
- Forgetting Uniqueness: Ensure that you handle duplicate elements appropriately. Ordered sets enforce uniqueness.
- Incorrect Comparator: If you use a custom comparator, make sure it defines a strict weak ordering (i.e., it is consistent, transitive, and irreflexive). An incorrect comparator can lead to incorrect sorting and unexpected behavior.
- Iteration during Modification: Avoid modifying the ordered-set (inserting or deleting elements) while iterating through it using iterators. This can invalidate the iterators and cause errors.
- k-th element complexity: Realize that repeatedly finding the k-th element in Java and C++ (using TreeSet and std::set respectively) requires iterating from the beginning, which is O(n). Consider using an order statistic tree for better performance.
- KeyError (Python): When removing an element in Python using s.remove(), a KeyError will be raised if the element doesn’t exist. Use s.discard() to avoid this exception.

7. Classic Problem Examples

Example: Contains Duplicate III

Description: You are given an integer array nums and two integers indexDiff and valueDiff. Find a pair of indices (i, j) such that:

abs(i - j) <= indexDiff, and
abs(nums[i] - nums[j]) <= valueDiff.

Return true if such a pair exists or false otherwise.

High-level Approach:

Sliding Window: Use a sliding window of size indexDiff.
Ordered Set: Maintain an ordered-set (SortedSet, TreeSet, or std::set) containing the numbers within the current window.
Iteration: For each element nums[i] in the array:
- Remove Oldest Element: If i > indexDiff, remove nums[i - indexDiff - 1] from the ordered-set.
- Find Potential Match: Use the ordered-set’s lower_bound (or ceiling in Java) method to find the smallest element greater than or equal to nums[i] - valueDiff.
- Check Condition: If such an element exists and its value is less than or equal to nums[i] + valueDiff, then a pair satisfying the conditions has been found. Return true.
- Add Current Element: Add nums[i] to the ordered-set.
No Match: If the loop completes without finding a matching pair, return false.

This approach efficiently checks for nearby almost duplicates within the specified index and value difference constraints, utilizing the ordered-set’s ability to maintain sorted order and perform efficient lower bound queries.