Binary Search

binary-search: The Ultimate Cheatsheet

1. Detailed Explanation

Binary search is an efficient algorithm for finding a target value within a sorted array or list. It works by repeatedly dividing the search interval in half. If the target value is less than the middle element, the search continues in the lower half; otherwise, it continues in the upper half. This process is repeated until the target value is found or the search interval is empty.

Why is it important? Binary search offers significantly faster search times compared to linear search, especially for large datasets. Its logarithmic time complexity makes it a crucial tool in various applications where efficient searching is paramount.

Core concepts:

Sorted Input: Binary search requires the input data to be sorted in ascending or descending order.
Divide and Conquer: The algorithm follows a divide-and-conquer strategy, recursively narrowing down the search space.
Midpoint Calculation: The algorithm repeatedly calculates the midpoint of the current search interval.
Comparison: The target value is compared to the midpoint element to determine which half to search next.
Base Cases: The algorithm terminates when the target is found or the search interval becomes empty.

2. When to Use binary-search (and When Not To)

Use Binary Search when:

You need to efficiently search for a specific value within a sorted array or list.
The data is already sorted or can be easily sorted without significant overhead.
You need a search algorithm with logarithmic time complexity.

Don’t use Binary Search when:

The data is unsorted. Sorting the data first would negate the performance benefits of binary search.
You need to perform frequent insertions or deletions. Binary search trees or other dynamic data structures would be more appropriate.
The dataset is extremely small. The overhead of implementing binary search might outweigh its benefits for tiny datasets.
You need to find the closest element to a target value (for this, consider a modified binary search or other techniques).

3. Core Algorithm / Data Structure Template

Pseudocode:

function binary_search(sorted_array, target):
  low = 0
  high = length(sorted_array) - 1

  while low <= high:
    mid = floor((low + high) / 2)  // Avoid overflow: (low + high) // 2

    if sorted_array[mid] == target:
      return mid // Target found at index mid
    else if sorted_array[mid] < target:
      low = mid + 1 // Search in the upper half
    else:
      high = mid - 1 // Search in the lower half

  return -1 // Target not found

4. Code Implementations

Python

def binary_search(arr, target):
    low = 0
    high = len(arr) - 1
    while low <= high:
        mid = (low + high) // 2  # Integer division
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1  # Target not found

Java

class BinarySearch {
    public static int binarySearch(int[] arr, int target) {
        int low = 0;
        int high = arr.length - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2; // Avoid overflow
            if (arr[mid] == target) {
                return mid;
            } else if (arr[mid] < target) {
                low = mid + 1;
            } else {
                high = mid - 1;
            }
        }
        return -1; // Target not found
    }
}

C++

#include <iostream>
#include <vector>

int binarySearch(const std::vector<int>& arr, int target) {
    int low = 0;
    int high = arr.size() - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2; // Avoid overflow
        if (arr[mid] == target) {
            return mid;
        } else if (arr[mid] < target) {
            low = mid + 1;
        } else {
            high = mid - 1;
        }
    }
    return -1; // Target not found
}

int main() {
  std::vector<int> arr = {2, 5, 7, 8, 11, 12};
  int target = 11;
  int index = binarySearch(arr, target);
  std::cout << "Target found at index: " << index << std::endl;
  return 0;
}

5. Complexity Analysis

Operation	Time Complexity (Best/Average/Worst)	Space Complexity
Binary Search	O(1) / O(log n) / O(log n)	O(1)

Best Case: O(1) – The target element is found at the middle of the array in the first iteration.
Average Case: O(log n) – The target element is found after several iterations.
Worst Case: O(log n) – The target element is not found, or it’s at the end of the search.
Space Complexity: O(1) – Binary search uses a constant amount of extra space regardless of the input size.

6. Pro Tips, Tricks, and Common Pitfalls

Pro Tips:

Iterative vs. Recursive: Iterative implementations are generally preferred for binary search due to better performance and avoidance of potential stack overflow issues with very large arrays.
Overflow Prevention: Use low + (high - low) / 2 instead of (low + high) / 2 for midpoint calculation to avoid potential integer overflow issues when dealing with very large indices.
Handling duplicates: If there are multiple instances of the target value, a slightly modified binary search can find the first or last occurrence.

Common Pitfalls:

Off-by-one errors: Incorrectly setting the low or high index can lead to incorrect results or infinite loops. Carefully check the boundary conditions.
Incorrect comparison: Using the wrong comparison operator (<, >, <=, >=) in the if statements can result in incorrect results.
Forgetting to handle the case where the target is not found: Always return an appropriate value (e.g., -1) to indicate that the target was not found.
Using binary search on unsorted data: This will yield incorrect results.

7. Classic Problem Examples

Example: Median of Two Sorted Arrays

High-level approach:

This problem can be solved efficiently using a modified binary search. The core idea is to find a partition point in one array such that the elements to the left and right of the partition in both arrays satisfy the median condition. We can use binary search to efficiently search for this partition point, ensuring the overall time complexity remains O(log(m+n)). The details of the partitioning and median calculation are complex but leverage the sorted nature of the input arrays and binary search to achieve optimal efficiency. The solution involves iterative refinement of the partition point until the median condition is met.