Counting Sort

counting-sort: The Ultimate Cheatsheet

1. Detailed Explanation

Counting sort is a non-comparison-based sorting algorithm. Unlike algorithms like merge sort or quicksort, it operates by counting the occurrences of each unique element in the input array. This count information is then used to determine the final sorted position of each element. It’s particularly efficient for sorting integers or other data that can be mapped to integers within a known range.

Why is it important? Counting sort excels when dealing with data with a relatively small range of values. Its linear time complexity makes it significantly faster than comparison-based sorts (O(n log n)) in these scenarios. It’s frequently used as a subroutine in other algorithms, such as radix sort.

Core Concepts:

Input: An array of integers (or data that can be mapped to integers).
Range: The maximum and minimum values within the input array. Knowing this range is crucial for counting sort’s efficiency.
Counting Array: An auxiliary array used to store the counts of each element. The size of this array is determined by the range of input values.
Cumulative Count: The counting array is often modified to store cumulative counts, making it easier to determine the final position of each element in the sorted array.

2. When to Use counting-sort (and When Not To)

Use counting sort when:

The input consists of integers or data that can be easily mapped to integers.
The range of input values is relatively small compared to the number of elements.
You need a linear time sorting algorithm.

Don’t use counting sort when:

The input contains a very large range of values. The counting array will become excessively large, negating the performance benefits.
The input contains floating-point numbers or other data types that cannot be easily mapped to integers within a reasonable range.
The input data is already nearly sorted; a simpler algorithm like insertion sort might be more efficient.

3. Core Algorithm / Data Structure Template

Find the range: Determine the minimum and maximum values in the input array.
Initialize the counting array: Create an array of size (max - min + 1) and initialize all elements to 0.
Count occurrences: Iterate through the input array, incrementing the count in the counting array for each element encountered.
Calculate cumulative counts: Iterate through the counting array, replacing each element with the sum of itself and the preceding element. This creates the cumulative count.
Place elements in sorted array: Iterate through the input array in reverse order. For each element, use its cumulative count to determine its position in the sorted array, decrement the cumulative count, and place the element.

4. Code Implementations

Python

def counting_sort(arr):
    """Sorts an array of integers using counting sort."""
    if not arr:  #Handle empty array case
        return arr

    min_val = min(arr)
    max_val = max(arr)
    range_size = max_val - min_val + 1
    count_array = [0] * range_size

    # Count occurrences
    for num in arr:
        count_array[num - min_val] += 1

    # Calculate cumulative counts
    for i in range(1, range_size):
        count_array[i] += count_array[i - 1]

    # Place elements in sorted array
    sorted_array = [0] * len(arr)
    for num in reversed(arr):
        index = count_array[num - min_val] -1
        sorted_array[index] = num
        count_array[num - min_val] -= 1

    return sorted_array

Java

import java.util.Arrays;

class CountingSort {
    public static int[] countingSort(int[] arr) {
        if (arr == null || arr.length == 0) return arr;

        int min = Arrays.stream(arr).min().getAsInt();
        int max = Arrays.stream(arr).max().getAsInt();
        int range = max - min + 1;
        int[] countArray = new int[range];

        for (int num : arr) {
            countArray[num - min]++;
        }

        for (int i = 1; i < range; i++) {
            countArray[i] += countArray[i - 1];
        }

        int[] sortedArray = new int[arr.length];
        for (int i = arr.length - 1; i >= 0; i--) {
            int num = arr[i];
            int index = countArray[num - min] - 1;
            sortedArray[index] = num;
            countArray[num - min]--;
        }
        return sortedArray;
    }
}

C++

#include <iostream>
#include <vector>
#include <algorithm>

std::vector<int> countingSort(std::vector<int>& arr) {
    if (arr.empty()) return arr;

    int min_val = *std::min_element(arr.begin(), arr.end());
    int max_val = *std::max_element(arr.begin(), arr.end());
    int range_size = max_val - min_val + 1;
    std::vector<int> count_array(range_size, 0);

    for (int num : arr) {
        count_array[num - min_val]++;
    }

    for (size_t i = 1; i < count_array.size(); ++i) {
        count_array[i] += count_array[i - 1];
    }

    std::vector<int> sorted_array(arr.size());
    for (int i = arr.size() - 1; i >= 0; --i) {
        int num = arr[i];
        int index = count_array[num - min_val] - 1;
        sorted_array[index] = num;
        count_array[num - min_val]--;
    }
    return sorted_array;
}

5. Complexity Analysis

Operation	Best Case	Average Case	Worst Case	Space Complexity
Counting Sort	O(n+k)	O(n+k)	O(n+k)	O(k)

Where:

n is the number of elements in the input array.
k is the range of input values (max - min + 1).

6. Pro Tips, Tricks, and Common Pitfalls

Handling negative numbers: If your input contains negative numbers, you’ll need to adjust the indexing in the counting array to accommodate the negative values. One approach is to find the minimum value and shift all values to be non-negative before sorting.
Memory usage: Be mindful of the space complexity. For very large ranges, the counting array can consume a significant amount of memory.
In-place sorting: Counting sort is not typically performed in-place; it requires an auxiliary counting array.
Stable sort: Counting sort can be implemented as a stable sort, meaning the relative order of equal elements is preserved. The implementations above maintain stability.

7. Classic Problem Examples

Example: H-Index

Description: Given an array of integers citations where citations[i] is the number of citations a researcher received for their ith paper, return the researcher’s h-index. The h-index is the maximum value of h such that the given researcher has published at least h papers that have each been cited at least h times.

High-level Approach using Counting Sort:

Determine the range: Find the maximum citation count (this will be our k).
Create a counting array: Create a counting array of size k+1 to store the counts of papers with each citation count.
Count citations: Iterate through the citations array and increment the appropriate count in the counting array.
Iterate and find h-index: Iterate through the counting array from the end. The first index i where the cumulative count is greater than or equal to i represents the h-index.

While counting sort isn’t directly used to sort the citations array itself, understanding its principles helps in efficiently counting the citations to compute the h-index in linear time. A more straightforward approach would involve sorting the array first (using a more general sorting algorithm if the range is large) and then iterating to find the h-index. However, using a counting array directly avoids the overhead of a full sort when the range of citations is relatively small.