51_Numpy_For_Numerical_Computing

Category: AI & Data Science Tools
Type: AI/ML Tool or Library
Generated on: 2025-08-26 11:07:23
For: Data Science, Machine Learning & Technical Interviews

NumPy Cheatsheet for Numerical Computing

1. Tool/Library Overview

What it is: NumPy (Numerical Python) is the fundamental package for numerical computation in Python. It provides support for large, multi-dimensional arrays and matrices, along with a large library of high-level mathematical functions to operate on these arrays.
Main Use Cases in AI/ML:
- Data Representation: Storing and manipulating numerical data (e.g., images, audio, numerical features).
- Linear Algebra: Performing matrix operations essential for machine learning algorithms (e.g., matrix multiplication, eigenvalue decomposition).
- Random Number Generation: Generating random numbers for initializing weights, sampling data, and creating synthetic datasets.
- Data Preprocessing: Normalizing, standardizing, and transforming data.
- Scientific Computing: Implementing numerical methods for optimization, integration, and other scientific tasks.

2. Installation & Setup

Installation:

pip install numpy
# or
conda install numpy

Import:

import numpy as np  # Standard practice to use 'np' alias

Verification:

print(np.__version__)
# Example output: 1.26.2 (your installed version may differ)

3. Core Features & API

NumPy Array (ndarray): The core data structure.
- Creating Arrays:
  - np.array(list_of_lists): Creates an array from a list or tuple; nested sequences produce multi-dimensional arrays.
  - np.zeros((rows, cols)): Creates an array filled with zeros.
  - np.ones((rows, cols)): Creates an array filled with ones.
  - np.full((rows, cols), value): Creates an array filled with a specific value.
  - np.eye(n): Creates an identity matrix.
  - np.arange(start, stop, step): Creates an array of evenly spaced values in the half-open interval [start, stop); stop itself is excluded.
  - np.linspace(start, stop, num): Creates an array of num evenly spaced values from start to stop; stop is included by default.
  - np.random.rand(rows, cols): Creates an array of random values drawn uniformly from [0, 1).
  - np.random.randn(rows, cols): Creates an array with random values from a standard normal distribution.
  - np.random.randint(low, high, size): Creates an array of random integers in [low, high); high is excluded.
```
arr = np.array([[1, 2, 3], [4, 5, 6]])
zeros_arr = np.zeros((2, 3))
ones_arr = np.ones((3, 2))
range_arr = np.arange(0, 10, 2)
linspace_arr = np.linspace(0, 1, 5)
rand_arr = np.random.rand(2, 2)
randint_arr = np.random.randint(1, 10, (3,3))

print("Array:\n", arr)
print("Zeros:\n", zeros_arr)
print("Ones:\n", ones_arr)
print("Range:\n", range_arr)
print("Linspace:\n", linspace_arr)
print("Random:\n", rand_arr)
print("Random Int:\n", randint_arr)
```
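
The creation helpers differ in how they treat the end of a range, which is easy to get wrong. The following is a minimal sketch (variable names are illustrative, not from the original) that checks the endpoint behavior and also exercises np.full and np.eye, which the example above lists but does not run:

```python
import numpy as np

# arange excludes the stop value; linspace includes it by default
half_open = np.arange(0, 10, 2)      # 0, 2, 4, 6, 8
closed = np.linspace(0, 1, 5)        # 0.0, 0.25, 0.5, 0.75, 1.0

# full and eye take a shape / size, just like zeros and ones
sevens = np.full((2, 3), 7)
identity = np.eye(3)

# randint draws from [low, high), so 10 is never returned here
draws = np.random.randint(1, 10, size=1000)

print(half_open, closed)
print(sevens)
print(identity)
print(draws.min(), draws.max())
```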
- Array Attributes:
  - arr.shape: Returns a tuple with the length of each dimension, e.g. (rows, columns) for a 2-D array.
  - arr.ndim: Returns the number of dimensions.
  - arr.dtype: Returns the data type of the array elements.
  - arr.size: Returns the total number of elements in the array.
  - arr.itemsize: Returns the size (in bytes) of each element.
  - arr.nbytes: Returns the total size (in bytes) of the array.
```
arr = np.array([[1, 2, 3], [4, 5, 6]])
print("Shape:", arr.shape)  # (2, 3)
print("Dimension:", arr.ndim) # 2
print("Data Type:", arr.dtype) # int64 (or int32 depending on system)
print("Size:", arr.size) # 6
```
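
The attribute example above does not show itemsize or nbytes. As a small sketch, assuming an explicit int32 dtype so the byte counts do not depend on the platform:

```python
import numpy as np

# Fix the dtype explicitly so the sizes below are the same on every system
arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)

print("Item size:", arr.itemsize)    # 4 bytes per int32 element
print("Total bytes:", arr.nbytes)    # 24 = 6 elements * 4 bytes

# astype returns a converted copy; the original array is unchanged
as_float = arr.astype(np.float64)
print(as_float.dtype, as_float.nbytes)   # float64 48
```

Here nbytes is always size * itemsize, which makes it a quick way to estimate memory use before allocating a large array.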
- Array Indexing & Slicing: Similar to Python lists, but extended to multiple dimensions and to indexing with integer or boolean arrays.
  - arr[row, col]: Accessing a single element.
  - arr[start:stop:step, start:stop:step]: Slicing a portion of the array.
  - arr[row_indices, col_indices]: Fancy indexing (using arrays of indices).
  - arr[arr > 5]: Boolean indexing (filtering elements based on a condition).
```
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("Element at (0, 1):", arr[0, 1])  # 2
print("First row:", arr[0, :]) # [1 2 3]
print("First column:", arr[:, 0]) # [1 4 7]
print("Sub-array:\n", arr[0:2, 1:3]) # [[2 3][5 6]]
print("Elements greater than 5:", arr[arr > 5]) # [6 7 8 9]
```
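
Fancy indexing and stepped slices are listed above but not demonstrated. A minimal sketch on the same 3x3 array (the variable names are illustrative):

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Fancy indexing pairs up the index arrays: picks (0, 1) and (2, 2)
picked = arr[[0, 2], [1, 2]]
print(picked)        # [2 9]

# Step slicing: every second row and every second column
corners = arr[::2, ::2]
print(corners)       # [[1 3]
                     #  [7 9]]

# Conditions combine with & and |; each condition needs parentheses
middle = arr[(arr > 2) & (arr < 8)]
print(middle)        # [3 4 5 6 7]
```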
- Array Reshaping:
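  - arr.reshape(new_shape): Returns the same data with a new shape; the total number of elements must stay the same.
  - arr.ravel(): Flattens the array into a 1D array, returning a view when possible (arr.flatten() always returns a copy).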
  - arr.T: Transposes the array.
```
arr = np.arange(12)
reshaped_arr = arr.reshape(3, 4)
print("Reshaped array:\n", reshaped_arr)
print("Flattened array:", reshaped_arr.ravel())
print("Transposed array:\n", reshaped_arr.T)
```
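
Whether a reshaped or flattened array shares memory with the original matters as soon as you write to it. A short sketch, assuming a contiguous input array (for which reshape and ravel return views):

```python
import numpy as np

arr = np.arange(12)

# -1 asks NumPy to infer that dimension from the total size
inferred = arr.reshape(-1, 4)
print(inferred.shape)                          # (3, 4)

# ravel returns a view where it can; flatten always copies
flat_view = inferred.ravel()
flat_copy = inferred.flatten()
print(np.shares_memory(inferred, flat_view))   # True
print(np.shares_memory(inferred, flat_copy))   # False

# Writing through the view changes the array it came from
flat_view[0] = 100
print(inferred[0, 0])                          # 100
```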
- Array Operations:
  - Element-wise operations (+, -, *, /, **)
  - np.add(arr1, arr2): Element-wise addition
  - np.subtract(arr1, arr2): Element-wise subtraction
  - np.multiply(arr1, arr2): Element-wise multiplication
  - np.divide(arr1, arr2): Element-wise division
  - np.power(arr1, arr2): Element-wise exponentiation
  - np.sqrt(arr): Element-wise square root
  - np.exp(arr): Element-wise exponential
  - np.log(arr): Element-wise natural logarithm
  - np.sin(arr), np.cos(arr), np.tan(arr): Element-wise trigonometric functions
  - arr.sum(), arr.mean(), arr.std(), arr.min(), arr.max(): Aggregate functions
  - arr.sum(axis=0): Sums down the rows, giving one total per column
  - arr.sum(axis=1): Sums across the columns, giving one total per row
```
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])

print("Element-wise addition:\n", arr1 + arr2)
print("Element-wise multiplication:\n", arr1 * arr2)
print("Sum of all elements:", arr1.sum())
print("Mean of all elements:", arr1.mean())
print("Sum along columns:", arr1.sum(axis=0))
```
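
The axis argument names the dimension that is collapsed, which is a common source of confusion. A minimal sketch on a non-square array so the two directions are easy to tell apart:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)

col_sums = arr.sum(axis=0)   # collapses the rows    -> shape (3,)
row_sums = arr.sum(axis=1)   # collapses the columns -> shape (2,)
print(col_sums)              # [5 7 9]
print(row_sums)              # [ 6 15]

# keepdims=True keeps the reduced axis with length 1,
# so the result still broadcasts against the original array
row_means = arr.mean(axis=1, keepdims=True)
print(row_means.shape)       # (2, 1)
centered = arr - row_means
print(centered)              # each row becomes [-1.  0.  1.]
```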
- Linear Algebra:
  - np.dot(arr1, arr2): Matrix multiplication for 2-D arrays (inner product for 1-D arrays).
  - arr1 @ arr2: The matrix multiplication operator (Python >= 3.5); gives the same result as np.dot for 2-D arrays.
  - np.linalg.det(arr): Determinant of a matrix.
  - np.linalg.inv(arr): Inverse of a matrix.
  - np.linalg.eig(arr): Eigenvalues and eigenvectors of a matrix.
  - np.linalg.solve(A, b): Solves the linear equation Ax = b.
```
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])

print("Matrix multiplication:\n", np.dot(arr1, arr2))
print("Determinant of arr1:", np.linalg.det(arr1))

try:
    inv_arr1 = np.linalg.inv(arr1)
    print("Inverse of arr1:\n", inv_arr1)
except np.linalg.LinAlgError:
    print("Matrix is singular and cannot be inverted.")

eigenvalues, eigenvectors = np.linalg.eig(arr1)
print("Eigenvalues:", eigenvalues)
print("Eigenvectors:\n", eigenvectors)

A = np.array([[2, 1], [1, 3]])
b = np.array([4, 7])
x = np.linalg.solve(A, b)
print("Solution to Ax = b:", x)
```
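
A quick way to sanity-check a linear solve is to substitute the solution back into the system. The sketch below reuses the A and b from the example above and also compares against the explicit inverse; np.linalg.solve is generally the preferred route because it avoids forming the inverse:

```python
import numpy as np

A = np.array([[2, 1], [1, 3]])
b = np.array([4, 7])

x = np.linalg.solve(A, b)
print(x)                                     # approximately [1. 2.]
print(np.allclose(A @ x, b))                 # True: x satisfies Ax = b

# The same answer via the explicit inverse
x_via_inv = np.linalg.inv(A) @ b
print(np.allclose(x, x_via_inv))             # True

# For 2-D arrays, np.dot and @ agree
print(np.array_equal(np.dot(A, A), A @ A))   # True
```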
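- Broadcasting: NumPy’s ability to perform operations on arrays with different shapes. Shapes are compared from the trailing dimension backwards; two dimensions are compatible when they are equal or one of them is 1, and the smaller array is then “broadcast” (virtually repeated) across the larger one.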
```
arr1 = np.array([1, 2, 3])
scalar = 5
print("Broadcasting addition:", arr1 + scalar)  # [6 7 8]

arr2 = np.array([[1, 2, 3], [4, 5, 6]])
arr3 = np.array([10, 20, 30])
print("Broadcasting addition (array + row):")
print(arr2 + arr3)
# Output:
# [[11 22 33]
#  [14 25 36]]
```
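
Broadcasting along rows needs a column-shaped operand, and mismatched shapes fail loudly rather than silently. A minimal sketch of both cases (the array names are illustrative):

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
column = np.array([[10], [20]])             # shape (2, 1)

print(matrix + column)
# [[11 12 13]
#  [24 25 26]]

# Shape (2,) lines up with the trailing dimension (3), so this fails
per_row = np.array([10, 20])
try:
    matrix + per_row
    broadcast_failed = False
except ValueError as exc:
    broadcast_failed = True
    print("Incompatible shapes:", exc)

# Adding a trailing axis turns (2,) into (2, 1), which broadcasts
fixed = matrix + per_row[:, np.newaxis]
print(np.array_equal(fixed, matrix + column))   # True
```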

4. Practical Examples

Image Processing:

import matplotlib.pyplot as plt
from PIL import Image

# Load an image using Pillow
try:
    img = Image.open("image.jpg")  # Replace with your image path
    img_array = np.array(img)  # Convert to NumPy array

    print("Image array shape:", img_array.shape) # (height, width, channels) - typically (H, W, 3) for RGB

    # Example: invert the colors (assumes 8-bit channels, i.e. dtype uint8)
    inverted_img = 255 - img_array

    # Display the original and inverted images (using matplotlib)
    plt.figure(figsize=(10, 5))

    plt.subplot(1, 2, 1)
    plt.imshow(img_array)
    plt.title("Original Image")
    plt.axis('off')  # Hide the axis

    plt.subplot(1, 2, 2)
    plt.imshow(inverted_img)
    plt.title("Inverted Image")
    plt.axis('off')

    plt.show()


except FileNotFoundError:
    print("Error: Image file not found.  Make sure 'image.jpg' exists in the same directory or specify the correct path.")
except Exception as e:
    print(f"An error occurred: {e}")

Data Normalization/Standardization:

data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Min-max normalization: scale to [0, 1] using the min and max of the whole array
data_normalized = (data - data.min()) / (data.max() - data.min())
print("Normalized data:\n", data_normalized)

# Standardization: zero mean and unit variance, using the mean and std of the whole array
data_standardized = (data - data.mean()) / data.std()
print("Standardized data:\n", data_standardized)
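
In ML pipelines, features (columns) are usually scaled independently rather than with one global statistic. A minimal sketch of the per-column variant, assuming rows are samples and columns are features:

```python
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)

# axis=0 computes one statistic per column (feature)
col_min = data.min(axis=0)
col_max = data.max(axis=0)
normalized = (data - col_min) / (col_max - col_min)

standardized = (data - data.mean(axis=0)) / data.std(axis=0)

print(normalized)                  # every column becomes [0.  0.5 1. ]
print(standardized.mean(axis=0))   # approximately 0 for each column
print(standardized.std(axis=0))    # approximately 1 for each column
```

A constant column would make the denominator zero in both formulas, so real data typically needs a guard for that case.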

Implementing a Simple Linear Regression:

# Generate some synthetic data
X = np.linspace(0, 10, 100)
y = 2 * X + 1 + np.random.randn(100) * 2  # y = 2x + 1 + noise

# Reshape X to a column vector (required for linear algebra operations)
X = X.reshape(-1, 1)

# Add a bias (intercept) term to X
X = np.concatenate((np.ones_like(X), X), axis=1)

# Solve for the coefficients using the normal equation
# theta = (X.T @ X)^-1 @ X.T @ y
theta = np.linalg.inv(X.T @ X) @ X.T @ y

print("Estimated coefficients (intercept, slope):", theta)

# Make predictions
X_test = np.array([[1, 5], [1, 8]])  # Bias term added
y_pred = X_test @ theta
print("Predictions for X_test:\n", y_pred)

# Plot the data and the regression line (using Matplotlib)
import matplotlib.pyplot as plt
plt.scatter(X[:, 1], y)  # Scatter plot of original data
plt.plot(X[:, 1], X @ theta, color='red')  # Regression line
plt.xlabel("X")
plt.ylabel("y")
plt.title("Linear Regression")
plt.show()
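
The normal equation with an explicit inverse works for this small problem, but np.linalg.lstsq is a commonly used alternative that avoids forming the inverse. The sketch below compares the two on the same kind of synthetic data; the seed and the use of np.random.default_rng are assumptions made here for reproducibility, not part of the original example:

```python
import numpy as np

rng = np.random.default_rng(0)                  # seed chosen arbitrarily
x = np.linspace(0, 10, 100)
y = 2 * x + 1 + rng.normal(scale=2, size=100)   # y = 2x + 1 + noise

# Design matrix with a bias column of ones
X = np.column_stack((np.ones_like(x), x))

# Normal equation, as in the example above
theta_normal = np.linalg.inv(X.T @ X) @ X.T @ y

# Least-squares solver; returns (solution, residuals, rank, singular values)
theta_lstsq, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)

print(theta_normal)
print(theta_lstsq)
print(np.allclose(theta_normal, theta_lstsq))
```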

5. Advanced Usage

Memory Mapping: Working with large datasets that don’t fit in memory.

# Create a large array (100 million float32 values, about 400 MB) and save it to disk
large_array = np.arange(100000000, dtype='float32')
np.save('large_array.npy', large_array)

# Memory map the array
mmapped_array = np.load('large_array.npy', mmap_mode='r')

print("Shape of memory-mapped array:", mmapped_array.shape)

# Accessing elements is still possible, but modifications are not allowed in 'r' (read-only) mode.
print("First 10 elements:", mmapped_array[:10])

# To modify, use 'r+' mode:
# mmapped_array = np.load('large_array.npy', mmap_mode='r+')
# mmapped_array[0] = 100  # This will modify the file on disk

# Drop the reference when done; the file mapping is released once the array is garbage-collected
del mmapped_array  # 'del large_array' likewise releases the in-memory copy
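
The difference between the 'r' and 'r+' modes can be checked on a tiny file without allocating hundreds of megabytes. A minimal sketch, assuming a temporary directory and a hypothetical file name:

```python
import os
import tempfile

import numpy as np

write_error = None
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "small_array.npy")   # hypothetical file name
    np.save(path, np.arange(10, dtype="float32"))

    # Read-only mapping: assignment is rejected
    readonly = np.load(path, mmap_mode="r")
    try:
        readonly[0] = 100
    except ValueError as exc:
        write_error = str(exc)
        print("Write rejected:", exc)
    del readonly

    # Read-write mapping: assignment changes the file on disk
    writable = np.load(path, mmap_mode="r+")
    writable[0] = 100
    writable.flush()      # push pending changes to disk
    del writable

    first_value = float(np.load(path)[0])
    print(first_value)    # 100.0
```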

Vectorization: Leveraging NumPy’s optimized C implementation for faster computations by avoiding explicit loops.

# Inefficient (using a loop):
def compute_reciprocal_loop(arr):
    result = np.empty(len(arr))
    for i in range(len(arr)):
        result[i] = 1.0 / arr[i]
    return result

# Efficient (using vectorization):
def compute_reciprocal_vectorized(arr):
    return 1.0 / arr

arr = np.random.rand(1000000)

# Time the loop-based approach (perf_counter is a monotonic, high-resolution clock)
import time
start_time = time.perf_counter()
compute_reciprocal_loop(arr)
loop_time = time.perf_counter() - start_time
print(f"Loop-based time: {loop_time:.4f} seconds")

# Time the vectorized approach
start_time = time.perf_counter()
compute_reciprocal_vectorized(arr)
vectorized_time = time.perf_counter() - start_time
print(f"Vectorized time: {vectorized_time:.4f} seconds")

print(f"Vectorization speedup: {loop_time / vectorized_time:.2f}x") # Usually a significant speedup
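
Before comparing speed, it is worth confirming that both versions compute the same thing. The sketch below does that and then times each with the standard-library timeit module; the array size, seed, and repeat count are arbitrary choices made here, and the timings will vary by machine:

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
values = rng.uniform(1.0, 2.0, size=100_000)   # range avoids division by zero

def reciprocal_loop(arr):
    # Element-by-element version, for comparison only
    result = np.empty(len(arr))
    for i in range(len(arr)):
        result[i] = 1.0 / arr[i]
    return result

# Check correctness first
same = np.allclose(reciprocal_loop(values), 1.0 / values)
print(same)   # True

# timeit runs each callable several times and returns the total time
loop_s = timeit.timeit(lambda: reciprocal_loop(values), number=3)
vec_s = timeit.timeit(lambda: 1.0 / values, number=3)
print(f"loop: {loop_s:.4f}s  vectorized: {vec_s:.4f}s")
```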

Structured Arrays: Arrays with heterogeneous data types.

data = np.zeros(4, dtype={'names': ('name', 'age', 'weight'),
                          'formats': ('U10', 'i4', 'f8')})  # Unicode string, integer, float

names = ['Alice', 'Bob', 'Charlie', 'David']
ages = [25, 30, 22, 28]
weights = [65.5, 72.0, 58.3, 78.9]

data['name'] = names
data['age'] = ages
data['weight'] = weights

print("Structured array:\n", data)
print("Names:", data['name'])
print("Age of Alice:", data[data['name'] == 'Alice']['age'])
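
Structured arrays can also be sorted and filtered by field. A minimal sketch using the same records, built here directly from a list of tuples (an alternative to filling the fields one by one):

```python
import numpy as np

people = np.array(
    [("Alice", 25, 65.5), ("Bob", 30, 72.0), ("Charlie", 22, 58.3), ("David", 28, 78.9)],
    dtype=[("name", "U10"), ("age", "i4"), ("weight", "f8")],
)

# Sort whole records by one field
by_age = np.sort(people, order="age")
print(by_age["name"])     # ['Charlie' 'Alice' 'David' 'Bob']

# Boolean masks on a field select matching records
older = people[people["age"] > 25]
print(older["name"])      # ['Bob' 'David']

# Each field behaves like a regular array
print(people["weight"].mean())
```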

6. Tips & Tricks

Avoid Loops: Use vectorized operations whenever possible for performance.
Use Views Wisely: Basic slicing creates views (no data copying), so writing to a slice modifies the original array; fancy and boolean indexing return copies. Use arr.copy() to create a true copy.
Check Data Types: Ensure your data types are appropriate for the operations you’re performing. Use arr.astype() to change the data type.
Use np.where() for Conditional Operations: np.where(condition, x, y) returns elements chosen from x or y depending on condition. It is typically much faster than an explicit Python loop.
Use np.allclose() for Floating-Point Comparisons: Due to floating-point precision issues, use np.allclose(a, b, rtol=1e-05, atol=1e-08) instead of a == b for comparing arrays.
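Use np.random.seed() for Reproducibility: Set the seed before generating random numbers to ensure consistent results. E.g., np.random.seed(42). Newer code can instead create a seeded generator with rng = np.random.default_rng(42) and call its methods.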
Use np.clip() to Limit Values: np.clip(arr, min_value, max_value) clamps values in arr to the specified range.
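
Several of the tips above can be checked in a few lines. This is a minimal sketch with illustrative variable names; the seeded generator uses np.random.default_rng, which is an alternative to the np.random.seed call mentioned in the tips:

```python
import numpy as np

a = np.array([-2, 0, 3, 7])

# np.where and np.clip replace explicit loops over elements
print(np.where(a > 0, a, 0))          # [0 0 3 7]
print(np.clip(a, 0, 5))               # [0 0 3 5]

# Exact equality is fragile for floats; allclose uses tolerances
print(0.1 + 0.2 == 0.3)               # False
print(np.allclose(0.1 + 0.2, 0.3))    # True

# A basic slice is a view; .copy() detaches it from the original
view = a[1:3]
copy = a[1:3].copy()
view[0] = 99
print(a[1])                           # 99: the write went through the view
print(copy)                           # [0 3]: the copy is unaffected

# Two generators with the same seed produce the same numbers
first = np.random.default_rng(42).random(3)
second = np.random.default_rng(42).random(3)
print(np.array_equal(first, second))  # True
```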

7. Integration

Pandas:

import pandas as pd

# Create a NumPy array
arr = np.random.rand(5, 3)

# Create a Pandas DataFrame from the NumPy array
df = pd.DataFrame(arr, columns=['col1', 'col2', 'col3'])
print("DataFrame from NumPy array:\n", df)

# Convert a Pandas Series to a NumPy array
series = df['col1']
arr_from_series = series.to_numpy() # series.values also works, but to_numpy() is the recommended accessor
print("NumPy array from Pandas Series:", arr_from_series)

# Apply a NumPy function to a Pandas Series
df['col1_sqrt'] = np.sqrt(df['col1'])
print("DataFrame with square root of col1:\n", df)

Matplotlib:

import matplotlib.pyplot as plt

# Generate data using NumPy
x = np.linspace(0, 10, 100)
y = np.sin(x)

# Create a plot
plt.plot(x, y)
plt.xlabel("x")
plt.ylabel("sin(x)")
plt.title("Sine Wave")
plt.show()

# Create a scatter plot
x = np.random.rand(50)
y = np.random.rand(50)
colors = np.random.rand(50)
sizes = np.random.rand(50) * 100

plt.scatter(x, y, c=colors, s=sizes, alpha=0.5) # alpha for transparency
plt.xlabel("x")
plt.ylabel("y")
plt.title("Scatter Plot")
plt.colorbar()
plt.show()

Scikit-learn: NumPy arrays are the standard input for scikit-learn models. Data preprocessing steps from scikit-learn (scalers, etc.) often return NumPy arrays.

from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Sample data (NumPy array)
X = np.array([[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]])
y = np.array([0, 0, 1, 1, 0, 1])

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Scale the data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train a Logistic Regression model
model = LogisticRegression()
model.fit(X_train_scaled, y_train)

# Make predictions
y_pred = model.predict(X_test_scaled)
print("Predictions:", y_pred)

# Evaluate the model (using NumPy for calculations)
accuracy = np.mean(y_pred == y_test)
print("Accuracy:", accuracy)

8. Further Resources

Official NumPy Documentation: https://numpy.org/doc/
NumPy Tutorials: https://numpy.org/devdocs/user/tutorial.html
SciPy Lecture Notes: https://scipy-lectures.org/ (Excellent resource covering NumPy and other scientific Python libraries)
Stack Overflow: Search for NumPy-related questions and answers.
Real Python NumPy Tutorials: https://realpython.com/numpy-tutorial/