51_Numpy_For_Numerical_Computing
Category: AI & Data Science Tools
Type: AI/ML Tool or Library
Generated on: 2025-08-26 11:07:23
For: Data Science, Machine Learning & Technical Interviews
NumPy Cheatsheet for Numerical Computing
1. Tool/Library Overview
- What it is: NumPy (Numerical Python) is the fundamental package for numerical computation in Python. It provides support for large, multi-dimensional arrays and matrices, along with a large library of high-level mathematical functions to operate on these arrays.
- Main Use Cases in AI/ML:
- Data Representation: Storing and manipulating numerical data (e.g., images, audio, numerical features).
- Linear Algebra: Performing matrix operations essential for machine learning algorithms (e.g., matrix multiplication, eigenvalue decomposition).
- Random Number Generation: Generating random numbers for initializing weights, sampling data, and creating synthetic datasets.
- Data Preprocessing: Normalizing, standardizing, and transforming data.
- Scientific Computing: Implementing numerical methods for optimization, integration, and other scientific tasks.
2. Installation & Setup
- Installation:

    pip install numpy
    # or
    conda install numpy

- Import:

    import numpy as np  # Standard practice to use the 'np' alias

- Verification:

    print(np.__version__)  # Prints the installed version, e.g. 1.26.2
3. Core Features & API
- NumPy Array (ndarray): The core data structure.
- Creating Arrays:
  - np.array(list_of_lists): Creates an array from a list or tuple (nested sequences give multi-dimensional arrays).
  - np.zeros((rows, cols)): Creates an array filled with zeros.
  - np.ones((rows, cols)): Creates an array filled with ones.
  - np.full((rows, cols), value): Creates an array filled with a specific value.
  - np.eye(n): Creates an n x n identity matrix.
  - np.arange(start, stop, step): Creates an array with evenly spaced values in the half-open interval [start, stop).
  - np.linspace(start, stop, num): Creates an array of num evenly spaced values; stop is included by default.
  - np.random.rand(rows, cols): Creates an array with uniform random values in [0, 1).
  - np.random.randn(rows, cols): Creates an array with random values from a standard normal distribution.
  - np.random.randint(low, high, size): Creates an array of random integers from low (inclusive) to high (exclusive).

    arr = np.array([[1, 2, 3], [4, 5, 6]])
    zeros_arr = np.zeros((2, 3))
    ones_arr = np.ones((3, 2))
    range_arr = np.arange(0, 10, 2)      # [0 2 4 6 8]
    linspace_arr = np.linspace(0, 1, 5)
    rand_arr = np.random.rand(2, 2)
    randint_arr = np.random.randint(1, 10, (3, 3))

    print("Array:\n", arr)
    print("Zeros:\n", zeros_arr)
    print("Ones:\n", ones_arr)
    print("Range:\n", range_arr)
    print("Linspace:\n", linspace_arr)
    print("Random:\n", rand_arr)
    print("Random Int:\n", randint_arr)
- Array Attributes:
  - arr.shape: Returns a tuple with the size of each dimension (rows, columns, …).
  - arr.ndim: Returns the number of dimensions.
  - arr.dtype: Returns the data type of the array elements.
  - arr.size: Returns the total number of elements in the array.
  - arr.itemsize: Returns the size (in bytes) of each element.
  - arr.nbytes: Returns the total size (in bytes) of the array.

    arr = np.array([[1, 2, 3], [4, 5, 6]])
    print("Shape:", arr.shape)      # (2, 3)
    print("Dimension:", arr.ndim)   # 2
    print("Data Type:", arr.dtype)  # int64 (or int32 depending on system)
    print("Size:", arr.size)        # 6
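The itemsize and nbytes attributes are listed above but not exercised in the example. A minimal sketch, assuming an explicit float32 dtype (chosen here only so the byte counts do not depend on the platform):

```python
import numpy as np

# Explicit dtype so the byte counts are the same on every platform
arr = np.zeros((2, 3), dtype=np.float32)

print("Item size:", arr.itemsize)   # 4 bytes per float32 element
print("Size:", arr.size)            # 6 elements
print("Total bytes:", arr.nbytes)   # 24, i.e. arr.size * arr.itemsize
```

Checking nbytes before loading or allocating a large array is a quick way to estimate its memory footprint.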
- Array Indexing & Slicing: Similar to Python lists, but more powerful.
  - arr[row, col]: Accessing a single element.
  - arr[start:stop:step, start:stop:step]: Slicing a portion of the array.
  - arr[row_indices, col_indices]: Fancy indexing (using arrays of indices).
  - arr[arr > 5]: Boolean indexing (filtering elements based on a condition).

    arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    print("Element at (0, 1):", arr[0, 1])            # 2
    print("First row:", arr[0, :])                    # [1 2 3]
    print("First column:", arr[:, 0])                 # [1 4 7]
    print("Sub-array:\n", arr[0:2, 1:3])              # [[2 3] [5 6]]
    print("Elements greater than 5:", arr[arr > 5])   # [6 7 8 9]
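Fancy indexing is listed above but not shown in the example. The sketch below illustrates how index arrays are paired element by element, and that the result is a copy rather than a view. The variable names are illustrative only.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Index arrays are paired element by element:
# this picks arr[0, 2] and arr[2, 0]
picked = arr[[0, 2], [2, 0]]
print("Picked elements:", picked)   # [3 7]

# A single index array selects whole rows
rows = arr[[0, 2]]
print("Rows 0 and 2:\n", rows)

# Fancy indexing returns a copy, so writing to it leaves arr unchanged
picked[0] = 99
print("arr[0, 2] is still:", arr[0, 2])   # 3
```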
- Array Reshaping:
  - arr.reshape(new_shape): Changes the shape of the array without changing its data. The total number of elements must stay the same.
  - arr.ravel(): Flattens the array into a 1D array (returns a view when possible).
  - arr.T: Transposes the array.

    arr = np.arange(12)
    reshaped_arr = arr.reshape(3, 4)
    print("Reshaped array:\n", reshaped_arr)
    print("Flattened array:", reshaped_arr.ravel())
    print("Transposed array:\n", reshaped_arr.T)
- Array Operations:
  - Element-wise operations (+, -, *, /, **)
  - np.add(arr1, arr2): Element-wise addition
  - np.subtract(arr1, arr2): Element-wise subtraction
  - np.multiply(arr1, arr2): Element-wise multiplication
  - np.divide(arr1, arr2): Element-wise division
  - np.power(arr1, arr2): Element-wise exponentiation
  - np.sqrt(arr): Element-wise square root
  - np.exp(arr): Element-wise exponential
  - np.log(arr): Element-wise natural logarithm
  - np.sin(arr), np.cos(arr), np.tan(arr): Element-wise trigonometric functions
  - arr.sum(), arr.mean(), arr.std(), arr.min(), arr.max(): Aggregate functions
  - arr.sum(axis=0): Sums down the rows, giving one value per column
  - arr.sum(axis=1): Sums across the columns, giving one value per row

    arr1 = np.array([[1, 2], [3, 4]])
    arr2 = np.array([[5, 6], [7, 8]])
    print("Element-wise addition:\n", arr1 + arr2)
    print("Element-wise multiplication:\n", arr1 * arr2)
    print("Sum of all elements:", arr1.sum())        # 10
    print("Mean of all elements:", arr1.mean())      # 2.5
    print("Sum along columns:", arr1.sum(axis=0))    # [4 6]
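The axis argument is a common source of confusion, so here is a small sketch contrasting axis=0 and axis=1 on the same 2x2 array. The keepdims=True usage is an addition not covered in the list above; it keeps the reduced axis with length 1 so the result can be combined with the original array.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])

col_sums = arr.sum(axis=0)   # collapses the row axis: one value per column
row_sums = arr.sum(axis=1)   # collapses the column axis: one value per row
print("Column sums:", col_sums)   # [4 6]
print("Row sums:", row_sums)      # [3 7]

# keepdims=True keeps the reduced axis with length 1
row_means = arr.mean(axis=1, keepdims=True)
print("Row means shape:", row_means.shape)   # (2, 1)

# Subtract each row's mean from that row
centered = arr - row_means
print("Row-centered array:\n", centered)
```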
- Linear Algebra:
  - np.dot(arr1, arr2): Matrix multiplication for 2-D arrays (inner product for 1-D arrays).
  - arr1 @ arr2: Another way to perform matrix multiplication (Python >= 3.5).
  - np.linalg.det(arr): Determinant of a matrix.
  - np.linalg.inv(arr): Inverse of a matrix.
  - np.linalg.eig(arr): Eigenvalues and eigenvectors of a matrix.
  - np.linalg.solve(A, b): Solves the linear system Ax = b.

    arr1 = np.array([[1, 2], [3, 4]])
    arr2 = np.array([[5, 6], [7, 8]])
    print("Matrix multiplication:\n", np.dot(arr1, arr2))
    print("Determinant of arr1:", np.linalg.det(arr1))

    # np.linalg.inv raises LinAlgError for singular matrices
    try:
        inv_arr1 = np.linalg.inv(arr1)
        print("Inverse of arr1:\n", inv_arr1)
    except np.linalg.LinAlgError:
        print("Matrix is singular and cannot be inverted.")

    eigenvalues, eigenvectors = np.linalg.eig(arr1)
    print("Eigenvalues:", eigenvalues)
    print("Eigenvectors:\n", eigenvectors)

    A = np.array([[2, 1], [1, 3]])
    b = np.array([4, 7])
    x = np.linalg.solve(A, b)
    print("Solution to Ax = b:", x)   # [1. 2.]
- Broadcasting: NumPy’s ability to perform operations on arrays with different shapes. The smaller array is “broadcast” across the larger array. Shapes are compared from the last dimension backwards, and two dimensions are compatible when they are equal or one of them is 1.

    arr1 = np.array([1, 2, 3])
    scalar = 5
    print("Broadcasting addition:", arr1 + scalar)   # [6 7 8]

    arr2 = np.array([[1, 2, 3], [4, 5, 6]])
    arr3 = np.array([10, 20, 30])
    print("Broadcasting addition (array + row):")
    print(arr2 + arr3)
    # Output:
    # [[11 22 33]
    #  [14 25 36]]
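To make the shape rule concrete, the following sketch broadcasts a column of shape (3, 1) against a row of shape (3,), then shows a pair of shapes that cannot be broadcast. The arrays and the broadcast_failed flag are illustrative names, not part of any API.

```python
import numpy as np

col = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3])           # shape (3,)

# (3, 1) and (3,) broadcast to (3, 3): the column is stretched across
# the columns and the row is stretched down the rows
table = col + row
print(table)
# [[ 1  2  3]
#  [11 12 13]
#  [21 22 23]]

# (2, 3) and (2,) are incompatible: the trailing dimensions are 3 and 2
broadcast_failed = False
try:
    np.ones((2, 3)) + np.ones((2,))
except ValueError as exc:
    broadcast_failed = True
    print("Cannot broadcast:", exc)
```

When a 1-D array should act as a column, reshaping it with arr.reshape(-1, 1) or arr[:, np.newaxis] is the usual way to make the shapes compatible.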
4. Practical Examples
- Image Processing:

    import numpy as np
    import matplotlib.pyplot as plt
    from PIL import Image

    # Load an image using Pillow
    try:
        img = Image.open("image.jpg")   # Replace with your image path
        img_array = np.array(img)       # Convert to NumPy array
        # (height, width, channels), typically (H, W, 3) for RGB
        print("Image array shape:", img_array.shape)

        # Example: invert the colors (assumes 8-bit channels with values 0-255)
        inverted_img = 255 - img_array

        # Display the original and inverted images (using matplotlib)
        plt.figure(figsize=(10, 5))
        plt.subplot(1, 2, 1)
        plt.imshow(img_array)
        plt.title("Original Image")
        plt.axis('off')   # Hide the axis
        plt.subplot(1, 2, 2)
        plt.imshow(inverted_img)
        plt.title("Inverted Image")
        plt.axis('off')
        plt.show()
    except FileNotFoundError:
        print("Error: Image file not found. Make sure 'image.jpg' exists in the same directory or specify the correct path.")
    except Exception as e:
        print(f"An error occurred: {e}")
- Data Normalization/Standardization:

    data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

    # Normalization (scaling to the range [0, 1])
    data_normalized = (data - data.min()) / (data.max() - data.min())
    print("Normalized data:\n", data_normalized)

    # Standardization (scaling to have zero mean and unit variance)
    data_standardized = (data - data.mean()) / data.std()
    print("Standardized data:\n", data_standardized)
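The example above uses statistics of the whole array. In ML preprocessing each feature (column) is usually scaled separately, which can be sketched with axis=0. The two-column data below is made up for illustration, with deliberately different scales per column.

```python
import numpy as np

# Hypothetical data: two features on very different scales
data = np.array([[1.0, 200.0],
                 [2.0, 400.0],
                 [3.0, 600.0]])

# Per-column statistics (axis=0 reduces over the rows)
col_min = data.min(axis=0)
col_max = data.max(axis=0)
col_mean = data.mean(axis=0)
col_std = data.std(axis=0)

# Broadcasting applies each column's statistics to that column
normalized = (data - col_min) / (col_max - col_min)
standardized = (data - col_mean) / col_std

print("Normalized per column:\n", normalized)
print("Column means after standardization:", standardized.mean(axis=0))
print("Column stds after standardization:", standardized.std(axis=0))
```

Note that a constant column has zero range and zero standard deviation, so this sketch would divide by zero for such a column; real pipelines need to handle that case.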
- Implementing a Simple Linear Regression:

    import matplotlib.pyplot as plt

    # Generate some synthetic data
    X = np.linspace(0, 10, 100)
    y = 2 * X + 1 + np.random.randn(100) * 2   # y = 2x + 1 + noise

    # Reshape X to a column vector (required for linear algebra operations)
    X = X.reshape(-1, 1)

    # Add a bias (intercept) term to X
    X = np.concatenate((np.ones_like(X), X), axis=1)

    # Solve for the coefficients using the normal equation
    # theta = (X.T @ X)^-1 @ X.T @ y
    theta = np.linalg.inv(X.T @ X) @ X.T @ y
    print("Estimated coefficients (intercept, slope):", theta)

    # Make predictions
    X_test = np.array([[1, 5], [1, 8]])   # Bias term added
    y_pred = X_test @ theta
    print("Predictions for X_test:\n", y_pred)

    # Plot the data and the regression line (using Matplotlib)
    plt.scatter(X[:, 1], y)                     # Scatter plot of original data
    plt.plot(X[:, 1], X @ theta, color='red')   # Regression line
    plt.xlabel("X")
    plt.ylabel("y")
    plt.title("Linear Regression")
    plt.show()
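The normal equation above inverts X.T @ X explicitly. As an alternative (not the method used in the example), np.linalg.lstsq solves the same least-squares problem without forming the inverse, which tends to be more robust when the features are nearly collinear. A sketch under assumed settings: the seed, noise scale, and variable names are chosen here for illustration.

```python
import numpy as np

# Synthetic data: y = 2x + 1 + noise (seed and noise scale are arbitrary)
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
y = 2 * x + 1 + rng.normal(scale=0.5, size=x.shape)

# Design matrix with a bias column of ones
X = np.column_stack((np.ones_like(x), x))

# Least-squares fit; rcond=None selects the current default cutoff
theta, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
print("Estimated coefficients (intercept, slope):", theta)

# Same system via the normal equation, using solve instead of inv
theta_normal = np.linalg.solve(X.T @ X, X.T @ y)
print("Both approaches agree:", np.allclose(theta, theta_normal))
```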
5. Advanced Usage
- Memory Mapping: Working with large datasets that don’t fit in memory.

    # Create a large array and save it to disk (100 million float32 values, about 400 MB)
    large_array = np.arange(100000000, dtype='float32')
    np.save('large_array.npy', large_array)

    # Memory map the array: data is read from disk on demand
    mmapped_array = np.load('large_array.npy', mmap_mode='r')
    print("Shape of memory-mapped array:", mmapped_array.shape)

    # Accessing elements is still possible, but modifications are not allowed in 'r' (read-only) mode.
    print("First 10 elements:", mmapped_array[:10])

    # To modify, use 'r+' mode:
    # mmapped_array = np.load('large_array.npy', mmap_mode='r+')
    # mmapped_array[0] = 100   # This will modify the file on disk

    # Important: drop the reference when done so the mapping can be released
    del mmapped_array   # use `del large_array` as well to free the in-memory copy
- Vectorization: Leveraging NumPy’s optimized C implementation for faster computations by avoiding explicit loops.

    import time

    # Inefficient (using a loop):
    def compute_reciprocal_loop(arr):
        result = np.empty(len(arr))
        for i in range(len(arr)):
            result[i] = 1.0 / arr[i]
        return result

    # Efficient (using vectorization):
    def compute_reciprocal_vectorized(arr):
        return 1.0 / arr

    arr = np.random.rand(1000000)

    # Time the loop-based approach (perf_counter is intended for measuring short durations)
    start_time = time.perf_counter()
    compute_reciprocal_loop(arr)
    loop_time = time.perf_counter() - start_time
    print(f"Loop-based time: {loop_time:.4f} seconds")

    # Time the vectorized approach
    start_time = time.perf_counter()
    compute_reciprocal_vectorized(arr)
    vectorized_time = time.perf_counter() - start_time
    print(f"Vectorized time: {vectorized_time:.4f} seconds")

    print(f"Vectorization speedup: {loop_time / vectorized_time:.2f}x")   # Usually a significant speedup
- Structured Arrays: Arrays with heterogeneous data types.

    # Field types: Unicode string (up to 10 characters), 4-byte integer, 8-byte float
    data = np.zeros(4, dtype={'names': ('name', 'age', 'weight'),
                              'formats': ('U10', 'i4', 'f8')})

    names = ['Alice', 'Bob', 'Charlie', 'David']
    ages = [25, 30, 22, 28]
    weights = [65.5, 72.0, 58.3, 78.9]

    data['name'] = names
    data['age'] = ages
    data['weight'] = weights

    print("Structured array:\n", data)
    print("Names:", data['name'])
    print("Age of Alice:", data[data['name'] == 'Alice']['age'])
6. Tips & Tricks
- Avoid Loops: Use vectorized operations whenever possible for performance.
- Use Views Wisely: Basic slicing creates views (no data copying), so writing to a slice also changes the original array. Fancy and boolean indexing return copies. Use arr.copy() to create a true copy.
- Check Data Types: Ensure your data types are appropriate for the operations you’re performing. Use arr.astype() to change the data type.
- Use np.where() for Conditional Operations: np.where(condition, x, y) returns elements chosen from x or y depending on condition. More efficient than manual looping.
- Use np.allclose() for Floating-Point Comparisons: Due to floating-point precision issues, use np.allclose(a, b, rtol=1e-05, atol=1e-08) instead of a == b for comparing arrays.
- Use np.random.seed() for Reproducibility: Set the seed before generating random numbers to ensure consistent results. E.g., np.random.seed(42).
- Use np.clip() to Limit Values: np.clip(arr, min_value, max_value) clamps values in arr to the specified range.
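The tips above can be sketched in one short script. The array contents and variable names are made up for illustration; every function used is the one named in the corresponding tip.

```python
import numpy as np

# Views vs copies: basic slicing returns a view on the same data
base = np.arange(5)
view = base[1:4]
copy = base[1:4].copy()
view[0] = 100    # also changes base[1]
copy[1] = -1     # base is unaffected
print(base)      # [  0 100   2   3   4]

# astype returns a new array with the requested dtype
as_float = base.astype(np.float64)
print(as_float.dtype)   # float64

# np.where: choose element-wise between two alternatives
values = np.array([-2, -1, 0, 1, 2])
non_negative = np.where(values > 0, values, 0)
print(non_negative)     # [0 0 0 1 2]

# np.allclose: tolerant floating-point comparison
a = np.array([0.1 + 0.2])
b = np.array([0.3])
print(a == b)              # [False]
print(np.allclose(a, b))   # True

# np.clip: clamp values to a range
clipped = np.clip(values, -1, 1)
print(clipped)             # [-1 -1  0  1  1]

# np.random.seed: the same seed reproduces the same sequence
np.random.seed(42)
first = np.random.rand(3)
np.random.seed(42)
second = np.random.rand(3)
print(np.array_equal(first, second))   # True
```

For new code, NumPy's documentation recommends the Generator API, for example rng = np.random.default_rng(42), over the global np.random.seed() state; the seed-based form is kept here because it is the one the tip describes.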
7. Integration
- Pandas:

    import numpy as np
    import pandas as pd

    # Create a NumPy array
    arr = np.random.rand(5, 3)

    # Create a Pandas DataFrame from the NumPy array
    df = pd.DataFrame(arr, columns=['col1', 'col2', 'col3'])
    print("DataFrame from NumPy array:\n", df)

    # Convert a Pandas Series to a NumPy array
    series = df['col1']
    arr_from_series = series.to_numpy()   # recommended over series.values
    print("NumPy array from Pandas Series:", arr_from_series)

    # Apply a NumPy function to a Pandas Series
    df['col1_sqrt'] = np.sqrt(df['col1'])
    print("DataFrame with square root of col1:\n", df)
- Matplotlib:

    import numpy as np
    import matplotlib.pyplot as plt

    # Generate data using NumPy
    x = np.linspace(0, 10, 100)
    y = np.sin(x)

    # Create a plot
    plt.plot(x, y)
    plt.xlabel("x")
    plt.ylabel("sin(x)")
    plt.title("Sine Wave")
    plt.show()

    # Create a scatter plot
    x = np.random.rand(50)
    y = np.random.rand(50)
    colors = np.random.rand(50)
    sizes = np.random.rand(50) * 100
    plt.scatter(x, y, c=colors, s=sizes, alpha=0.5)   # alpha for transparency
    plt.xlabel("x")
    plt.ylabel("y")
    plt.title("Scatter Plot")
    plt.colorbar()
    plt.show()
- Scikit-learn: NumPy arrays are the standard input for scikit-learn models. Data preprocessing steps from scikit-learn (scalers, etc.) often return NumPy arrays.

    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Sample data (NumPy array)
    X = np.array([[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]])
    y = np.array([0, 0, 1, 1, 0, 1])

    # Split data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

    # Scale the data (fit on the training set only, then apply to the test set)
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)

    # Train a Logistic Regression model
    model = LogisticRegression()
    model.fit(X_train_scaled, y_train)

    # Make predictions
    y_pred = model.predict(X_test_scaled)
    print("Predictions:", y_pred)

    # Evaluate the model (using NumPy for calculations)
    accuracy = np.mean(y_pred == y_test)
    print("Accuracy:", accuracy)
8. Further Resources
- Official NumPy Documentation: https://numpy.org/doc/
- NumPy Tutorials: https://numpy.org/devdocs/user/tutorial.html
- SciPy Lecture Notes: https://scipy-lectures.org/ (Excellent resource covering NumPy and other scientific Python libraries)
- Stack Overflow: Search for NumPy-related questions and answers.
- Real Python NumPy Tutorials: https://realpython.com/numpy-tutorial/