
PyTorch for Deep Learning

Category: AI & Data Science Tools
Type: AI/ML Tool or Library
Generated on: 2025-08-26 11:09:32
For: Data Science, Machine Learning & Technical Interviews


PyTorch is an open-source machine learning framework based on the Torch library, primarily developed by Meta AI (formerly Facebook AI Research, FAIR). It’s used for a wide range of tasks, including:

  • Deep Learning: Building and training neural networks for various tasks like image recognition, natural language processing, and time series forecasting.
  • Research: Rapid prototyping and experimentation with new deep learning architectures and algorithms.
  • Production Deployment: Serving trained models in real-world applications.

Key strengths include:

  • Dynamic Computation Graph: Offers flexibility for debugging and experimentation.
  • Pythonic Interface: Easy to learn and use for Python developers.
  • GPU Acceleration: Supports efficient training on GPUs.
  • Large Community & Ecosystem: Extensive support, libraries, and pre-trained models.
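As an illustration of the dynamic computation graph, here is a minimal sketch (the `DynamicNet` module and its data-dependent loop count are invented for demonstration): the forward pass uses ordinary Python control flow, and the graph is rebuilt on every call.

```python
import torch
import torch.nn as nn

class DynamicNet(nn.Module):
    """Forward pass with Python control flow -- the graph is rebuilt each call."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 4)

    def forward(self, x):
        # Apply the layer a data-dependent number of times (1 to 3)
        for _ in range(int(x.abs().sum().item()) % 3 + 1):
            x = torch.relu(self.fc(x))
        return x

net = DynamicNet()
out = net(torch.randn(2, 4))
print(out.shape)  # torch.Size([2, 4])
```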
# Install with conda (recommended)
conda install pytorch torchvision torchaudio -c pytorch
# Install with pip (CPU only)
pip install torch torchvision torchaudio
# Install with pip (CUDA 11.6) - Replace version if needed
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu116

Verification:

import torch
print(torch.__version__)
# Check GPU availability
print(torch.cuda.is_available())
if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
else:
    device = torch.device("cpu")
    print("Using CPU")
# Expected Output (example):
# 2.0.1+cu118
# True
# Using GPU: NVIDIA GeForce RTX 3090
  • torch.Tensor: The fundamental data structure in PyTorch, similar to NumPy arrays.
import torch
# Creating tensors
x = torch.tensor([1, 2, 3])
y = torch.ones(5) # Tensor of ones
z = torch.zeros((2, 3)) # Tensor of zeros
r = torch.rand((3, 2)) # Tensor of random numbers (uniform distribution)
n = torch.randn((3, 2)) # Tensor of random numbers (standard normal distribution)
# Tensor attributes
print(x.dtype) # torch.int64 (default)
print(x.shape) # torch.Size([3])
print(x.device) # device(type='cpu')
# Tensor operations
a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])
c = a + b
d = torch.dot(a, b)  # Dot product
e = torch.matmul(r, n.T)  # Matrix multiplication: (3, 2) @ (2, 3)
print(c)  # tensor([5, 7, 9])
print(d)  # tensor(32)
print(e.shape)  # torch.Size([3, 3])
# Move tensors to GPU
if torch.cuda.is_available():
    device = torch.device('cuda')
    x = x.to(device)  # Same as x = x.cuda()
    print(x.device)  # device(type='cuda', index=0)
  • Data Types: torch.float32 (default float type), torch.float64, torch.float16, torch.int64 (default integer type), torch.int32, torch.bool. Use .to(dtype) to cast.
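A quick sketch of casting between these dtypes (the values are illustrative):

```python
import torch

x = torch.tensor([1, 2, 3])       # int64 by default for Python ints
f = x.to(torch.float32)           # explicit cast with .to(dtype)
h = f.half()                      # shorthand for .to(torch.float16)
b = x.bool()                      # nonzero values become True
print(f.dtype, h.dtype, b.dtype)  # torch.float32 torch.float16 torch.bool
```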
  • NumPy Interoperability:
import numpy as np
numpy_array = np.array([1, 2, 3])
torch_tensor = torch.from_numpy(numpy_array) # Creates a tensor from a numpy array
back_to_numpy = torch_tensor.numpy() # Converts a tensor to a numpy array
# The tensor and the array share memory (on CPU), so changes to one affect the other
torch_tensor[0] = 10
print(numpy_array) # [10 2 3]
  • torch.autograd: The engine that powers automatic differentiation in PyTorch. It allows you to compute gradients of tensors with respect to other tensors.
x = torch.tensor(2.0, requires_grad=True) # Enable gradient tracking
y = x**2 + 2*x + 1
y.backward() # Calculate gradients
print(x.grad) # Gradient of y with respect to x (6.0)
# More complex example
a = torch.tensor([2.0, 3.0], requires_grad=True)
b = a * 2
c = b.mean() # Mean of b
c.backward() # Calculate gradients
print(a.grad) # tensor([1., 1.]) (d(c)/da = 1)
# Disable gradient tracking
with torch.no_grad():
    z = x**2 + 2*x + 1  # No gradient calculation
# Detach a tensor from the computation graph
detached_tensor = y.detach()  # Creates a new tensor that shares the same data
                              # but does not require gradients.
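Besides `.backward()`, `torch.autograd.grad` returns gradients directly, and with `create_graph=True` the result can itself be differentiated. A short sketch using the same function as above:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x**2 + 2*x + 1

# Functional alternative to y.backward(): returns the gradient directly
(dy_dx,) = torch.autograd.grad(y, x, create_graph=True)
print(dy_dx)  # tensor(6., grad_fn=...)  since dy/dx = 2x + 2 at x = 2

# create_graph=True lets us differentiate again (second derivative)
(d2y_dx2,) = torch.autograd.grad(dy_dx, x)
print(d2y_dx2)  # tensor(2.)
```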
  • torch.nn.Module: Base class for all neural network modules.
  • torch.nn.Linear: Fully connected layer.
  • torch.nn.Conv2d: 2D convolutional layer.
  • torch.nn.MaxPool2d: 2D max pooling layer.
  • torch.nn.ReLU: Rectified Linear Unit activation function.
  • torch.nn.Sigmoid: Sigmoid activation function.
  • torch.nn.CrossEntropyLoss: Cross-entropy loss function.
import torch.nn as nn

class SimpleNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out

# Instantiate the model
input_size = 28*28  # MNIST image size
hidden_size = 500
num_classes = 10
model = SimpleNN(input_size, hidden_size, num_classes)
# Move model to GPU (if available)
if torch.cuda.is_available():
    model = model.to(device)
# Example usage (dummy input)
dummy_input = torch.randn(1, input_size).to(device)  # Batch size 1
output = model(dummy_input)
print(output.shape)  # torch.Size([1, 10])
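For a plain chain of layers like this, the same model can also be written without a custom class using `nn.Sequential` (a sketch with the same layer sizes):

```python
import torch
import torch.nn as nn

# Equivalent model as an nn.Sequential -- convenient when the forward pass
# is a simple chain of layers with no custom logic
seq_model = nn.Sequential(
    nn.Linear(28 * 28, 500),
    nn.ReLU(),
    nn.Linear(500, 10),
)
out = seq_model(torch.randn(1, 28 * 28))
print(out.shape)  # torch.Size([1, 10])
```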
  • torch.optim: Contains various optimization algorithms for training neural networks.
  • torch.optim.SGD: Stochastic Gradient Descent.
  • torch.optim.Adam: Adam optimizer (adaptive learning rate).
import torch.optim as optim

# Define optimizer
learning_rate = 0.001
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
# Loss function
criterion = nn.CrossEntropyLoss()

# Training loop (example)
num_epochs = 2
batch_size = 100
input_size = 28*28
num_classes = 10
# Dummy data
X_train = torch.randn(1000, input_size).to(device)
Y_train = torch.randint(0, num_classes, (1000,)).to(device)

for epoch in range(num_epochs):
    for i in range(0, len(X_train), batch_size):
        # Get batch
        inputs = X_train[i:i+batch_size]
        labels = Y_train[i:i+batch_size]
        # Zero the parameter gradients
        optimizer.zero_grad()
        # Forward pass
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        # Backward pass and optimization
        loss.backward()
        optimizer.step()
    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')
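Learning-rate schedules from `torch.optim.lr_scheduler` compose with any optimizer. A minimal sketch (the tiny `nn.Linear` model is a placeholder) that halves the learning rate every 5 epochs with `StepLR`:

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Placeholder model, just to have parameters to optimize
model = nn.Linear(10, 2)
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Multiply the learning rate by gamma every step_size epochs
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)

for epoch in range(10):
    # ... run the usual forward/backward/optimizer.step() over batches here ...
    optimizer.step()   # always call optimizer.step() before scheduler.step()
    scheduler.step()

print(optimizer.param_groups[0]['lr'])  # 0.00025 after 10 epochs (two halvings)
```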

3.5 Dataset and DataLoader (torch.utils.data)

  • torch.utils.data.Dataset: Abstract class representing a dataset.
  • torch.utils.data.DataLoader: Provides an iterable over the dataset for batching and shuffling.
from torch.utils.data import Dataset, DataLoader

class CustomDataset(Dataset):
    def __init__(self, data, labels):
        self.data = data
        self.labels = labels

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], self.labels[idx]

# Example usage
data = torch.randn(100, 10)  # 100 samples, 10 features
labels = torch.randint(0, 2, (100,))  # Binary classification labels
dataset = CustomDataset(data, labels)
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)  # Shuffle the dataset

# Iterate through the dataloader
for batch_idx, (inputs, targets) in enumerate(dataloader):
    print(f"Batch {batch_idx}: Input shape = {inputs.shape}, Target shape = {targets.shape}")

# Save the model
torch.save(model.state_dict(), 'model.pth')  # Saves only the model's parameters
# Load the model
loaded_model = SimpleNN(input_size, hidden_size, num_classes).to(device)  # Create a new instance
loaded_model.load_state_dict(torch.load('model.pth', map_location=device))  # Load the parameters
loaded_model.eval()  # Set the model to evaluation mode (important for inference)
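Beyond saving only the weights, a common pattern is to checkpoint the optimizer state and epoch together so training can resume. A sketch (the filename, model, and epoch value are placeholders):

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Placeholder model/optimizer to checkpoint
model = nn.Linear(10, 2)
optimizer = optim.Adam(model.parameters(), lr=0.001)
epoch = 5

# Save a full training checkpoint, not just the weights
torch.save({
    'epoch': epoch,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
}, 'checkpoint.pth')

# Restore everything needed to resume training
checkpoint = torch.load('checkpoint.pth')
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
start_epoch = checkpoint['epoch'] + 1
print(start_epoch)  # 6
```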
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# Define transformations
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))  # Normalize to [-1, 1]
])

# Load MNIST dataset
train_dataset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)

# Create data loaders
batch_size = 64
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

# Define the neural network
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3)
        self.fc1 = nn.Linear(64 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(torch.relu(self.conv1(x)))
        x = self.pool(torch.relu(self.conv2(x)))
        x = torch.flatten(x, 1)  # flatten all dimensions except batch
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.001)

# Training loop
num_epochs = 2
for epoch in range(num_epochs):
    running_loss = 0.0
    for i, data in enumerate(train_loader, 0):
        inputs, labels = data[0].to(device), data[1].to(device)
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        if i % 200 == 199:  # print every 200 mini-batches
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 200:.3f}')
            running_loss = 0.0
print('Finished Training')

# Evaluate the model
correct = 0
total = 0
with torch.no_grad():
    for data in test_loader:
        images, labels = data[0].to(device), data[1].to(device)
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print(f'Accuracy of the network on the 10000 test images: {100 * correct / total:.2f}%')
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

# Sample data (replace with your actual dataset)
sentences = [
    "This movie is great!",
    "I hated this movie.",
    "The acting was terrible.",
    "I loved the story.",
    "It was a waste of time."
]
labels = [1, 0, 0, 1, 0]  # 1: positive, 0: negative

# Tokenization and vocabulary creation (index 0 is reserved for padding)
vocab = set()
for sentence in sentences:
    for word in sentence.split():
        vocab.add(word)
vocab = list(vocab)
word_to_index = {word: i + 1 for i, word in enumerate(vocab)}
index_to_word = {i + 1: word for i, word in enumerate(vocab)}

# Convert sentences to sequences of indices
def sentence_to_indices(sentence):
    return [word_to_index[word] for word in sentence.split()]

indexed_sentences = [sentence_to_indices(sentence) for sentence in sentences]

# Pad sequences to the same length with the reserved padding index 0
max_length = max(len(sentence) for sentence in indexed_sentences)
padded_sentences = [sentence + [0] * (max_length - len(sentence)) for sentence in indexed_sentences]

# Convert data to tensors
X = torch.tensor(padded_sentences)
Y = torch.tensor(labels)

# Define the dataset
class SentimentDataset(Dataset):
    def __init__(self, data, labels):
        self.data = data
        self.labels = labels

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], self.labels[idx]

dataset = SentimentDataset(X, Y)
dataloader = DataLoader(dataset, batch_size=2, shuffle=True)

# Define the RNN model
class RNN(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=0)
        self.rnn = nn.RNN(embedding_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        embedded = self.embedding(x)
        output, hidden = self.rnn(embedded)
        return self.fc(output[:, -1, :])  # Take the last time step's output

# Instantiate the model (+1 for the padding index)
vocab_size = len(vocab) + 1
embedding_dim = 50
hidden_dim = 100
output_dim = 1  # Binary classification
model = RNN(vocab_size, embedding_dim, hidden_dim, output_dim).to(device)

# Loss function and optimizer
criterion = nn.BCEWithLogitsLoss()  # Binary cross-entropy with logits
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Training loop
num_epochs = 10
for epoch in range(num_epochs):
    for inputs, labels in dataloader:
        inputs = inputs.to(device)
        labels = labels.float().unsqueeze(1).to(device)  # Shape (batch, 1) to match the logits
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
    print(f'Epoch {epoch+1}, Loss: {loss.item():.4f}')

# Evaluation (on the training data for simplicity)
with torch.no_grad():
    correct = 0
    total = 0
    for inputs, labels in dataloader:
        inputs = inputs.to(device)
        labels = labels.float().unsqueeze(1).to(device)  # Match prediction shape to avoid broadcasting
        outputs = torch.sigmoid(model(inputs))  # Apply sigmoid to get probabilities
        predicted = (outputs > 0.5).float()  # Threshold at 0.5
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
    print(f'Accuracy: {100 * correct / total:.2f}%')
import torch
from torch.autograd import Function

# Custom ReLU function with custom backward pass
class MyReLU(Function):
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input

# Example usage
x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0], requires_grad=True)
my_relu = MyReLU.apply
y = my_relu(x)
print(y)  # tensor([0., 0., 0., 1., 2.], grad_fn=<MyReLUBackward>)
y.sum().backward()
print(x.grad)  # tensor([0., 0., 1., 1., 1.])  (the backward only zeroes input < 0)
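A custom backward is easy to get subtly wrong; `torch.autograd.gradcheck` compares it against numerical finite differences. A sketch reusing the same `MyReLU` (inputs must be double precision, and are chosen away from the kink at 0):

```python
import torch
from torch.autograd import Function, gradcheck

class MyReLU(Function):  # same custom Function as defined above
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input

# gradcheck perturbs the input numerically and compares against backward()
x = torch.tensor([-2.0, -0.5, 0.7, 1.5], dtype=torch.float64, requires_grad=True)
ok = gradcheck(MyReLU.apply, (x,), eps=1e-6, atol=1e-4)
print(ok)  # True
```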

5.2 Distributed Training (torch.nn.parallel.DistributedDataParallel)


Using DistributedDataParallel (DDP) for multi-GPU/multi-node training requires setting up a proper distributed environment, typically by launching one process per GPU with torchrun (or the older torch.distributed.launch).

import torch
import torch.nn as nn
import torch.optim as optim
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
import os

def setup(rank, world_size):
    os.environ['MASTER_ADDR'] = 'localhost'
    os.environ['MASTER_PORT'] = '12355'  # Use an open port
    dist.init_process_group("gloo", rank=rank, world_size=world_size)  # Use "nccl" for GPU training

def cleanup():
    dist.destroy_process_group()

class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 1)

    def forward(self, x):
        return self.linear(x)

def main(rank, world_size):
    setup(rank, world_size)
    # Create model on this process's device (GPU index = rank)
    model = SimpleModel().to(rank)
    # Wrap the model with DDP (omit device_ids when training on CPU)
    ddp_model = DDP(model, device_ids=[rank])
    # Define loss function and optimizer
    criterion = nn.MSELoss()
    optimizer = optim.Adam(ddp_model.parameters(), lr=0.01)
    # Dummy data
    inputs = torch.randn(100, 10).to(rank)
    targets = torch.randn(100, 1).to(rank)
    # Training loop
    num_epochs = 5
    for epoch in range(num_epochs):
        optimizer.zero_grad()
        outputs = ddp_model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()
        if rank == 0:  # Print only on the master process
            print(f"Rank {rank}, Epoch {epoch+1}, Loss: {loss.item():.4f}")
    cleanup()

# Example usage (launch one process per GPU):
# python -m torch.distributed.launch --nproc_per_node=2 your_script.py
if __name__ == '__main__':
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument('--local_rank', type=int, default=0)  # Passed by torch.distributed.launch
    args = parser.parse_args()
    rank = args.local_rank
    world_size = torch.cuda.device_count()  # Use all available GPUs
    main(rank, world_size)

5.3 TorchScript (Model Serialization and Optimization)

import torch
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 5)

    def forward(self, x):
        return self.linear(x)

model = MyModule()
# Trace the model
example_input = torch.randn(1, 10)
traced_script_module = torch.jit.trace(model, example_input)
# Save the traced model
traced_script_module.save("traced_model.pt")
# Load the traced model
loaded_model = torch.jit.load("traced_model.pt")
# Use the loaded model
output = loaded_model(example_input)
print(output)
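Tracing records one concrete execution, so data-dependent control flow gets baked in; `torch.jit.script` compiles the Python source instead. A sketch with a hypothetical module whose forward pass branches on its input:

```python
import torch
import torch.nn as nn

class GatedModule(nn.Module):
    # Data-dependent branch: tracing would freeze one path, scripting keeps both
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 5)

    def forward(self, x):
        if x.sum() > 0:
            return self.linear(x)
        return -self.linear(x)

scripted = torch.jit.script(GatedModule())  # compiles forward(), including the if
scripted.save("scripted_model.pt")
restored = torch.jit.load("scripted_model.pt")
out = restored(torch.randn(1, 10))
print(out.shape)  # torch.Size([1, 5])
```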
  • Use torch.no_grad() for inference: This disables gradient calculation and reduces memory consumption.
  • model.eval() for evaluation: Sets the model to evaluation mode: disables dropout and makes batch normalization use its running statistics.
  • model.train() for training: Sets the model to training mode: enables dropout and per-batch batch-norm statistics.
  • torch.backends.cudnn.benchmark = True: Enables cuDNN autotuner for potentially faster execution (use with fixed input sizes).
  • Gradient Clipping: torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm) prevents exploding gradients in RNNs.
  • Mixed Precision Training (FP16/AMP): Use torch.cuda.amp for faster training with reduced memory usage.
  • Parameter Initialization: Pay attention to parameter initialization, especially for deep networks (e.g., Kaiming initialization).
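The mixed-precision tip above can be sketched as a single training step on dummy data (the tiny model and batch are placeholders; `autocast`/`GradScaler` become no-ops on CPU via `enabled=False`):

```python
import torch
import torch.nn as nn
import torch.optim as optim

# AMP is only active on CUDA, so gate it with `enabled`
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
use_amp = device.type == "cuda"

model = nn.Linear(10, 2).to(device)  # placeholder model
optimizer = optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

inputs = torch.randn(32, 10, device=device)
labels = torch.randint(0, 2, (32,), device=device)

optimizer.zero_grad()
# Run the forward pass in mixed precision where it is numerically safe
with torch.autocast(device_type=device.type, enabled=use_amp):
    loss = criterion(model(inputs), labels)
scaler.scale(loss).backward()  # scale the loss to avoid FP16 gradient underflow
scaler.step(optimizer)         # unscales gradients, then calls optimizer.step()
scaler.update()                # adjusts the scale factor for the next step
print(f"loss: {loss.item():.4f}")
```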
  • Pandas: Convert Pandas DataFrames to NumPy arrays and then to PyTorch Tensors.
  • Scikit-learn: Wrap PyTorch models as Scikit-learn-compatible estimators (e.g., via the skorch library) to use them in pipelines.
  • Matplotlib: Visualize training curves, model outputs, and data distributions.
  • TensorBoard: Use torch.utils.tensorboard to log training metrics, model graphs, and images for visualization.
  • Hugging Face Transformers: Integrate pre-trained transformer models for NLP tasks.
import numpy as np
import pandas as pd
import torch
import matplotlib.pyplot as plt
from torch.utils.tensorboard import SummaryWriter

# Pandas DataFrame to Tensor
df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
tensor = torch.from_numpy(df.values)

# Matplotlib Visualization
x = torch.linspace(0, 10, 100)
y = torch.sin(x)
plt.plot(x.numpy(), y.numpy())
plt.xlabel("x")
plt.ylabel("sin(x)")
plt.title("Sine Wave")
plt.show()

# TensorBoard Example
writer = SummaryWriter("runs/experiment_1")  # Create a summary writer object
for n_iter in range(100):
    writer.add_scalar('Loss/train', np.random.random(), n_iter)  # Add a scalar value
    writer.add_scalar('Accuracy/train', np.random.random(), n_iter)
    # add other metrics as needed
writer.close()  # close the writer
# Then, in a terminal, run "tensorboard --logdir=runs" and open the link in your browser