56_Pytorch_For_Deep_Learning
Category: AI & Data Science Tools
Type: AI/ML Tool or Library
Generated on: 2025-08-26 11:09:32
For: Data Science, Machine Learning & Technical Interviews
PyTorch for Deep Learning Cheatsheet
1. Tool/Library Overview
PyTorch is an open-source machine learning framework based on the Torch library, primarily developed by Facebook’s AI Research lab (FAIR, now part of Meta AI). It’s used for a wide range of tasks, including:
- Deep Learning: Building and training neural networks for various tasks like image recognition, natural language processing, and time series forecasting.
- Research: Rapid prototyping and experimentation with new deep learning architectures and algorithms.
- Production Deployment: Serving trained models in real-world applications.
Key strengths include:
- Dynamic Computation Graph: Offers flexibility for debugging and experimentation.
- Pythonic Interface: Easy to learn and use for Python developers.
- GPU Acceleration: Supports efficient training on GPUs.
- Large Community & Ecosystem: Extensive support, libraries, and pre-trained models.
2. Installation & Setup
```bash
# Install with conda (recommended)
conda install pytorch torchvision torchaudio -c pytorch

# Install with pip (CPU only)
pip install torch torchvision torchaudio

# Install with pip (CUDA 11.6) - replace the CUDA version if needed
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu116
```

Verification:
```python
import torch
print(torch.__version__)

# Check GPU availability
print(torch.cuda.is_available())

if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
else:
    device = torch.device("cpu")
    print("Using CPU")

# Expected output (example):
# 2.0.1+cu118
# True
# Using GPU: NVIDIA GeForce RTX 3090
```

3. Core Features & API
3.1 Tensors
torch.Tensor: The fundamental data structure in PyTorch, similar to NumPy arrays.
```python
import torch

# Creating tensors
x = torch.tensor([1, 2, 3])
y = torch.ones(5)        # Tensor of ones
z = torch.zeros((2, 3))  # Tensor of zeros
r = torch.rand((3, 2))   # Random numbers (uniform distribution)
n = torch.randn((2, 3))  # Random numbers (standard normal distribution)

# Tensor attributes
print(x.dtype)   # torch.int64 (default for Python ints)
print(x.shape)   # torch.Size([3])
print(x.device)  # device(type='cpu')

# Tensor operations
a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])
c = a + b
d = torch.dot(a, b)     # Dot product
e = torch.matmul(r, n)  # Matrix multiplication: (3, 2) @ (2, 3) -> (3, 3)

print(c)        # tensor([5, 7, 9])
print(d)        # tensor(32)
print(e.shape)  # torch.Size([3, 3])

# Move tensors to GPU
if torch.cuda.is_available():
    device = torch.device('cuda')
    x = x.to(device)  # Same as x = x.cuda()
    print(x.device)   # device(type='cuda', index=0)
```

- Data Types: torch.float32 (default for floats), torch.float64, torch.float16, torch.int64, torch.int32, torch.bool. Use .to(dtype) to cast.
- NumPy Interoperability:

```python
import numpy as np

numpy_array = np.array([1, 2, 3])
torch_tensor = torch.from_numpy(numpy_array)  # Creates a tensor from a numpy array
back_to_numpy = torch_tensor.numpy()          # Converts a CPU tensor to a numpy array

# Changes to one will affect the other because they share the same memory
torch_tensor[0] = 10
print(numpy_array)  # [10  2  3]
```

3.2 Autograd (Automatic Differentiation)
torch.autograd: The engine that powers automatic differentiation in PyTorch. It allows you to compute gradients of tensors with respect to other tensors.
```python
x = torch.tensor(2.0, requires_grad=True)  # Enable gradient tracking
y = x**2 + 2*x + 1
y.backward()   # Compute gradients
print(x.grad)  # Gradient of y with respect to x: dy/dx = 2x + 2 = 6.0

# More complex example
a = torch.tensor([2.0, 3.0], requires_grad=True)
b = a * 2
c = b.mean()   # Mean of b
c.backward()   # Compute gradients
print(a.grad)  # tensor([1., 1.]) (dc/da_i = 2/2 = 1)

# Disable gradient tracking
with torch.no_grad():
    z = x**2 + 2*x + 1  # No gradient graph is built

# Detach a tensor from the computation graph
detached_tensor = y.detach()  # New tensor that shares the same data
                              # but does not require gradients
```

3.3 Neural Network Module (torch.nn)
- torch.nn.Module: Base class for all neural network modules.
- torch.nn.Linear: Fully connected layer.
- torch.nn.Conv2d: 2D convolutional layer.
- torch.nn.MaxPool2d: 2D max pooling layer.
- torch.nn.ReLU: Rectified Linear Unit activation function.
- torch.nn.Sigmoid: Sigmoid activation function.
- torch.nn.CrossEntropyLoss: Cross-entropy loss function.
```python
import torch.nn as nn
import torch.nn.functional as F

class SimpleNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out

# Instantiate the model
input_size = 28 * 28  # MNIST image size
hidden_size = 500
num_classes = 10
model = SimpleNN(input_size, hidden_size, num_classes)

# Move model to GPU (if available)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Example usage (dummy input)
dummy_input = torch.randn(1, input_size).to(device)  # Batch size 1
output = model(dummy_input)
print(output.shape)  # torch.Size([1, 10])
```

3.4 Optimization (torch.optim)
- torch.optim: Contains various optimization algorithms for training neural networks.
- torch.optim.SGD: Stochastic Gradient Descent.
- torch.optim.Adam: Adam optimizer (adaptive learning rates).
```python
import torch.optim as optim

# Define optimizer
learning_rate = 0.001
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

# Loss function
criterion = nn.CrossEntropyLoss()

# Training loop (example)
num_epochs = 2
batch_size = 100
input_size = 28 * 28
num_classes = 10

# Dummy data
X_train = torch.randn(1000, input_size).to(device)
Y_train = torch.randint(0, num_classes, (1000,)).to(device)

for epoch in range(num_epochs):
    for i in range(0, len(X_train), batch_size):
        # Get batch
        inputs = X_train[i:i+batch_size]
        labels = Y_train[i:i+batch_size]

        # Zero the parameter gradients
        optimizer.zero_grad()

        # Forward pass
        outputs = model(inputs)
        loss = criterion(outputs, labels)

        # Backward pass and optimization
        loss.backward()
        optimizer.step()

    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')
```

3.5 Dataset and DataLoader (torch.utils.data)
- torch.utils.data.Dataset: Abstract class representing a dataset.
- torch.utils.data.DataLoader: Provides an iterable over the dataset that handles batching and shuffling.
```python
from torch.utils.data import Dataset, DataLoader

class CustomDataset(Dataset):
    def __init__(self, data, labels):
        self.data = data
        self.labels = labels

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], self.labels[idx]

# Example usage
data = torch.randn(100, 10)           # 100 samples, 10 features
labels = torch.randint(0, 2, (100,))  # Binary classification labels

dataset = CustomDataset(data, labels)
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)  # Shuffle the dataset

# Iterate through the dataloader
for batch_idx, (inputs, targets) in enumerate(dataloader):
    print(f"Batch {batch_idx}: Input shape = {inputs.shape}, Target shape = {targets.shape}")
```

3.6 Saving and Loading Models
```python
# Save the model
torch.save(model.state_dict(), 'model.pth')  # Saves only the model's parameters

# Load the model
loaded_model = SimpleNN(input_size, hidden_size, num_classes).to(device)  # Create a new instance
loaded_model.load_state_dict(torch.load('model.pth'))  # Load the parameters
loaded_model.eval()  # Set the model to evaluation mode (important for inference)
```
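Saving only the state_dict loses the optimizer state, so training cannot resume exactly where it left off. A common pattern is to bundle model weights, optimizer state, and the current epoch into one checkpoint dictionary; the sketch below uses an illustrative tiny model and the arbitrary file name checkpoint.pth:

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)  # Illustrative model
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Save a full training checkpoint, not just the weights
checkpoint = {
    'epoch': 5,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
}
torch.save(checkpoint, 'checkpoint.pth')

# Restore: rebuild the objects first, then load the states into them
model = nn.Linear(10, 2)
optimizer = optim.Adam(model.parameters(), lr=0.001)
checkpoint = torch.load('checkpoint.pth')
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
start_epoch = checkpoint['epoch'] + 1  # Resume from the next epoch
```

Call model.train() after restoring if you continue training, or model.eval() for inference.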
4. Practical Examples

4.1 Image Classification with MNIST
```python
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Define transformations
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))  # Normalize to [-1, 1]
])

# Load MNIST dataset
train_dataset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)

# Create data loaders
batch_size = 64
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

# Define the neural network
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3)
        self.fc1 = nn.Linear(64 * 5 * 5, 120)  # 28 -> 26 -> 13 -> 11 -> 5 after convs and pooling
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(torch.relu(self.conv1(x)))
        x = self.pool(torch.relu(self.conv2(x)))
        x = torch.flatten(x, 1)  # Flatten all dimensions except batch
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.001)

# Training loop
num_epochs = 2
for epoch in range(num_epochs):
    running_loss = 0.0
    for i, data in enumerate(train_loader, 0):
        inputs, labels = data[0].to(device), data[1].to(device)

        optimizer.zero_grad()

        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        if i % 200 == 199:  # Print every 200 mini-batches
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 200:.3f}')
            running_loss = 0.0

print('Finished Training')

# Evaluate the model
correct = 0
total = 0
with torch.no_grad():
    for data in test_loader:
        images, labels = data[0].to(device), data[1].to(device)
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy of the network on the 10000 test images: {100 * correct / total:.2f}%')
```

4.2 Sentiment Analysis with RNN
```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Sample data (replace with your actual dataset)
sentences = [
    "This movie is great!",
    "I hated this movie.",
    "The acting was terrible.",
    "I loved the story.",
    "It was a waste of time."
]
labels = [1, 0, 0, 1, 0]  # 1: positive, 0: negative

# Tokenization and vocabulary creation
vocab = set()
for sentence in sentences:
    for word in sentence.split():
        vocab.add(word)
vocab = list(vocab)
word_to_index = {word: i for i, word in enumerate(vocab)}
index_to_word = {i: word for i, word in enumerate(vocab)}

# Convert sentences to sequences of indices
def sentence_to_indices(sentence):
    return [word_to_index[word] for word in sentence.split()]

indexed_sentences = [sentence_to_indices(sentence) for sentence in sentences]

# Pad sequences to the same length (padding with index 0 for simplicity)
max_length = max(len(sentence) for sentence in indexed_sentences)
padded_sentences = [sentence + [0] * (max_length - len(sentence)) for sentence in indexed_sentences]

# Convert data to tensors
X = torch.tensor(padded_sentences)
Y = torch.tensor(labels)

# Define the dataset
class SentimentDataset(Dataset):
    def __init__(self, data, labels):
        self.data = data
        self.labels = labels

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], self.labels[idx]

dataset = SentimentDataset(X, Y)
dataloader = DataLoader(dataset, batch_size=2, shuffle=True)

# Define the RNN model
class RNN(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.rnn = nn.RNN(embedding_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        embedded = self.embedding(x)
        output, hidden = self.rnn(embedded)
        return self.fc(output[:, -1, :])  # Take the last time step's output

# Instantiate the model
vocab_size = len(vocab)
embedding_dim = 50
hidden_dim = 100
output_dim = 1  # Binary classification
model = RNN(vocab_size, embedding_dim, hidden_dim, output_dim).to(device)

# Loss function and optimizer
criterion = nn.BCEWithLogitsLoss()  # Binary cross-entropy with logits
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Training loop
num_epochs = 10
for epoch in range(num_epochs):
    for inputs, targets in dataloader:
        inputs = inputs.to(device)
        targets = targets.float().unsqueeze(1).to(device)  # Shape (batch, 1) to match the logits

        optimizer.zero_grad()

        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()

    print(f'Epoch {epoch+1}, Loss: {loss.item():.4f}')

# Evaluation (on the training data for simplicity)
with torch.no_grad():
    correct = 0
    total = 0
    for inputs, targets in dataloader:
        inputs = inputs.to(device)
        targets = targets.float().unsqueeze(1).to(device)  # Match the (batch, 1) prediction shape
        outputs = torch.sigmoid(model(inputs))  # Apply sigmoid to get probabilities
        predicted = (outputs > 0.5).float()     # Threshold at 0.5
        total += targets.size(0)
        correct += (predicted == targets).sum().item()

print(f'Accuracy: {100 * correct / total:.2f}%')
```

5. Advanced Usage
5.1 Custom Layers and Autograd Functions
```python
import torch
from torch.autograd import Function

# Custom ReLU function with a custom backward pass
class MyReLU(Function):
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input

# Example usage
x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0], requires_grad=True)
my_relu = MyReLU.apply
y = my_relu(x)
print(y)  # tensor([0., 0., 0., 1., 2.], grad_fn=<MyReLUBackward>)
y.sum().backward()
print(x.grad)  # tensor([0., 0., 1., 1., 1.]) - this backward passes the gradient through at exactly 0
```
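A hand-written backward pass is easy to get subtly wrong. torch.autograd.gradcheck verifies it against numerical finite differences; a minimal sketch (use double precision, and keep the test inputs away from the non-differentiable kink at 0, since gradcheck perturbs each element):

```python
import torch
from torch.autograd import Function, gradcheck

class MyReLU(Function):
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input

# Double precision is required for gradcheck's finite-difference tolerance
test_input = torch.tensor([-2.0, -1.0, 1.0, 2.0], dtype=torch.double, requires_grad=True)
print(gradcheck(MyReLU.apply, (test_input,)))  # True if analytic and numeric gradients match
```

gradcheck raises a descriptive error on mismatch, which makes it a convenient unit test for any custom Function.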
5.2 Distributed Training (torch.nn.parallel.DistributedDataParallel)

Using DistributedDataParallel (DDP) for multi-GPU/multi-node training requires setting up a proper distributed environment, e.g. by launching with torchrun (or the older torch.distributed.launch).
```python
import os

import torch
import torch.nn as nn
import torch.optim as optim
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup(rank, world_size):
    os.environ['MASTER_ADDR'] = 'localhost'
    os.environ['MASTER_PORT'] = '12355'  # Use an open port
    dist.init_process_group("gloo", rank=rank, world_size=world_size)  # Use "nccl" for GPU training if available

def cleanup():
    dist.destroy_process_group()

class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 1)

    def forward(self, x):
        return self.linear(x)

def main(rank, world_size):
    setup(rank, world_size)

    # Create the model and move it to this process's device
    model = SimpleModel().to(rank)

    # Wrap the model with DDP
    ddp_model = DDP(model, device_ids=[rank])

    # Define loss function and optimizer
    criterion = nn.MSELoss()
    optimizer = optim.Adam(ddp_model.parameters(), lr=0.01)

    # Dummy data
    inputs = torch.randn(100, 10).to(rank)
    targets = torch.randn(100, 1).to(rank)

    # Training loop
    num_epochs = 5
    for epoch in range(num_epochs):
        optimizer.zero_grad()
        outputs = ddp_model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()

        if rank == 0:  # Print only on the master process
            print(f"Rank {rank}, Epoch {epoch+1}, Loss: {loss.item():.4f}")

    cleanup()

# Example usage: launch one process per GPU, e.g.
#   torchrun --nproc_per_node=2 your_script.py
if __name__ == '__main__':
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument('--local_rank', type=int, default=0)  # Required by torch.distributed.launch (torchrun sets LOCAL_RANK instead)
    args = parser.parse_args()

    rank = args.local_rank
    world_size = torch.cuda.device_count()  # Use all available GPUs
    main(rank, world_size)
```

5.3 TorchScript (Model Serialization and Optimization)
```python
import torch
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 5)

    def forward(self, x):
        return self.linear(x)

model = MyModule()

# Trace the model
example_input = torch.randn(1, 10)
traced_script_module = torch.jit.trace(model, example_input)

# Save the traced model
traced_script_module.save("traced_model.pt")

# Load the traced model
loaded_model = torch.jit.load("traced_model.pt")

# Use the loaded model
output = loaded_model(example_input)
print(output)
```
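Tracing records the operations of one example run, so data-dependent control flow is baked into whichever branch the example input happened to take. For modules with branches or loops, torch.jit.script compiles the Python source instead; a sketch with a hypothetical module:

```python
import torch
import torch.nn as nn

class GatedModule(nn.Module):  # Hypothetical example module
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 5)

    def forward(self, x):
        # Data-dependent branch: tracing would record only one side,
        # scripting preserves the full if/else
        if x.sum() > 0:
            return self.linear(x)
        else:
            return -self.linear(x)

scripted = torch.jit.script(GatedModule())
scripted.save("scripted_model.pt")  # Same .pt format as traced modules

out = scripted(torch.randn(1, 10))
print(out.shape)  # torch.Size([1, 5])
```

Scripted and traced modules load the same way via torch.jit.load, so downstream deployment code does not need to know which was used.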
6. Tips & Tricks

- Use torch.no_grad() for inference: disables gradient calculation and reduces memory consumption.
- model.eval() for evaluation: sets the model to evaluation mode, disabling dropout and switching batch normalization to its running statistics.
- model.train() for training: sets the model to training mode, enabling dropout and batch-statistics updates.
- torch.backends.cudnn.benchmark = True: enables the cuDNN autotuner for potentially faster execution (use with fixed input sizes).
- Gradient Clipping: torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm) prevents exploding gradients, especially in RNNs.
- Mixed Precision Training (FP16/AMP): use torch.cuda.amp for faster training with reduced memory usage.
- Parameter Initialization: pay attention to parameter initialization, especially for deep networks (e.g., Kaiming initialization).
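The AMP and gradient-clipping tips combine into a single training step as sketched below (the tiny model and data are illustrative; with enabled=False on CPU-only machines both features become no-ops). Note that gradients must be unscaled before clipping, otherwise max_norm is applied to the scaled values:

```python
import torch
import torch.nn as nn
import torch.optim as optim

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(10, 1).to(device)  # Illustrative model
optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.MSELoss()
scaler = torch.cuda.amp.GradScaler(enabled=torch.cuda.is_available())

inputs = torch.randn(32, 10).to(device)
targets = torch.randn(32, 1).to(device)

optimizer.zero_grad()

# Run the forward pass in mixed precision where supported
with torch.cuda.amp.autocast(enabled=torch.cuda.is_available()):
    loss = criterion(model(inputs), targets)

scaler.scale(loss).backward()
scaler.unscale_(optimizer)  # Unscale first so clipping sees the true gradients
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
scaler.step(optimizer)  # Skips the step if gradients contain inf/NaN
scaler.update()
```

The scaler multiplies the loss before backward to keep small FP16 gradients from underflowing, then divides the gradients back out before the optimizer step.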
7. Integration
- Pandas: Convert Pandas DataFrames to NumPy arrays and then to PyTorch Tensors.
- Scikit-learn: Use PyTorch models alongside Scikit-learn utilities (metrics, pipelines), e.g. via wrapper libraries such as skorch.
- Matplotlib: Visualize training curves, model outputs, and data distributions.
- TensorBoard: Use torch.utils.tensorboard to log training metrics, model graphs, and images for visualization.
- Hugging Face Transformers: Integrate pre-trained transformer models for NLP tasks.
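For the Scikit-learn point above, the simplest form of interop is evaluating PyTorch predictions with sklearn metrics: move tensors off the GPU, convert to NumPy, and feed them in. A sketch with an illustrative untrained classifier:

```python
import torch
import torch.nn as nn
from sklearn.metrics import accuracy_score

model = nn.Linear(4, 3)  # Illustrative classifier: 4 features, 3 classes
X = torch.randn(20, 4)
y_true = torch.randint(0, 3, (20,))

model.eval()
with torch.no_grad():
    logits = model(X)
    y_pred = logits.argmax(dim=1)  # Class index with the highest logit

# sklearn metrics accept NumPy arrays, so convert the (CPU) tensors
acc = accuracy_score(y_true.numpy(), y_pred.numpy())
print(f"Accuracy: {acc:.2f}")
```

The same pattern works for any sklearn metric (f1_score, confusion_matrix, etc.); for GPU tensors, call .cpu() before .numpy().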
```python
import numpy as np
import pandas as pd
import torch
import matplotlib.pyplot as plt
from torch.utils.tensorboard import SummaryWriter

# Pandas DataFrame to tensor
df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
tensor = torch.from_numpy(df.values)

# Matplotlib visualization
x = torch.linspace(0, 10, 100)
y = torch.sin(x)
plt.plot(x.numpy(), y.numpy())
plt.xlabel("x")
plt.ylabel("sin(x)")
plt.title("Sine Wave")
plt.show()

# TensorBoard example
writer = SummaryWriter("runs/experiment_1")  # Create a summary writer object
for n_iter in range(100):
    writer.add_scalar('Loss/train', np.random.random(), n_iter)      # Log a scalar value
    writer.add_scalar('Accuracy/train', np.random.random(), n_iter)  # Add other metrics as needed
writer.close()
# Then run "tensorboard --logdir=runs" in a terminal and open the link in your browser
```

8. Further Resources
- Official PyTorch Documentation: https://pytorch.org/docs/stable/index.html
- PyTorch Tutorials: https://pytorch.org/tutorials/
- PyTorch Examples: https://github.com/pytorch/examples
- Fast.ai Deep Learning Course: https://www.fast.ai/
- Hugging Face Transformers Documentation: https://huggingface.co/transformers/