
PyTorch for Deep Learning

Category: AI & Data Science Tools
Type: AI/ML Tool or Library
Generated on: 2025-08-26 11:09:32
For: Data Science, Machine Learning & Technical Interviews


PyTorch is an open-source machine learning framework based on the Torch library, primarily developed by Meta AI (formerly Facebook AI Research, FAIR). It’s used for a wide range of tasks, including:

  • Deep Learning: Building and training neural networks for various tasks like image recognition, natural language processing, and time series forecasting.
  • Research: Rapid prototyping and experimentation with new deep learning architectures and algorithms.
  • Production Deployment: Serving trained models in real-world applications.

Key strengths include:

  • Dynamic Computation Graph: Offers flexibility for debugging and experimentation.
  • Pythonic Interface: Easy to learn and use for Python developers.
  • GPU Acceleration: Supports efficient training on GPUs.
  • Large Community & Ecosystem: Extensive support, libraries, and pre-trained models.
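As an illustration of the dynamic computation graph, here is a minimal sketch (the `DynamicNet` module and its data-dependent loop count are invented for demonstration): the forward pass uses ordinary Python control flow, and the graph is rebuilt on every call.

```python
import torch
import torch.nn as nn

class DynamicNet(nn.Module):
    """Forward pass with Python control flow -- the graph is rebuilt each call."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 4)

    def forward(self, x):
        # Apply the layer a data-dependent number of times (1 to 3)
        for _ in range(int(x.abs().sum().item()) % 3 + 1):
            x = torch.relu(self.fc(x))
        return x

net = DynamicNet()
out = net(torch.randn(2, 4))
print(out.shape)  # torch.Size([2, 4])
```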
# Install with conda (recommended)
conda install pytorch torchvision torchaudio -c pytorch
# Install with pip (CPU only)
pip install torch torchvision torchaudio
# Install with pip (CUDA 11.6) - Replace version if needed
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu116

Verification:

import torch
print(torch.__version__)
# Check GPU availability
print(torch.cuda.is_available())
if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
else:
    device = torch.device("cpu")
    print("Using CPU")
# Expected Output (example):
# 2.0.1+cu118
# True
# Using GPU: NVIDIA GeForce RTX 3090
  • torch.Tensor: The fundamental data structure in PyTorch, similar to NumPy arrays.
import torch
# Creating tensors
x = torch.tensor([1, 2, 3])
y = torch.ones(5) # Tensor of ones
z = torch.zeros((2, 3)) # Tensor of zeros
r = torch.rand((3, 2)) # Tensor of random numbers (uniform distribution)
n = torch.randn((3, 2)) # Tensor of random numbers (standard normal distribution)
# Tensor attributes
print(x.dtype) # torch.int64 (default)
print(x.shape) # torch.Size([3])
print(x.device) # device(type='cpu')
# Tensor operations
a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])
c = a + b
d = torch.dot(a, b)  # Dot product
e = torch.matmul(r, n.T)  # Matrix multiplication: (3, 2) @ (2, 3)
print(c)  # tensor([5, 7, 9])
print(d)  # tensor(32)
print(e.shape)  # torch.Size([3, 3])
# Move tensors to GPU
if torch.cuda.is_available():
    device = torch.device('cuda')
    x = x.to(device)  # Same as x = x.cuda()
    print(x.device)  # device(type='cuda', index=0)
  • Data Types: torch.float32 (default float type), torch.float64, torch.float16, torch.int64 (default integer type), torch.int32, torch.bool. Use .to(dtype) to cast.
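A quick sketch of casting between these dtypes (the values are illustrative):

```python
import torch

x = torch.tensor([1, 2, 3])       # int64 by default for Python ints
f = x.to(torch.float32)           # explicit cast with .to(dtype)
h = f.half()                      # shorthand for .to(torch.float16)
b = x.bool()                      # nonzero values become True
print(f.dtype, h.dtype, b.dtype)  # torch.float32 torch.float16 torch.bool
```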
  • NumPy Interoperability:
import numpy as np
numpy_array = np.array([1, 2, 3])
torch_tensor = torch.from_numpy(numpy_array) # Creates a tensor from a numpy array
back_to_numpy = torch_tensor.numpy() # Converts a tensor to a numpy array
# The tensor and the array share memory (on CPU), so changes to one affect the other
torch_tensor[0] = 10
print(numpy_array) # [10 2 3]
  • torch.autograd: The engine that powers automatic differentiation in PyTorch. It allows you to compute gradients of tensors with respect to other tensors.
x = torch.tensor(2.0, requires_grad=True) # Enable gradient tracking
y = x**2 + 2*x + 1
y.backward() # Calculate gradients
print(x.grad) # Gradient of y with respect to x (6.0)
# More complex example
a = torch.tensor([2.0, 3.0], requires_grad=True)
b = a * 2
c = b.mean() # Mean of b
c.backward() # Calculate gradients
print(a.grad) # tensor([1., 1.]) (d(c)/da = 1)
# Disable gradient tracking
with torch.no_grad():
    z = x**2 + 2*x + 1  # No gradient calculation
# Detach a tensor from the computation graph
detached_tensor = y.detach()  # Creates a new tensor that shares the same data
                              # but does not require gradients.
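Besides `.backward()`, `torch.autograd.grad` returns gradients directly, and with `create_graph=True` the result can itself be differentiated. A short sketch using the same function as above:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x**2 + 2*x + 1

# Functional alternative to y.backward(): returns the gradient directly
(dy_dx,) = torch.autograd.grad(y, x, create_graph=True)
print(dy_dx)  # tensor(6., grad_fn=...)  since dy/dx = 2x + 2 at x = 2

# create_graph=True lets us differentiate again (second derivative)
(d2y_dx2,) = torch.autograd.grad(dy_dx, x)
print(d2y_dx2)  # tensor(2.)
```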
  • torch.nn.Module: Base class for all neural network modules.
  • torch.nn.Linear: Fully connected layer.
  • torch.nn.Conv2d: 2D convolutional layer.
  • torch.nn.MaxPool2d: 2D max pooling layer.
  • torch.nn.ReLU: Rectified Linear Unit activation function.
  • torch.nn.Sigmoid: Sigmoid activation function.
  • torch.nn.CrossEntropyLoss: Cross-entropy loss function.
import torch.nn as nn

class SimpleNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out

# Instantiate the model
input_size = 28*28  # MNIST image size
hidden_size = 500
num_classes = 10
model = SimpleNN(input_size, hidden_size, num_classes)
# Move model to GPU (if available)
if torch.cuda.is_available():
    model = model.to(device)
# Example usage (dummy input)
dummy_input = torch.randn(1, input_size).to(device)  # Batch size 1
output = model(dummy_input)
print(output.shape)  # torch.Size([1, 10])
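For a plain chain of layers like this, the same model can also be written without a custom class using `nn.Sequential` (a sketch with the same layer sizes):

```python
import torch
import torch.nn as nn

# Equivalent model as an nn.Sequential -- convenient when the forward pass
# is a simple chain of layers with no custom logic
seq_model = nn.Sequential(
    nn.Linear(28 * 28, 500),
    nn.ReLU(),
    nn.Linear(500, 10),
)
out = seq_model(torch.randn(1, 28 * 28))
print(out.shape)  # torch.Size([1, 10])
```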
  • torch.optim: Contains various optimization algorithms for training neural networks.
  • torch.optim.SGD: Stochastic Gradient Descent.
  • torch.optim.Adam: Adam optimizer (adaptive learning rate).
import torch.optim as optim

# Define optimizer
learning_rate = 0.001
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
# Loss function
criterion = nn.CrossEntropyLoss()

# Training loop (example)
num_epochs = 2
batch_size = 100
input_size = 28*28
num_classes = 10
# Dummy data
X_train = torch.randn(1000, input_size).to(device)
Y_train = torch.randint(0, num_classes, (1000,)).to(device)

for epoch in range(num_epochs):
    for i in range(0, len(X_train), batch_size):
        # Get batch
        inputs = X_train[i:i+batch_size]
        labels = Y_train[i:i+batch_size]
        # Zero the parameter gradients
        optimizer.zero_grad()
        # Forward pass
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        # Backward pass and optimization
        loss.backward()
        optimizer.step()
    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')
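Learning-rate schedules from `torch.optim.lr_scheduler` compose with any optimizer. A minimal sketch (the tiny `nn.Linear` model is a placeholder) that halves the learning rate every 5 epochs with `StepLR`:

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Placeholder model, just to have parameters to optimize
model = nn.Linear(10, 2)
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Multiply the learning rate by gamma every step_size epochs
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)

for epoch in range(10):
    # ... run the usual forward/backward/optimizer.step() over batches here ...
    optimizer.step()   # always call optimizer.step() before scheduler.step()
    scheduler.step()

print(optimizer.param_groups[0]['lr'])  # 0.00025 after 10 epochs (two halvings)
```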

3.5 Dataset and DataLoader (torch.utils.data)

  • torch.utils.data.Dataset: Abstract class representing a dataset.
  • torch.utils.data.DataLoader: Provides an iterable over the dataset for batching and shuffling.
from torch.utils.data import Dataset, DataLoader

class CustomDataset(Dataset):
    def __init__(self, data, labels):
        self.data = data
        self.labels = labels

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], self.labels[idx]

# Example usage
data = torch.randn(100, 10)  # 100 samples, 10 features
labels = torch.randint(0, 2, (100,))  # Binary classification labels
dataset = CustomDataset(data, labels)
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)  # Shuffle the dataset

# Iterate through the dataloader
for batch_idx, (inputs, targets) in enumerate(dataloader):
    print(f"Batch {batch_idx}: Input shape = {inputs.shape}, Target shape = {targets.shape}")

# Save the model
torch.save(model.state_dict(), 'model.pth')  # Saves only the model's parameters
# Load the model
loaded_model = SimpleNN(input_size, hidden_size, num_classes).to(device)  # Create a new instance
loaded_model.load_state_dict(torch.load('model.pth', map_location=device))  # Load the parameters
loaded_model.eval()  # Set the model to evaluation mode (important for inference)
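Beyond saving only the weights, a common pattern is to checkpoint the optimizer state and epoch together so training can resume. A sketch (the filename, model, and epoch value are placeholders):

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Placeholder model/optimizer to checkpoint
model = nn.Linear(10, 2)
optimizer = optim.Adam(model.parameters(), lr=0.001)
epoch = 5

# Save a full training checkpoint, not just the weights
torch.save({
    'epoch': epoch,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
}, 'checkpoint.pth')

# Restore everything needed to resume training
checkpoint = torch.load('checkpoint.pth')
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
start_epoch = checkpoint['epoch'] + 1
print(start_epoch)  # 6
```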
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# Define transformations
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))  # Normalize to [-1, 1]
])

# Load MNIST dataset
train_dataset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)

# Create data loaders
batch_size = 64
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

# Define the neural network
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3)
        self.fc1 = nn.Linear(64 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(torch.relu(self.conv1(x)))
        x = self.pool(torch.relu(self.conv2(x)))
        x = torch.flatten(x, 1)  # flatten all dimensions except batch
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.001)

# Training loop
num_epochs = 2
for epoch in range(num_epochs):
    running_loss = 0.0
    for i, data in enumerate(train_loader, 0):
        inputs, labels = data[0].to(device), data[1].to(device)
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        if i % 200 == 199:  # print every 200 mini-batches
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 200:.3f}')
            running_loss = 0.0
print('Finished Training')

# Evaluate the model
correct = 0
total = 0
with torch.no_grad():
    for data in test_loader:
        images, labels = data[0].to(device), data[1].to(device)
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print(f'Accuracy of the network on the 10000 test images: {100 * correct / total:.2f}%')
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

# Sample data (replace with your actual dataset)
sentences = [
    "This movie is great!",
    "I hated this movie.",
    "The acting was terrible.",
    "I loved the story.",
    "It was a waste of time."
]
labels = [1, 0, 0, 1, 0]  # 1: positive, 0: negative

# Tokenization and vocabulary creation (index 0 is reserved for padding)
vocab = set()
for sentence in sentences:
    for word in sentence.split():
        vocab.add(word)
vocab = list(vocab)
word_to_index = {word: i + 1 for i, word in enumerate(vocab)}
index_to_word = {i + 1: word for i, word in enumerate(vocab)}

# Convert sentences to sequences of indices
def sentence_to_indices(sentence):
    return [word_to_index[word] for word in sentence.split()]

indexed_sentences = [sentence_to_indices(sentence) for sentence in sentences]

# Pad sequences to the same length with the reserved padding index 0
max_length = max(len(sentence) for sentence in indexed_sentences)
padded_sentences = [sentence + [0] * (max_length - len(sentence)) for sentence in indexed_sentences]

# Convert data to tensors
X = torch.tensor(padded_sentences)
Y = torch.tensor(labels)

# Define the dataset
class SentimentDataset(Dataset):
    def __init__(self, data, labels):
        self.data = data
        self.labels = labels

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], self.labels[idx]

dataset = SentimentDataset(X, Y)
dataloader = DataLoader(dataset, batch_size=2, shuffle=True)

# Define the RNN model
class RNN(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=0)
        self.rnn = nn.RNN(embedding_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        embedded = self.embedding(x)
        output, hidden = self.rnn(embedded)
        return self.fc(output[:, -1, :])  # Take the last time step's output

# Instantiate the model (+1 for the padding index)
vocab_size = len(vocab) + 1
embedding_dim = 50
hidden_dim = 100
output_dim = 1  # Binary classification
model = RNN(vocab_size, embedding_dim, hidden_dim, output_dim).to(device)

# Loss function and optimizer
criterion = nn.BCEWithLogitsLoss()  # Binary cross-entropy with logits
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Training loop
num_epochs = 10
for epoch in range(num_epochs):
    for inputs, labels in dataloader:
        inputs = inputs.to(device)
        labels = labels.float().unsqueeze(1).to(device)  # Shape (batch, 1) to match the logits
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
    print(f'Epoch {epoch+1}, Loss: {loss.item():.4f}')

# Evaluation (on the training data for simplicity)
with torch.no_grad():
    correct = 0
    total = 0
    for inputs, labels in dataloader:
        inputs = inputs.to(device)
        labels = labels.float().unsqueeze(1).to(device)  # Match prediction shape to avoid broadcasting
        outputs = torch.sigmoid(model(inputs))  # Apply sigmoid to get probabilities
        predicted = (outputs > 0.5).float()  # Threshold at 0.5
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
    print(f'Accuracy: {100 * correct / total:.2f}%')
import torch
from torch.autograd import Function

# Custom ReLU function with custom backward pass
class MyReLU(Function):
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input

# Example usage
x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0], requires_grad=True)
my_relu = MyReLU.apply
y = my_relu(x)
print(y)  # tensor([0., 0., 0., 1., 2.], grad_fn=<MyReLUBackward>)
y.sum().backward()
print(x.grad)  # tensor([0., 0., 1., 1., 1.])  (the backward only zeroes input < 0)
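A custom backward is easy to get subtly wrong; `torch.autograd.gradcheck` compares it against numerical finite differences. A sketch reusing the same `MyReLU` (inputs must be double precision, and are chosen away from the kink at 0):

```python
import torch
from torch.autograd import Function, gradcheck

class MyReLU(Function):  # same custom Function as defined above
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input

# gradcheck perturbs the input numerically and compares against backward()
x = torch.tensor([-2.0, -0.5, 0.7, 1.5], dtype=torch.float64, requires_grad=True)
ok = gradcheck(MyReLU.apply, (x,), eps=1e-6, atol=1e-4)
print(ok)  # True
```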

5.2 Distributed Training (torch.nn.parallel.DistributedDataParallel)


Using DistributedDataParallel (DDP) for multi-GPU/multi-node training requires setting up a proper distributed environment, typically by launching one process per GPU with torchrun (or the older torch.distributed.launch).

import torch
import torch.nn as nn
import torch.optim as optim
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
import os

def setup(rank, world_size):
    os.environ['MASTER_ADDR'] = 'localhost'
    os.environ['MASTER_PORT'] = '12355'  # Use an open port
    dist.init_process_group("gloo", rank=rank, world_size=world_size)  # Use "nccl" for GPU training

def cleanup():
    dist.destroy_process_group()

class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 1)

    def forward(self, x):
        return self.linear(x)

def main(rank, world_size):
    setup(rank, world_size)
    # Create model on this process's device (GPU index = rank)
    model = SimpleModel().to(rank)
    # Wrap the model with DDP (omit device_ids when training on CPU)
    ddp_model = DDP(model, device_ids=[rank])
    # Define loss function and optimizer
    criterion = nn.MSELoss()
    optimizer = optim.Adam(ddp_model.parameters(), lr=0.01)
    # Dummy data
    inputs = torch.randn(100, 10).to(rank)
    targets = torch.randn(100, 1).to(rank)
    # Training loop
    num_epochs = 5
    for epoch in range(num_epochs):
        optimizer.zero_grad()
        outputs = ddp_model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()
        if rank == 0:  # Print only on the master process
            print(f"Rank {rank}, Epoch {epoch+1}, Loss: {loss.item():.4f}")
    cleanup()

# Example usage (launch one process per GPU):
# python -m torch.distributed.launch --nproc_per_node=2 your_script.py
if __name__ == '__main__':
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument('--local_rank', type=int, default=0)  # Passed by torch.distributed.launch
    args = parser.parse_args()
    rank = args.local_rank
    world_size = torch.cuda.device_count()  # Use all available GPUs
    main(rank, world_size)

5.3 TorchScript (Model Serialization and Optimization)

import torch
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 5)

    def forward(self, x):
        return self.linear(x)

model = MyModule()
# Trace the model
example_input = torch.randn(1, 10)
traced_script_module = torch.jit.trace(model, example_input)
# Save the traced model
traced_script_module.save("traced_model.pt")
# Load the traced model
loaded_model = torch.jit.load("traced_model.pt")
# Use the loaded model
output = loaded_model(example_input)
print(output)
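Tracing records one concrete execution, so data-dependent control flow gets baked in; `torch.jit.script` compiles the Python source instead. A sketch with a hypothetical module whose forward pass branches on its input:

```python
import torch
import torch.nn as nn

class GatedModule(nn.Module):
    # Data-dependent branch: tracing would freeze one path, scripting keeps both
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 5)

    def forward(self, x):
        if x.sum() > 0:
            return self.linear(x)
        return -self.linear(x)

scripted = torch.jit.script(GatedModule())  # compiles forward(), including the if
scripted.save("scripted_model.pt")
restored = torch.jit.load("scripted_model.pt")
out = restored(torch.randn(1, 10))
print(out.shape)  # torch.Size([1, 5])
```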
  • Use torch.no_grad() for inference: This disables gradient calculation and reduces memory consumption.
  • model.eval() for evaluation: Sets the model to evaluation mode: disables dropout and makes batch normalization use its running statistics.
  • model.train() for training: Sets the model to training mode: enables dropout and per-batch batch-norm statistics.
  • torch.backends.cudnn.benchmark = True: Enables cuDNN autotuner for potentially faster execution (use with fixed input sizes).
  • Gradient Clipping: torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm) prevents exploding gradients in RNNs.
  • Mixed Precision Training (FP16/AMP): Use torch.cuda.amp for faster training with reduced memory usage.
  • Parameter Initialization: Pay attention to parameter initialization, especially for deep networks (e.g., Kaiming initialization).
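The mixed-precision tip above can be sketched as a single training step on dummy data (the tiny model and batch are placeholders; `autocast`/`GradScaler` become no-ops on CPU via `enabled=False`):

```python
import torch
import torch.nn as nn
import torch.optim as optim

# AMP is only active on CUDA, so gate it with `enabled`
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
use_amp = device.type == "cuda"

model = nn.Linear(10, 2).to(device)  # placeholder model
optimizer = optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

inputs = torch.randn(32, 10, device=device)
labels = torch.randint(0, 2, (32,), device=device)

optimizer.zero_grad()
# Run the forward pass in mixed precision where it is numerically safe
with torch.autocast(device_type=device.type, enabled=use_amp):
    loss = criterion(model(inputs), labels)
scaler.scale(loss).backward()  # scale the loss to avoid FP16 gradient underflow
scaler.step(optimizer)         # unscales gradients, then calls optimizer.step()
scaler.update()                # adjusts the scale factor for the next step
print(f"loss: {loss.item():.4f}")
```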
  • Pandas: Convert Pandas DataFrames to NumPy arrays and then to PyTorch Tensors.
  • Scikit-learn: Wrap PyTorch models as Scikit-learn-compatible estimators (e.g., via the skorch library) to use them in pipelines.
  • Matplotlib: Visualize training curves, model outputs, and data distributions.
  • TensorBoard: Use torch.utils.tensorboard to log training metrics, model graphs, and images for visualization.
  • Hugging Face Transformers: Integrate pre-trained transformer models for NLP tasks.
import numpy as np
import pandas as pd
import torch
import matplotlib.pyplot as plt
from torch.utils.tensorboard import SummaryWriter

# Pandas DataFrame to Tensor
df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
tensor = torch.from_numpy(df.values)

# Matplotlib Visualization
x = torch.linspace(0, 10, 100)
y = torch.sin(x)
plt.plot(x.numpy(), y.numpy())
plt.xlabel("x")
plt.ylabel("sin(x)")
plt.title("Sine Wave")
plt.show()

# TensorBoard Example
writer = SummaryWriter("runs/experiment_1")  # Create a summary writer object
for n_iter in range(100):
    writer.add_scalar('Loss/train', np.random.random(), n_iter)  # Add a scalar value
    writer.add_scalar('Accuracy/train', np.random.random(), n_iter)
    # add other metrics as needed
writer.close()  # close the writer
# Then, in a terminal, run "tensorboard --logdir=runs" and open the link in your browser