PyTorch is an open-source machine learning framework developed by Meta AI (formerly Facebook AI Research). It is known for its ease of use and its dynamic computation graph.
# Import PyTorch
import torch

# Check PyTorch version
print(torch.__version__)
PyTorch is preferred for research due to dynamic computation graphs; TensorFlow is often used in production with static graphs.
# Example dynamic graph in PyTorch
x = torch.tensor(2.0, requires_grad=True)
y = x**2 + 3*x + 4
y.backward()     # auto-calculates gradient
print(x.grad)    # prints gradient of y w.r.t. x
Install via pip or conda for CPU or CUDA GPU support.
# For CPU
pip install torch torchvision torchaudio

# For GPU (CUDA 11.8)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
The PyTorch ecosystem includes TorchVision (images), TorchText (text), TorchAudio (audio), and the third-party PyTorch Lightning (training).
# Example import of TorchVision
import torchvision

# Load pretrained model
model = torchvision.models.resnet18(pretrained=True)
Torch-TensorRT compiles PyTorch models into TensorRT engines for optimized inference on NVIDIA hardware.
# Requires NVIDIA TensorRT setup
# Converts PyTorch model to an optimized format
import torch_tensorrt

# Example conversion (commented)
# trt_model = torch_tensorrt.compile(model, inputs=[...])
Tensors are multi-dimensional arrays (like NumPy arrays) and the core data structure in PyTorch.
# Create a tensor tensor = torch.tensor([[1, 2], [3, 4]]) print(tensor)
PyTorch builds the graph as operations are run, making debugging and flexibility easier.
x = torch.tensor(1.0, requires_grad=True) y = x * 3 z = y ** 2 z.backward() print(x.grad) # prints dz/dx
Used for deep learning, reinforcement learning, computer vision, NLP, and production models.
# Example: Sentiment classification using torchtext (not shown here for brevity) # PyTorch is also used for StyleGAN, DALL·E, and other AI research
Versions are updated frequently. Major versions may change APIs. Use `torch.__version__` to check.
print("PyTorch version:", torch.__version__)
Involves: preparing data, building models, defining loss/optimizer, training, and evaluating.
# Step 1: Define tensor inputs
x = torch.tensor([[1.0], [2.0]])
y = torch.tensor([[2.0], [4.0]])

# Step 2: Define a simple linear model
model = torch.nn.Linear(1, 1)

# Step 3: Define loss and optimizer
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Step 4: Train loop
for epoch in range(10):
    pred = model(x)
    loss = loss_fn(pred, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
PyTorch supports GPU acceleration using CUDA.
# Move tensor to GPU (if available)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.tensor([1.0]).to(device)
print(x.device)
Supported by Jupyter Notebook, PyCharm, VSCode, Google Colab, and Kaggle Kernels.
# Google Colab has GPU support pre-installed # Use `!nvidia-smi` in Colab to verify GPU
Active community on GitHub, forums, PyTorch Lightning, HuggingFace, and conferences like NeurIPS.
These libraries provide datasets, models, and tools for images, text, and audio.
from torchvision import datasets
from torchtext.datasets import IMDB
from torchaudio.datasets import SPEECHCOMMANDS
A simple linear model to predict y = 2x.
import torch

# Data
x = torch.tensor([[1.0], [2.0]])
y = torch.tensor([[2.0], [4.0]])

# Model
model = torch.nn.Linear(1, 1)

# Loss & Optimizer
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Training loop
for epoch in range(20):
    y_pred = model(x)
    loss = loss_fn(y_pred, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"Epoch {epoch+1}, Loss: {loss.item()}")
You can create tensors from lists, NumPy arrays, or use built-in methods like zeros, ones, rand.
x = torch.tensor([[1, 2], [3, 4]]) zeros = torch.zeros(2, 2) ones = torch.ones(2, 2) rand = torch.rand(2, 2)
Rank is the number of dimensions. Shape shows the size in each dimension.
t = torch.rand(3, 4, 5)
print(t.shape)  # (3, 4, 5)
print(t.ndim)   # 3
Access elements or sub-tensors using standard Python slicing.
t = torch.tensor([[1, 2], [3, 4]])
print(t[0])      # First row
print(t[:, 1])   # Second column
Tensors support arithmetic operations and dot products.
a = torch.tensor([1, 2]) b = torch.tensor([3, 4]) print(torch.add(a, b)) print(torch.mul(a, b)) print(torch.dot(a, b))
Allows arithmetic on tensors with different shapes by expanding dimensions automatically.
a = torch.tensor([[1], [2]]) b = torch.tensor([3, 4]) print(a + b) # Broadcasts b to match shape of a
Change the shape of tensors without changing data.
x = torch.arange(6) print(x.reshape(2, 3)) print(x.view(2, 3))
Swap axes or rotate dimensions.
x = torch.rand(2, 3)
print(x.T)                 # Transpose for 2D

y = torch.rand(2, 3, 4)
print(y.permute(2, 0, 1))  # Custom dim reordering
Convert between float, int, long, etc.
x = torch.tensor([1.0, 2.0]) print(x.int()) # convert to int
Tensors can be converted to/from NumPy arrays.
import numpy as np a = np.array([1, 2]) t = torch.from_numpy(a) n = t.numpy()
Clone creates a copy. Detach removes from computation graph.
a = torch.tensor([1.0], requires_grad=True) b = a * 2 c = b.detach().clone()
Use torch.nn.init for custom weight init.
import torch.nn.init as init w = torch.empty(3, 5) init.xavier_uniform_(w)
In-place ops modify tensor (`x.add_()`); out-of-place creates new one (`x + y`).
x = torch.tensor([1.0])
x.add_(1.0)   # In-place
x = x + 1.0   # Out-of-place
Move tensors to GPU using `.cuda()` or `.to("cuda")`.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu") x = torch.tensor([1.0]).to(device)
Save using `torch.save()`, load using `torch.load()`.
torch.save(x, "tensor.pt") x_loaded = torch.load("tensor.pt")
Use `.detach()` to break the graph, use `.to(device)` instead of `.cuda()` for flexibility, and prefer `reshape()` over `view()` when tensors may not be contiguous.
# Good habit: use `.to(device)` # Use `detach()` before saving intermediate values
Autograd is PyTorch’s automatic differentiation engine that dynamically builds a computational graph and computes gradients during backpropagation.
import torch x = torch.tensor(2.0, requires_grad=True) y = x ** 2 y.backward() print(x.grad) # ➜ Prints: tensor(4.)
Previously, `Variable` was used to wrap tensors, but now Tensors themselves support autograd through `requires_grad`.
# Old way (deprecated):
# from torch.autograd import Variable
# x = Variable(torch.tensor(1.0), requires_grad=True)

# Modern way:
x = torch.tensor(1.0, requires_grad=True)
This flag tells PyTorch to track operations on a tensor so gradients can be calculated.
x = torch.tensor(3.0, requires_grad=True) print(x.requires_grad) # ➜ True
Calling `.backward()` on a scalar result triggers gradient computation.
x = torch.tensor(5.0, requires_grad=True) y = x * 2 y.backward() print(x.grad) # ➜ tensor(2.)
Gradients are stored in the `.grad` attribute of leaf tensors.
w = torch.tensor(4.0, requires_grad=True) loss = w ** 3 loss.backward() print(w.grad) # ➜ tensor(48.)
By default, gradients accumulate. Call `.zero_()` to reset before new backward passes.
w.grad.zero_() # Important in training loops to prevent gradient buildup
Use `.detach()` to stop tracking a tensor in the computational graph.
y = x.detach() # y won’t require gradients even if x does
Jacobians (derivatives of vector-valued functions) and Hessians (second-order derivatives) can be computed using torch.autograd.functional.
from torch.autograd.functional import jacobian

def func(x):
    return x ** 2 + 3 * x

jac = jacobian(func, torch.tensor(2.0))
print(jac)  # ➜ tensor(7.)
Set `create_graph=True` to compute higher-order gradients.
x = torch.tensor(1.0, requires_grad=True)
y = x ** 3
dy_dx = torch.autograd.grad(y, x, create_graph=True)[0]
dy2_dx2 = torch.autograd.grad(dy_dx, x)[0]
print(dy2_dx2)  # ➜ tensor(6.)
Define custom forward/backward logic with `torch.autograd.Function`.
class Square(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input ** 2

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        return 2 * input * grad_output

x = torch.tensor(3.0, requires_grad=True)
y = Square.apply(x)
y.backward()
print(x.grad)  # ➜ tensor(6.)
Used to prevent exploding gradients by capping them at a maximum value.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
Autograd tracks all operations for training and allows optimizer steps using gradients.
loss.backward()        # compute gradients
optimizer.step()       # update parameters
optimizer.zero_grad()  # reset for next step
Use `.grad`, `backward(retain_graph=True)`, and hooks to inspect problems in gradient flow.
for name, param in model.named_parameters(): print(name, param.grad) # Check if any grad is None
Use tools like `torchviz` to visualize computational graphs for complex models.
from torchviz import make_dot x = torch.tensor(2.0, requires_grad=True) y = x ** 2 make_dot(y).render("graph", format="png")
Always manage `.grad`, use `.detach()` for inference, and validate gradients to ensure correct flow.
# Tip: use torch.no_grad() during evaluation to disable autograd
with torch.no_grad():
    output = model(input)
`torch.nn` is a module for building neural network layers, models, and loss functions.
import torch.nn as nn model = nn.Linear(10, 1) # A single-layer network
All custom models should subclass `nn.Module` and define layers in `__init__`, forward logic in `forward()`.
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 1)

    def forward(self, x):
        return self.fc(x)
For simple models, `nn.Sequential` stacks layers without needing a full class.
model = nn.Sequential( nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 1) )
Create flexible models by extending `nn.Module` and overriding `forward` behavior.
class CustomNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(4, 4)
        self.out = nn.Linear(4, 1)

    def forward(self, x):
        x = torch.relu(self.hidden(x))
        return self.out(x)
PyTorch offers many built-in layers including linear, convolutional, recurrent, and more.
nn.Linear(in_features, out_features) nn.Conv2d(in_channels, out_channels, kernel_size) nn.LSTM(input_size, hidden_size)
Activations are layers too. Common ones include ReLU, Sigmoid, and Tanh.
x = torch.relu(x) x = torch.sigmoid(x) x = torch.tanh(x)
Fully connected (dense) networks are built using stacked `nn.Linear` and activations.
model = nn.Sequential( nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1) )
`forward()` defines how data flows through your layers and is automatically used when calling `model(input)`.
def forward(self, x):
    x = self.layer1(x)
    x = self.activation(x)
    return self.layer2(x)
Manually initialize weights for control over training behavior.
def init_weights(m):
    if isinstance(m, nn.Linear):
        torch.nn.init.xavier_uniform_(m.weight)

model.apply(init_weights)
Use `state_dict()` to save, and `load_state_dict()` to restore models.
torch.save(model.state_dict(), "model.pth") model.load_state_dict(torch.load("model.pth")) model.eval()
Use `torchinfo.summary()` or print model for insights.
from torchinfo import summary summary(model, input_size=(1, 10))
Access weights and biases directly for inspection or adjustment.
for name, param in model.named_parameters(): print(name, param.shape)
Submodules can be grouped inside a parent model for structure.
self.subnet = nn.Sequential( nn.Linear(5, 10), nn.ReLU() )
Freeze layers during training for transfer learning or fine-tuning.
for param in model.layer.parameters(): param.requires_grad = False
Use pre-trained models and fine-tune the final layers for new tasks.
from torchvision import models

model = models.resnet18(pretrained=True)

# Freeze all layers
for param in model.parameters():
    param.requires_grad = False

# Replace final layer
model.fc = nn.Linear(512, 10)
Optimization adjusts model parameters to minimize loss using gradients computed via backpropagation.
# Simplified training loop with optimization
loss.backward()        # Compute gradients
optimizer.step()       # Update parameters
optimizer.zero_grad()  # Clear old gradients
PyTorch’s `torch.optim` module provides common optimization algorithms for training neural networks.
import torch.optim as optim optimizer = optim.SGD(model.parameters(), lr=0.01)
Stochastic Gradient Descent is a basic and widely-used optimizer.
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
Adam combines momentum and adaptive learning rates, often yielding faster convergence.
optimizer = optim.Adam(model.parameters(), lr=0.001)
Momentum helps accelerate SGD in the relevant direction; Nesterov momentum is a variant that looks ahead before updating.
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9, nesterov=True)
Schedulers adjust the learning rate over time to improve training stability and convergence.
from torch.optim.lr_scheduler import StepLR scheduler = StepLR(optimizer, step_size=10, gamma=0.1)
Gradient descent updates weights in the negative direction of the loss gradient.
# weight = weight - learning_rate * gradient
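A minimal sketch of that same update written directly in PyTorch (the toy loss and learning rate are illustrative):

# Minimal manual gradient-descent step (illustrative sketch)
import torch

w = torch.tensor(1.0, requires_grad=True)
loss = (w * 3.0 - 6.0) ** 2        # toy loss with minimum at w = 2
loss.backward()

lr = 0.1
with torch.no_grad():
    w -= lr * w.grad               # weight = weight - learning_rate * gradient
    w.grad.zero_()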
You can define your own optimizer by subclassing `torch.optim.Optimizer`.
class CustomOptimizer(torch.optim.Optimizer):
    def __init__(self, params, lr=0.01):
        super().__init__(params, defaults={'lr': lr})

    def step(self, closure=None):
        # Custom update logic goes here
        pass
Weight decay (L2 regularization) penalizes large weights to prevent overfitting.
optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)
Gradient clipping prevents exploding gradients in unstable models (e.g., RNNs).
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
You can access and modify optimizer parameters dynamically during training.
for param_group in optimizer.param_groups: param_group['lr'] = 0.001 # Update learning rate
Try different optimizers on the same task to find the most effective one.
optimizers = [optim.SGD, optim.Adam, optim.RMSprop] # Measure loss over epochs for each
To resume training, save and load the optimizer’s state_dict.
torch.save(optimizer.state_dict(), "opt.pth") optimizer.load_state_dict(torch.load("opt.pth"))
Combine optimizers and schedulers to dynamically adjust learning rates.
scheduler.step() # Usually after each epoch
Print gradients, learning rates, and losses to monitor training progress.
for name, param in model.named_parameters(): print(name, param.grad.norm()) # Print gradient norm
A loss function measures how far a model's output is from the target. Lower loss = better performance.
loss = loss_fn(predicted_output, true_output)
PyTorch provides many built-in loss classes in `torch.nn`.
import torch.nn as nn loss_fn = nn.MSELoss()
Common for regression: MSE penalizes squared error, L1 penalizes absolute error.
mse = nn.MSELoss() l1 = nn.L1Loss()
Used for classification; combines `LogSoftmax` and `NLLLoss`.
loss_fn = nn.CrossEntropyLoss()
output = model(input)            # raw logits
loss = loss_fn(output, target)
For binary classification with raw logits (sigmoid + binary cross-entropy).
loss_fn = nn.BCEWithLogitsLoss() loss = loss_fn(model(x), target)
Kullback–Leibler divergence measures difference between probability distributions.
kl = nn.KLDivLoss(reduction="batchmean")
Useful for learning to rank or distance-based training (e.g., embeddings).
triplet = nn.TripletMarginLoss(margin=1.0)
# anchor, positive, negative = ...
loss = triplet(anchor, positive, negative)
Encourages similar samples to be closer and dissimilar farther apart.
# Often custom or from libraries like `pytorch-metric-learning`
Balances sensitivity and robustness; combines MSE and MAE behavior.
loss_fn = nn.SmoothL1Loss() # Huber Loss
You can create custom losses by subclassing `nn.Module` or writing simple functions.
class CustomLoss(nn.Module):
    def forward(self, input, target):
        return torch.mean((input - target) ** 2)
Give more importance to certain classes or samples using `weight` argument.
loss_fn = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 2.0]))
You can combine losses with weights for multi-task learning.
loss = 0.7 * loss1 + 0.3 * loss2
Use MSE/L1 for regression, CrossEntropy/BCE for classification tasks.
# Regression
loss = nn.MSELoss()

# Classification
loss = nn.CrossEntropyLoss()
Plot loss values during training to monitor convergence or detect overfitting.
import matplotlib.pyplot as plt plt.plot(train_losses) plt.plot(val_losses) plt.legend(['Train', 'Validation']) plt.show()
Select a loss based on task type (regression, classification, ranking, etc.) and data distribution.
# Binary Classification → BCEWithLogitsLoss # Multiclass → CrossEntropyLoss # Regression → MSELoss / SmoothL1Loss
This is the base class for all datasets in PyTorch. You subclass it and override `__len__()` and `__getitem__()`.
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]
Custom datasets allow you to load your own data formats, apply logic, and return samples and labels.
class CustomImageDataset(Dataset):
    def __init__(self, image_paths, labels):
        self.image_paths = image_paths
        self.labels = labels

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image = read_image(self.image_paths[idx])  # custom function
        return image, self.labels[idx]
Prebuilt datasets like MNIST, CIFAR-10 are easily accessible using torchvision.
from torchvision.datasets import MNIST dataset = MNIST(root='data', train=True, download=True)
The DataLoader wraps a Dataset and provides mini-batches, shuffling, and multiprocessing.
from torch.utils.data import DataLoader loader = DataLoader(dataset, batch_size=32, shuffle=True)
Use `batch_size` to control memory, and `shuffle=True` to ensure training randomness.
DataLoader(dataset, batch_size=64, shuffle=True)
Custom collate functions allow complex batching logic, such as padding variable-length sequences.
def my_collate(batch):
    return torch.stack([item[0] for item in batch]), [item[1] for item in batch]

DataLoader(dataset, batch_size=32, collate_fn=my_collate)
Use torchvision.transforms to preprocess images — resize, convert to tensor, etc.
from torchvision import transforms transform = transforms.Compose([ transforms.Resize((128, 128)), transforms.ToTensor() ])
Normalizing inputs helps stabilize training. Typically done to mean=0, std=1 per channel.
transforms.Normalize(mean=[0.5], std=[0.5]) # for grayscale
Augmentation improves generalization by randomly transforming training samples.
transforms.Compose([ transforms.RandomHorizontalFlip(), transforms.RandomRotation(15), transforms.ToTensor() ])
torchtext helps with NLP datasets and tokenization.
from torchtext.datasets import IMDB train_iter = IMDB(split='train') # yields (label, text) pairs
torchaudio makes it easy to load and preprocess audio data.
import torchaudio waveform, sample_rate = torchaudio.load("file.wav")
Use lazy loading, memory mapping, or streaming for massive datasets to reduce memory load.
# Only load one sample at a time in __getitem__ to save memory
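A minimal sketch of a lazily loading Dataset; `load_sample` is a hypothetical per-file loader, not a PyTorch API:

from torch.utils.data import Dataset

class LazyFileDataset(Dataset):
    def __init__(self, file_paths):
        self.file_paths = file_paths   # only paths are kept in memory

    def __len__(self):
        return len(self.file_paths)

    def __getitem__(self, idx):
        # Each sample is read from disk only when requested
        return load_sample(self.file_paths[idx])  # hypothetical loader function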
Set `num_workers` > 0 to load data in parallel, improving performance.
DataLoader(dataset, batch_size=32, num_workers=4)
Split your dataset while keeping class distribution intact (e.g., using `train_test_split`).
from sklearn.model_selection import train_test_split X_train, X_val = train_test_split(data, stratify=labels)
Normalize inputs, augment only training data, shuffle during training, keep loaders fast and deterministic.
# Shuffle: True for train, False for val/test # Normalize and augment in transform pipeline
The training loop includes forward pass, loss calculation, backward pass, and optimizer step.
for epoch in range(num_epochs):
    for batch in dataloader:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()
Pass input through the model to get predictions.
outputs = model(images)
Compare predictions with ground truth to calculate error.
loss = loss_fn(outputs, labels)
Compute gradients with backward() and update weights with optimizer.
loss.backward() optimizer.step()
Epoch = 1 pass over all training data. Iteration = 1 batch update.
for epoch in range(10): # 10 epochs ...
Check performance on unseen data during training to avoid overfitting.
model.eval()
with torch.no_grad():
    val_output = model(val_images)
Monitor train/validation loss to detect overfitting or underfitting. Use regularization, dropout, or more data.
# Early stopping, dropout layers, or data augmentation help mitigate
Save models periodically during training.
torch.save(model.state_dict(), 'model_epoch3.pth')
Use accuracy, precision, recall, etc. to track performance.
correct += (predictions == labels).sum().item() accuracy = correct / total
Wrap loops with tqdm for visual training progress.
from tqdm import tqdm for batch in tqdm(dataloader): ...
Move model and tensors to GPU for faster computation.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu") model.to(device) inputs, labels = inputs.to(device), labels.to(device)
Visualize losses, metrics, graphs, etc. during training.
from torch.utils.tensorboard import SummaryWriter writer = SummaryWriter() writer.add_scalar("Loss/train", loss, step)
Stop training if validation loss stops improving.
if val_loss > best_loss:
    patience_counter += 1
    if patience_counter > patience:
        break
Trade-off between speed (large batches) and generalization (small batches).
# Try values: 32, 64, 128
DataLoader(..., batch_size=64)
Fix seeds and set deterministic options to reproduce results.
import torch
import random
import numpy as np

torch.manual_seed(0)
np.random.seed(0)
random.seed(0)
torch.backends.cudnn.deterministic = True
Convolutional Neural Networks (CNNs) are deep learning models designed to process grid-like data such as images.
# CNNs excel in visual tasks like image classification and object detection. # They use filters to extract spatial features.
Conv2d layers apply 2D convolutions over input images to extract features like edges and textures.
nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1) # Applies 16 filters on 3-channel input image
Pooling layers reduce spatial dimensions and help retain the most relevant information.
nn.MaxPool2d(kernel_size=2, stride=2) # Reduces size by selecting max value in 2x2 blocks
Normalizes activations in CNNs to stabilize and speed up training.
nn.BatchNorm2d(num_features=16) # Normalizes the 16 output channels from Conv2d
Dropout randomly disables neurons to prevent overfitting during training.
nn.Dropout(p=0.5) # 50% chance of zeroing each feature
Use PyTorch’s nn.Module to define a custom CNN.
import torch.nn.functional as F

class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, 3, 1, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc = nn.Linear(16 * 16 * 16, 10)  # assumes 32x32 input

    def forward(self, x):
        x = self.pool(F.relu(self.conv(x)))
        x = x.view(-1, 16 * 16 * 16)
        return self.fc(x)
Leverage powerful models like ResNet or VGG with pretrained weights from torchvision.
from torchvision.models import resnet18 model = resnet18(pretrained=True)
Fine-tune pretrained models on new tasks.
# Freeze pretrained layers
for param in model.parameters():
    param.requires_grad = False

# Replace classifier
model.fc = nn.Linear(512, num_classes)
Visualize intermediate outputs of CNNs to understand what the model sees.
# Forward hook example:
def hook_fn(module, input, output):
    print(output.shape)

model.layer1[0].register_forward_hook(hook_fn)
Use CNNs to assign labels to images.
# Output layer for classification
nn.Linear(512, num_classes)
Detect objects and their locations in images using bounding boxes.
# Use YOLO, SSD or Faster-RCNN for detection # Output: (x, y, width, height, class)
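A short inference sketch with torchvision's pretrained Faster R-CNN (the dummy image tensor is an assumption; real inputs are 3xHxW float tensors):

from torchvision.models.detection import fasterrcnn_resnet50_fpn
import torch

model = fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

images = [torch.rand(3, 224, 224)]    # dummy image batch (assumption)
with torch.no_grad():
    predictions = model(images)

# Each prediction dict contains 'boxes', 'labels', and 'scores'
print(predictions[0]["boxes"].shape)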
Assign a class to each pixel using CNN-based architectures like U-Net.
# U-Net uses encoder-decoder structure # Final layer: 1x1 conv for per-pixel classification
Design unique networks suited to specific datasets and tasks.
# Mix of conv, pooling, dropout, batch norm # Tailored depth and width
Lightweight CNNs for mobile and embedded systems.
# Use depthwise separable convolutions # MobileNet: less computation, fewer parameters
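A minimal sketch of the MobileNet-style depthwise separable convolution block mentioned above (channel sizes are illustrative):

import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Depthwise: one filter per input channel (groups=in_ch)
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch)
        # Pointwise: 1x1 conv mixes channels
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.pointwise(self.relu(self.depthwise(x))))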
Prevent overfitting using techniques like dropout, weight decay, and data augmentation.
# Weight decay in optimizer:
optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-5)
RNNs are neural networks designed for sequential data like text, time series, and audio.
# Maintains hidden state to process sequences step-by-step # Good for tasks where order matters
PyTorch’s nn.RNN processes sequences with a basic recurrent layer.
rnn = nn.RNN(input_size=10, hidden_size=20, num_layers=1) output, hidden = rnn(input_seq)
Hidden states carry contextual information from previous steps.
# hidden.shape = (num_layers, batch_size, hidden_size)
LSTM (Long Short-Term Memory) improves RNNs by remembering long-term dependencies.
lstm = nn.LSTM(input_size=10, hidden_size=20) output, (hn, cn) = lstm(input_seq)
GRUs are like LSTMs but with fewer gates, making them faster and simpler.
gru = nn.GRU(input_size=10, hidden_size=20) output, hn = gru(input_seq)
Handle variable-length sequences with `pad_sequence` and `pack_padded_sequence`.
from torch.nn.utils.rnn import pack_padded_sequence

packed = pack_padded_sequence(padded_seq, lengths, batch_first=True)
output, hidden = rnn(packed)
Convert word indices into dense vectors using nn.Embedding.
embedding = nn.Embedding(vocab_size, embedding_dim) embedded_seq = embedding(word_indices)
RNNs are used in translation, sentiment analysis, and more.
# Use embeddings → RNN → Fully Connected → Softmax
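A minimal sketch of that pipeline (vocab size, embedding size, and hidden size are illustrative assumptions):

import torch
import torch.nn as nn

class SentimentRNN(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=100, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)    # (batch, seq, embed_dim)
        _, (hidden, _) = self.rnn(embedded)     # final hidden state
        return self.fc(hidden[-1])              # class logits; apply softmax/CE outside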
Train RNNs to predict the next character or word given previous input.
# Feed previous output as next input during generation loop
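A hedged sketch of a greedy generation loop, assuming a trained `model` that takes a token-id tensor and hidden state and returns next-token logits plus the new hidden state:

import torch

def generate(model, start_id, steps=20):
    token = torch.tensor([[start_id]])
    hidden = None
    output_ids = [start_id]
    for _ in range(steps):
        logits, hidden = model(token, hidden)   # assumed model signature
        token = logits.argmax(dim=-1)[:, -1:]   # greedy pick of the next token
        output_ids.append(token.item())
    return output_ids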
Attention allows the model to focus on relevant parts of the input sequence.
# Used in transformers and advanced seq2seq models
Encoder-decoder RNNs transform input sequences into output sequences (e.g., translation).
# Encoder encodes input → decoder generates output
Chain an encoder RNN with a decoder RNN for tasks like translation or summarization.
encoder_outputs, hidden = encoder(input_seq) output = decoder(target_seq, hidden)
torchtext simplifies text preprocessing, tokenization, and batching for NLP models.
# Field/BucketIterator come from the torchtext.legacy API, which was removed in recent torchtext versions
# from torchtext.legacy.data import Field, BucketIterator
# TEXT = Field(tokenize='spacy', include_lengths=True)

# Modern torchtext uses functional utilities instead:
from torchtext.data.utils import get_tokenizer
tokenizer = get_tokenizer("basic_english")
Process input in both forward and backward directions to capture more context.
rnn = nn.RNN(input_size, hidden_size, bidirectional=True)
RNNs are used in forecasting, classification, tagging, and generation.
# Example: Predict stock prices using past prices as sequence input
Transformers are deep learning architectures designed for handling sequential data using attention mechanisms, rather than RNNs.
# Basic PyTorch import
import torch.nn as nn

# Transformer model skeleton
model = nn.Transformer()
Allows the model to weigh the importance of different words in a sequence relative to each other.
# Self-attention concept: each word attends to every other word # Built into nn.Transformer
Splits attention into multiple parts to capture diverse representations.
multi_head = nn.MultiheadAttention(embed_dim=512, num_heads=8)
Since transformers have no recurrence, positional encodings provide sequence order information.
# Use sinusoids or learnable embeddings
# PyTorch example:
pos_embedding = nn.Parameter(torch.zeros(1, seq_len, embed_dim))
High-level modules for building custom transformer architectures.
encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8) encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)
Applies transformer models to image patches rather than text tokens.
# Vision Transformer requires patching and linear embedding # Use `timm` or HuggingFace models
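As a hedged alternative, recent torchvision versions also ship a Vision Transformer directly:

from torchvision.models import vit_b_16
import torch

model = vit_b_16(weights="IMAGENET1K_V1")   # pretrained ViT-B/16 (recent torchvision)
model.eval()

image = torch.rand(1, 3, 224, 224)          # dummy input (assumption)
with torch.no_grad():
    logits = model(image)                   # (1, 1000) ImageNet logits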
Pretrained transformer models like BERT, GPT are available with ease of use.
from transformers import BertModel model = BertModel.from_pretrained('bert-base-uncased')
BERT is for encoding tasks, GPT for generation. Both supported by HuggingFace and PyTorch natively.
from transformers import GPT2Model gpt_model = GPT2Model.from_pretrained("gpt2")
Load pretrained models and train them on your specific task dataset.
# Add classification head and backpropagate loss
outputs = model(input_ids)
loss = loss_fn(outputs, labels)
loss.backward()
Tokenizers convert text into tokens/ids, often with padding and truncation.
from transformers import BertTokenizer tokenizer = BertTokenizer.from_pretrained('bert-base-uncased') inputs = tokenizer("Hello world!", return_tensors="pt")
Common NLP task using a transformer head for classification.
from transformers import BertForSequenceClassification model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
Use models trained on SQuAD-like datasets to answer questions from context.
from transformers import pipeline qa = pipeline("question-answering") qa(question="What is JSON?", context="JSON is a data format...")
Translate text from one language to another using pretrained models like Marian or T5.
from transformers import MarianMTModel, MarianTokenizer

# Load English-to-French model
tokenizer = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-fr")
model = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-en-fr")
Techniques include distillation, quantization, and efficient attention mechanisms like Longformer or FlashAttention.
# Use DistilBERT for lightweight applications
from transformers import DistilBertModel
model = DistilBertModel.from_pretrained('distilbert-base-uncased')
Transformers offer parallelism, better long-range dependencies, but require more memory.
# RNN: sequential, slower # Transformer: parallel, faster with large data
TorchVision is a PyTorch library for vision datasets, models, and image transformations.
import torchvision # Datasets, models, transforms available in torchvision
Includes CIFAR-10, ImageNet, MNIST, FashionMNIST, etc.
from torchvision.datasets import CIFAR10 dataset = CIFAR10(root='./data', download=True)
Transform raw images into tensors and apply normalization, resizing, cropping, etc.
from torchvision import transforms transform = transforms.Compose([ transforms.Resize((224, 224)), transforms.ToTensor() ])
Load commonly used datasets with transforms and data loaders.
train_data = CIFAR10(root='./data', train=True, transform=transform)
Transforms are chainable functions for image augmentation or preparation.
transform = transforms.Compose([ transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(), transforms.Normalize(mean=[0.5], std=[0.5]) ])
Use random crop, horizontal flip, rotation, and color jitter to enhance training data.
transform = transforms.Compose([ transforms.RandomHorizontalFlip(), transforms.RandomCrop(32), transforms.ToTensor() ])
TorchVision provides pretrained ResNet, VGG, AlexNet, etc.
from torchvision.models import resnet18 model = resnet18(pretrained=True)
Replace the final layer of pretrained models and train on custom data.
model.fc = nn.Linear(model.fc.in_features, 10) # for 10-class output
Use models like DeepLabV3 or FCN for pixel-wise segmentation.
from torchvision.models.segmentation import fcn_resnet50 model = fcn_resnet50(pretrained=True)
Detect and localize objects in images with bounding boxes.
from torchvision.models.detection import fasterrcnn_resnet50_fpn model = fasterrcnn_resnet50_fpn(pretrained=True)
Access internal layers and visualize outputs to understand model behavior.
# Hook layer outputs for inspection
activation = {}

def hook_fn(module, input, output):
    activation['conv1'] = output

model.conv1.register_forward_hook(hook_fn)
Load your own image datasets with ImageFolder or custom Dataset class.
from torchvision.datasets import ImageFolder dataset = ImageFolder(root='my_data', transform=transform)
Use torchvision.utils.save_image to write tensors to disk as image files.
from torchvision.utils import save_image save_image(tensor_image, 'output.png')
Use pretrained models to classify images from the 1000 ImageNet categories.
model = resnet18(pretrained=True) # Output is a 1000-dimensional tensor
torchvision.utils provides utilities like make_grid and save_image for batch visualization.
from torchvision.utils import make_grid grid = make_grid(batch_images)
TorchText is a PyTorch library used for NLP tasks including preprocessing, datasets, and vocab building.
# Import torchtext tokenizer utility
from torchtext.data.utils import get_tokenizer

# Get tokenizer
tokenizer = get_tokenizer("basic_english")

# Tokenize sentence
print(tokenizer("NLP with TorchText is fun!"))
# ['nlp', 'with', 'torchtext', 'is', 'fun', '!']
Preprocessing includes lowercasing, removing punctuation, and tokenization.
import re

def preprocess(text):
    text = text.lower()
    text = re.sub(r'[^\w\s]', '', text)
    return text

print(preprocess("Hello, World!"))  # hello world
Tokenization breaks text into words or subwords. TorchText offers multiple tokenizers.
tokenizer = get_tokenizer("basic_english") tokens = tokenizer("Deep learning with NLP") print(tokens) # ['deep', 'learning', 'with', 'nlp']
Build vocabulary and use pretrained embeddings like GloVe.
from torchtext.vocab import GloVe glove = GloVe(name='6B', dim=100) print(glove['language']) # Tensor vector
TorchText provides utilities for loading and creating datasets.
from torchtext.datasets import AG_NEWS

train_iter = AG_NEWS(split='train')
for label, line in train_iter:
    print(label, line)
    break
Use TorchText datasets for building sentiment and category classifiers.
# Classify AG_NEWS dataset using embeddings + linear model # Model includes: Embedding -> GlobalAvgPool -> Linear
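A minimal sketch of the Embedding → average pooling → Linear classifier described above (sizes are illustrative; AG_NEWS has 4 classes):

import torch.nn as nn

class TextClassifier(nn.Module):
    def __init__(self, vocab_size=20000, embed_dim=64, num_classes=4):
        super().__init__()
        # EmbeddingBag averages token embeddings (global average pooling)
        self.embedding = nn.EmbeddingBag(vocab_size, embed_dim, mode="mean")
        self.fc = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids, offsets):
        return self.fc(self.embedding(token_ids, offsets))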
Map words to dense vectors using torch.nn.Embedding or pretrained embeddings.
import torch.nn as nn embedding = nn.Embedding(num_embeddings=1000, embedding_dim=300) input = torch.LongTensor([1, 2, 3]) output = embedding(input)
Pad sequences to equal length for batching using `pad_sequence`.
from torch.nn.utils.rnn import pad_sequence seqs = [torch.tensor([1,2]), torch.tensor([3,4,5])] padded = pad_sequence(seqs, batch_first=True) print(padded)
Recurrent networks process sequences for tasks like sentiment analysis.
rnn = nn.LSTM(input_size=100, hidden_size=128, batch_first=True) input = torch.rand(4, 10, 100) output, _ = rnn(input)
Attention mechanisms allow the model to focus on important words.
# Attention weights used for context vector # Common in translation and summarization
Translate from one language to another using seq2seq or Transformer models.
# Encoder-Decoder with attention or transformer # Input: "je t’aime" → Output: "i love you"
Identify names, dates, places in text using sequence labeling.
# BiLSTM + CRF or Transformer for token classification # Output: "John lives in Canada" → [B-PER, O, O, B-LOC]
BERT provides contextual embeddings for classification, NER, QA.
from transformers import BertTokenizer, BertForTokenClassification tokenizer = BertTokenizer.from_pretrained("bert-base-cased") model = BertForTokenClassification.from_pretrained("bert-base-cased")
Save trained models with `torch.save()` and reload with `torch.load()`.
torch.save(model.state_dict(), 'nlp_model.pt')

# Later...
model.load_state_dict(torch.load('nlp_model.pt'))
Use accuracy, F1-score, BLEU (for translation), and ROUGE (for summarization).
from sklearn.metrics import f1_score f1_score(y_true, y_pred, average='macro')
Torchaudio is a PyTorch library for loading, transforming, and modeling audio data.
import torchaudio waveform, sample_rate = torchaudio.load('audio.wav') print(waveform.shape) # (channels, samples)
Use `torchaudio.load()` to load waveform and sample rate.
waveform, sr = torchaudio.load("speech.wav") print(sr) # 16000
Torchaudio provides transforms like Resample, Normalize, TimeStretch.
transform = torchaudio.transforms.Resample(orig_freq=48000, new_freq=16000) resampled = transform(waveform)
Convert waveforms to visual features using Spectrogram or MFCC.
spec = torchaudio.transforms.Spectrogram()(waveform) mfcc = torchaudio.transforms.MFCC()(waveform)
Add noise, time shift, or pitch changes to augment audio training data.
# Time masking
mask = torchaudio.transforms.TimeMasking(time_mask_param=30)
augmented = mask(spec)
Use custom datasets or built-ins like YESNO or SPEECHCOMMANDS.
from torchaudio.datasets import YESNO dataset = YESNO(root="./", download=True) print(len(dataset))
Classify sound type using CNN or RNN over MFCC/Spectrogram features.
# Input: MFCC features → CNN → Softmax → Class label (e.g., dog bark)
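A minimal sketch of a CNN classifier over MFCC/spectrogram features (input shape and class count are illustrative assumptions):

import torch
import torch.nn as nn

class AudioCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, num_classes)

    def forward(self, x):                  # x: (batch, 1, n_mfcc, time)
        x = self.features(x).flatten(1)
        return self.fc(x)

model = AudioCNN()
dummy = torch.rand(4, 1, 40, 100)          # e.g. 40 MFCC coefficients, 100 frames
print(model(dummy).shape)                  # (4, 10)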
Convert audio waveforms into text using models trained on audio transcripts.
# Example: wav2vec2 or DeepSpeech models for STT (Speech-to-Text)
Use models like HuBERT, wav2vec2 from torchaudio.pipelines.
import torchaudio

# Pretrained models are provided as bundles in torchaudio.pipelines
bundle = torchaudio.pipelines.WAV2VEC2_BASE
model = bundle.get_model()
Use transfer learning by fine-tuning models on domain-specific audio datasets.
# Freeze base layers, retrain final classifier on new dataset
for param in model.feature_extractor.parameters():
    param.requires_grad = False
Split audio into meaningful chunks like words or phonemes.
# Use timestamps or VAD (Voice Activity Detection) to split audio
Create your own transformation by subclassing `nn.Module`.
class CustomVolume(nn.Module):
    def forward(self, waveform):
        return waveform * 0.5  # reduce volume by 50%
RNNs can model audio as sequences, suitable for classification or recognition.
rnn = nn.LSTM(input_size=40, hidden_size=128, num_layers=2, batch_first=True) input = torch.randn(8, 100, 40) output, _ = rnn(input)
Use accuracy, Word Error Rate (WER), and Signal-to-Noise Ratio (SNR) for evaluation.
from jiwer import wer

truth = "hello world"
hypothesis = "helo wurld"
print(wer(truth, hypothesis))  # 1.0 (both words substituted)
Save model weights with PyTorch and export for inference.
torch.save(model.state_dict(), "audio_model.pt") # Load model.load_state_dict(torch.load("audio_model.pt"))
To move tensors or models to GPU, use `.cuda()` or `.to("cuda")`.
import torch

x = torch.randn(2, 2)
x = x.cuda()        # OR
x = x.to("cuda")    # Moves tensor to GPU
Use `torch.cuda.is_available()` to check if a GPU is accessible.
if torch.cuda.is_available():
    print("GPU available")
else:
    print("Using CPU")
Move both input and model to GPU for computation.
model = model.to("cuda") inputs = inputs.to("cuda") outputs = model(inputs)
Use `num_workers` and `pin_memory=True` in DataLoader to speed up GPU training.
train_loader = torch.utils.data.DataLoader( dataset, batch_size=64, shuffle=True, num_workers=4, pin_memory=True )
Wrap model with `DataParallel` for simple multi-GPU usage.
model = torch.nn.DataParallel(model) model = model.cuda()
This parallelizes model across available GPUs but is slower than DDP.
# Automatically splits and gathers inputs across GPUs outputs = model(inputs)
DDP is faster and more efficient for multi-GPU training across multiple processes.
# Requires torch.distributed and multiple scripts/processes # Use torchrun or mp.spawn to launch DDP training
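A hedged single-file DDP sketch, assuming it is launched with torchrun (e.g. torchrun --nproc_per_node=N script.py), which sets the environment variables read below:

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")                  # reads env vars set by torchrun
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(10, 1).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])      # gradients synced across processes

    # ... training loop as usual ...
    dist.destroy_process_group()

if __name__ == "__main__":
    main()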
Use `torch.cuda.amp` for faster training and reduced memory usage.
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()

for inputs, targets in dataloader:
    optimizer.zero_grad()
    with autocast():
        outputs = model(inputs)
        loss = criterion(outputs, targets)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
NVIDIA Apex provides older mixed precision training tools (less used now).
# apex.amp is deprecated in favor of torch.cuda.amp # Install: pip install -v --no-cache-dir ./apex
Use `torch.cuda.empty_cache()` and monitor usage with `torch.cuda.memory_allocated()`.
torch.cuda.empty_cache()  # Frees unused cached memory
print(torch.cuda.memory_allocated() / 1e6, "MB")
Use `nvidia-smi` in the terminal or libraries like `gpustat`.
# Terminal:
nvidia-smi

# Python / notebook:
!pip install gpustat
!gpustat
CUDA errors are often delayed — use `CUDA_LAUNCH_BLOCKING=1` to debug.
# Run your script with this env var
CUDA_LAUNCH_BLOCKING=1 python train.py
Use gradient checkpointing, mixed precision, and model parallelism for large models.
# Gradient checkpointing: reduce memory usage
from torch.utils.checkpoint import checkpoint
output = checkpoint(model_block, input)
Use `torch.profiler` to profile GPU usage, memory, and latency.
import torch.profiler as profiler

with profiler.profile(...):  # Add your config
    outputs = model(inputs)
Use `torch.save(model.state_dict())` — safe for CPU and GPU.
torch.save(model.state_dict(), "model.pth") # To load: model.load_state_dict(torch.load("model.pth")) model.to("cuda")
Use `torch.save()` to store model weights or full models.
torch.save(model.state_dict(), 'model.pth')
Load saved weights and reinitialize the model before inference.
model.load_state_dict(torch.load('model.pth')) model.eval()
Script mode captures full model logic; trace mode records a single forward pass.
scripted = torch.jit.script(model) traced = torch.jit.trace(model, sample_input)
TorchScript allows saving models in a serialized form for deployment.
torch.jit.save(scripted, 'scripted_model.pt')
Export models to ONNX format for cross-platform inference.
torch.onnx.export(model, input_tensor, "model.onnx")
Use `onnxruntime` for fast inference in production environments.
import onnxruntime as ort sess = ort.InferenceSession("model.onnx") outputs = sess.run(None, {"input": input_array})
Convert models with TorchScript and deploy on iOS or Android apps.
# Save for mobile
scripted_model = torch.jit.script(model)
scripted_model.save("mobile_model.pt")
Deploy models using Flask or FastAPI with JSON API endpoints.
@app.post("/predict/") def predict(data: Input): tensor = torch.tensor(data.features) result = model(tensor) return result.tolist()
Use TorchServe to serve models at scale with REST endpoints and monitoring.
# Define handler + model archive (.mar)
torch-model-archiver --model-name=my_model ...
torchserve --start --model-store model_store ...
Use EC2/GCP VM + Docker, or AWS SageMaker for scalable deployment.
# On AWS SageMaker: # 1. Package model + inference.py # 2. Create PyTorchModel() # 3. Deploy with model.deploy()
Convert ONNX to CoreML for native iOS inference using coremltools.
import coremltools as ct

# Note: the ONNX converter was removed in coremltools 6+;
# newer coremltools versions convert TorchScript models via ct.convert()
coreml_model = ct.converters.onnx.convert("model.onnx")
coreml_model.save("Model.mlmodel")
TensorRT speeds up inference by optimizing ONNX models for NVIDIA GPUs.
# Use ONNX + TensorRT or torch2trt # Example: torch2trt(model, [input])
Use ONNX.js or TensorFlow.js to deploy converted models in-browser.
# Convert to ONNX # Load with ONNX.js in JavaScript frontend
Use tools like Prometheus, Sentry, or TorchServe logs to track latency, throughput.
# TorchServe provides built-in logs and Prometheus metrics # You can also log API latency per request manually
Use Git, DVC, or TorchServe snapshots to manage different model versions.
# DVC:
dvc add model.pth
git commit model.pth.dvc

# TorchServe:
# model_name.mar → versioned in model-store/
PyTorch Lightning is a lightweight PyTorch wrapper that simplifies training, scaling, and readability without sacrificing flexibility.
# Lightning separates engineering from research # Improves reproducibility and model structure
Lightning abstracts training loops, logging, checkpointing, and device placement.
# You define only:
# - forward
# - training_step
# - validation_step
# - configure_optimizers
# Install from pip
pip install lightning
Encapsulates all the logic for training/validation/testing in one module.
import lightning as L
import torch
import torch.nn.functional as F

class MyModel(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(28 * 28, 10)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        return F.cross_entropy(y_hat, y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())
The `Trainer` class runs your entire training loop, supporting GPUs, TPUs, loggers, and callbacks.
trainer = L.Trainer(max_epochs=5, accelerator="gpu", devices=1) trainer.fit(model, train_loader, val_loader)
# Automatically log metrics using log()
self.log("train_loss", loss)

# Use callbacks for custom hooks like LearningRateMonitor or custom schedulers
Save the best model automatically during training.
from lightning.pytorch.callbacks import ModelCheckpoint checkpoint_cb = ModelCheckpoint(monitor="val_loss") trainer = L.Trainer(callbacks=[checkpoint_cb])
Trainer scales across hardware with no code change.
# Run on 8 GPUs
trainer = L.Trainer(devices=8, accelerator="gpu", strategy="ddp")
Encapsulate data loading, splitting, and transformation logic.
class MNISTDataModule(L.LightningDataModule):
    def prepare_data(self): ...
    def setup(self, stage=None): ...
    def train_dataloader(self): ...
    def val_dataloader(self): ...
Stop training when validation loss stops improving.
from lightning.pytorch.callbacks import EarlyStopping early_stop = EarlyStopping(monitor="val_loss", patience=3) trainer = L.Trainer(callbacks=[early_stop])
from lightning.pytorch.loggers import TensorBoardLogger logger = TensorBoardLogger("tb_logs", name="my_model") trainer = L.Trainer(logger=logger)
Integrate with Optuna or RayTune for hyperparameter sweeps.
# Use LightningCLI or Hydra for structured tuning
# auto_lr_find was removed in Lightning 2.x; use the Tuner API instead:
from lightning.pytorch.tuner import Tuner
tuner = Tuner(trainer)
tuner.lr_find(model)
Scale to multiple nodes or GPUs using DDP or FSDP.
trainer = L.Trainer(accelerator="gpu", strategy="ddp", devices=4)
Use `torch.jit`, export scripts, ONNX, and checkpoints for deployment.
scripted_model = torch.jit.script(model) scripted_model.save("model.pt")
Lightning removes boilerplate but uses the same PyTorch code under the hood.
# Lightning == PyTorch + Engineering Abstraction # You can still use native PyTorch anywhere
Reinforcement Learning is a learning paradigm where agents learn by interacting with environments to maximize rewards.
MDP defines state, action, transition probability, and reward structure for an RL problem.
# (S, A, P, R, γ)
# S = states, A = actions
# P = state transitions
# R = reward function
# γ = discount factor
Q-learning learns action values (Q-values) from experience to derive an optimal policy without a model of the environment.
# Tabular Q-learning update (alpha = learning rate)
Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])
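A minimal tabular Q-learning sketch on a discrete environment, assuming the classic Gym API (pre-0.26) and illustrative hyperparameters:

import gym
import numpy as np

env = gym.make("FrozenLake-v1")
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, epsilon = 0.1, 0.99, 0.1

state = env.reset()
for _ in range(1000):
    # epsilon-greedy action selection
    action = env.action_space.sample() if np.random.rand() < epsilon else Q[state].argmax()
    next_state, reward, done, info = env.step(action)
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = env.reset() if done else next_state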
Use neural networks to approximate Q-values for high-dimensional state spaces.
q_network = nn.Sequential( nn.Linear(state_dim, 128), nn.ReLU(), nn.Linear(128, action_dim) )
Directly optimize the policy using gradients of expected reward.
# Loss = -log(prob) * reward
loss = -torch.log(prob[action]) * reward
A2C uses a separate value network to stabilize training.
Advantage = reward - value(state)
PPO clips policy update ratio to prevent large updates.
ratio = new_prob / old_prob
clipped = torch.clamp(ratio, 1 - eps, 1 + eps)
loss = -torch.min(ratio * adv, clipped * adv).mean()
import gym env = gym.make("CartPole-v1") state = env.reset()
Define new Gym environments by subclassing `gym.Env`.
class MyEnv(gym.Env):
    def __init__(self): ...
    def step(self, action): ...
    def reset(self): ...
    def render(self): ...
Modify the reward function to encourage better learning behavior.
# Example: penalty for time or energy use
reward = base_reward - penalty
Use ε-greedy, Boltzmann exploration, or noise to explore action space.
if random() < epsilon:
    action = random_action()
else:
    action = best_action()
# Save model
torch.save(model.state_dict(), "agent.pth")

# Load model
model.load_state_dict(torch.load("agent.pth"))
Use target networks, experience replay, normalization, or learning rate schedules.
# DQN target network:
target_net.load_state_dict(q_net.state_dict())
Plot total reward per episode or use gym's render method.
import matplotlib.pyplot as plt plt.plot(rewards) plt.xlabel("Episode") plt.ylabel("Reward")
Train multiple agents in shared or competing environments with self-play, MADDPG, or PPO.
# Use PettingZoo or multi-agent Gym extensions # Each agent may have separate networks or shared policies
Generative Adversarial Networks consist of a generator and a discriminator competing to improve generation quality.
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(100, 256),
            nn.ReLU(),
            nn.Linear(256, 784),
            nn.Tanh()
        )

    def forward(self, x):
        return self.net(x)

# Generator creates fake images from noise
noise = torch.randn(16, 100)   # batch of 16 noise vectors
gen = Generator()
fake_images = gen(noise)       # generated images shaped (16, 784)
Attention lets models focus on important parts of input; key in Transformers.
import torch.nn.functional as F

def scaled_dot_product_attention(query, key, value):
    scores = torch.matmul(query, key.transpose(-2, -1)) / (query.size(-1) ** 0.5)
    weights = F.softmax(scores, dim=-1)
    output = torch.matmul(weights, value)
    return output, weights
Capsules encode spatial hierarchies; better at preserving pose information.
# Pseudo-code for capsule layer routing
def routing(u_hat, num_iterations=3):
    b = torch.zeros(...)
    for i in range(num_iterations):
        c = softmax(b)
        s = (c * u_hat).sum(dim=1)
        v = squash(s)
        b += (u_hat * v).sum(dim=-1)
    return v
Train models to generalize from very few examples using meta-learning techniques.
# Example: Use pretrained embeddings with a nearest-neighbor classifier
def classify(query, support_set, support_labels):
    distances = torch.cdist(query, support_set)
    nearest = distances.argmin(dim=1)
    return support_labels[nearest]
Learn representations by pulling similar examples closer and pushing dissimilar apart.
# Simple contrastive loss example
def contrastive_loss(z_i, z_j, temperature=0.5):
    sim = torch.matmul(z_i, z_j.T) / temperature
    labels = torch.arange(len(z_i))
    return nn.CrossEntropyLoss()(sim, labels)
Learn a distance metric to cluster similar data points together in embedding space.
# Triplet loss example
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=1.0):
    pos_dist = F.pairwise_distance(anchor, positive)
    neg_dist = F.pairwise_distance(anchor, negative)
    loss = F.relu(pos_dist - neg_dist + margin)
    return loss.mean()
Zero-shot: Generalize to unseen classes without training; Self-supervised: Learn from unlabeled data.
# Zero-shot: Use pretrained CLIP for text-image similarity # Self-supervised: Predict missing parts or augmentations
A convolutional model family scaling width, depth, and resolution efficiently.
from torchvision.models import efficientnet_b0 model = efficientnet_b0(pretrained=True) # Fine-tune on custom dataset
Automate design of neural network architectures using search algorithms.
# Example: Use NAS libraries to explore model variants # Pseudo-code: Define search space, evaluate candidates, select best
Reduce model size and increase inference speed by using lower precision (e.g., 8-bit).
import torch.quantization

model = ...  # define model
model.qconfig = torch.quantization.get_default_qconfig('fbgemm')
torch.quantization.prepare(model, inplace=True)
# Calibrate on data
torch.quantization.convert(model, inplace=True)
Remove less important weights to reduce size and improve speed without much accuracy loss.
import torch.nn.utils.prune as prune prune.l1_unstructured(model.layer1, name='weight', amount=0.4) # 40% weights pruned in layer1
Train smaller "student" models to mimic large "teacher" models' outputs.
# Pseudo-code for distillation loss
loss = alpha * student_loss + (1 - alpha) * distillation_loss(student_logits, teacher_logits)
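A hedged concrete version using softened logits and KL divergence (temperature and alpha are illustrative choices):

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Hard-label loss on ground truth
    hard = F.cross_entropy(student_logits, labels)
    # Soft-label loss: match the teacher's softened distribution
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * hard + (1 - alpha) * soft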
Combine content of one image with style of another using deep features.
# Use pretrained VGG to compute content & style loss # Optimize input image to minimize combined loss
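A minimal sketch of the style-loss ingredient, the Gram matrix of a feature map (the features themselves are assumed to come from a pretrained VGG layer):

import torch

def gram_matrix(features):
    # features: (batch, channels, height, width) from a VGG layer
    b, c, h, w = features.shape
    flat = features.view(b, c, h * w)
    gram = torch.bmm(flat, flat.transpose(1, 2))   # channel-by-channel correlations
    return gram / (c * h * w)

# Style loss = MSE between Gram matrices of the generated and style images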
Generate text descriptions for images using CNN + RNN or Transformer models.
# CNN encodes image features # RNN generates word sequences conditioned on features
Combine data from multiple modalities, like images and text, into joint models.
# Example: Fuse image and text embeddings for classification
Build a CNN to classify images into categories using PyTorch.
import torch
import torch.nn as nn
import torch.optim as optim

class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, 3)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc = nn.Linear(16 * 31 * 31, 10)  # for 64x64 input: (64 - 3 + 1) // 2 = 31

    def forward(self, x):
        x = self.pool(torch.relu(self.conv(x)))
        x = x.view(-1, 16 * 31 * 31)
        x = self.fc(x)
        return x
Use RNN or Transformer to classify sentiment from text data.
# Tokenize text, embed, feed into LSTM or Transformer encoder # Final layer: sigmoid for binary sentiment
Train a model to recognize spoken commands using audio features like MFCCs.
# Extract MFCC features from audio # Feed to CNN or RNN classifier
Build a model to classify chest X-ray images as pneumonia or normal.
# Use pretrained CNN (e.g. ResNet) # Fine-tune last layers on X-ray dataset
Detect fake news articles using NLP techniques and text classification.
# Preprocess text, vectorize # Train classifier (e.g., Transformer or LSTM)
Build a sequence-to-sequence model to translate between languages.
# Use encoder-decoder architecture with attention
Implement neural style transfer in an interactive app.
# Use pretrained VGG for style/content extraction # Optimize input image iteratively
Detect and localize objects in images using models like YOLO or Faster R-CNN.
# Use pretrained detection model # Fine-tune on custom dataset if needed
Build conversational AI using transformer models for dialogue generation.
# Use pretrained GPT or custom seq2seq transformer # Fine-tune on conversational datasets
Predict future values in time series data with RNNs or Temporal Convolutional Networks.
# Prepare sequences of historical data as input # Train model to predict next step(s)
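A minimal LSTM forecaster sketch (window length and feature size are illustrative assumptions):

import torch
import torch.nn as nn

class Forecaster(nn.Module):
    def __init__(self, n_features=1, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, 1)

    def forward(self, x):                  # x: (batch, window, n_features)
        _, (h, _) = self.lstm(x)
        return self.fc(h[-1])              # predict the next value

model = Forecaster()
window = torch.rand(8, 30, 1)              # 8 sequences of 30 past observations
print(model(window).shape)                 # (8, 1)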
Classify music audio clips into genres using CNNs on spectrograms.
# Convert audio to spectrogram images # Train CNN classifier
Recognize emotions from facial expressions or speech in real time.
# Use CNN or RNN on video frames or audio # Output emotion class probabilities
Extract text from images using CNN+RNN or Transformer OCR models.
# Detect text regions with CNN # Recognize characters using sequence models
Build a voice assistant using speech recognition, NLP, and TTS components.
# Integrate speech-to-text, intent recognition, and text-to-speech
Deploy models with web dashboard to monitor predictions, usage, and performance.
# Use Flask/FastAPI backend serving PyTorch models # Frontend dashboard for metrics and inputs