PyTorch is an open-source machine learning framework developed by Meta AI (formerly Facebook AI Research). It is known for its ease of use and its dynamic computation graph.
# Import PyTorch
import torch

# Check PyTorch version
print(torch.__version__)
PyTorch is preferred for research due to dynamic computation graphs; TensorFlow is often used in production with static graphs.
# Example dynamic graph in PyTorch
x = torch.tensor(2.0, requires_grad=True)
y = x**2 + 3*x + 4
y.backward()     # auto-calculates gradient
print(x.grad)    # prints gradient of y w.r.t. x
Install via pip or conda for CPU or CUDA GPU support.
# For CPU
pip install torch torchvision torchaudio

# For GPU (CUDA 11.8)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
The PyTorch ecosystem includes TorchVision (images), TorchText (text), TorchAudio (audio), and the third-party PyTorch Lightning (training).
# Example import of TorchVision
import torchvision

# Load pretrained model
model = torchvision.models.resnet18(pretrained=True)
Torch-TensorRT compiles PyTorch models into TensorRT engines for optimized inference on NVIDIA hardware.
# Requires NVIDIA TensorRT setup
# Converts PyTorch model to an optimized format
import torch_tensorrt

# Example conversion (commented)
# trt_model = torch_tensorrt.compile(model, inputs=[...])
Tensors are multi-dimensional arrays (like NumPy arrays) and the core data structure in PyTorch.
# Create a tensor tensor = torch.tensor([[1, 2], [3, 4]]) print(tensor)
PyTorch builds the graph as operations are run, making debugging and flexibility easier.
x = torch.tensor(1.0, requires_grad=True) y = x * 3 z = y ** 2 z.backward() print(x.grad) # prints dz/dx
Used for deep learning, reinforcement learning, computer vision, NLP, and production models.
# Example: Sentiment classification using torchtext (not shown here for brevity) # PyTorch is also used for StyleGAN, DALL·E, and other AI research
Versions are updated frequently. Major versions may change APIs. Use `torch.__version__` to check.
print("PyTorch version:", torch.__version__)
Involves: preparing data, building models, defining loss/optimizer, training, and evaluating.
# Step 1: Define tensor inputs
x = torch.tensor([[1.0], [2.0]])
y = torch.tensor([[2.0], [4.0]])

# Step 2: Define a simple linear model
model = torch.nn.Linear(1, 1)

# Step 3: Define loss and optimizer
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Step 4: Train loop
for epoch in range(10):
    pred = model(x)
    loss = loss_fn(pred, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
PyTorch supports GPU acceleration using CUDA.
# Move tensor to GPU (if available)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.tensor([1.0]).to(device)
print(x.device)
Supported by Jupyter Notebook, PyCharm, VSCode, Google Colab, and Kaggle Kernels.
# Google Colab has GPU support pre-installed # Use `!nvidia-smi` in Colab to verify GPU
Active community on GitHub, forums, PyTorch Lightning, HuggingFace, and conferences like NeurIPS.
These libraries provide datasets, models, and tools for images, text, and audio.
from torchvision import datasets
from torchtext.datasets import IMDB
from torchaudio.datasets import SPEECHCOMMANDS
A simple linear model to predict y = 2x.
import torch

# Data
x = torch.tensor([[1.0], [2.0]])
y = torch.tensor([[2.0], [4.0]])

# Model
model = torch.nn.Linear(1, 1)

# Loss & Optimizer
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Training loop
for epoch in range(20):
    y_pred = model(x)
    loss = loss_fn(y_pred, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"Epoch {epoch+1}, Loss: {loss.item()}")
You can create tensors from lists, NumPy arrays, or use built-in methods like zeros, ones, rand.
x = torch.tensor([[1, 2], [3, 4]]) zeros = torch.zeros(2, 2) ones = torch.ones(2, 2) rand = torch.rand(2, 2)
Rank is the number of dimensions. Shape shows the size in each dimension.
t = torch.rand(3, 4, 5)
print(t.shape)  # (3, 4, 5)
print(t.ndim)   # 3
Access elements or sub-tensors using standard Python slicing.
t = torch.tensor([[1, 2], [3, 4]])
print(t[0])      # First row
print(t[:, 1])   # Second column
Tensors support arithmetic operations and dot products.
a = torch.tensor([1, 2]) b = torch.tensor([3, 4]) print(torch.add(a, b)) print(torch.mul(a, b)) print(torch.dot(a, b))
Allows arithmetic on tensors with different shapes by expanding dimensions automatically.
a = torch.tensor([[1], [2]]) b = torch.tensor([3, 4]) print(a + b) # Broadcasts b to match shape of a
Change the shape of tensors without changing data.
x = torch.arange(6) print(x.reshape(2, 3)) print(x.view(2, 3))
Swap axes or rotate dimensions.
x = torch.rand(2, 3)
print(x.T)                 # Transpose for 2D

y = torch.rand(2, 3, 4)
print(y.permute(2, 0, 1))  # Custom dim reordering
Convert between float, int, long, etc.
x = torch.tensor([1.0, 2.0]) print(x.int()) # convert to int
Tensors can be converted to/from NumPy arrays.
import numpy as np a = np.array([1, 2]) t = torch.from_numpy(a) n = t.numpy()
Clone creates a copy. Detach removes from computation graph.
a = torch.tensor([1.0], requires_grad=True) b = a * 2 c = b.detach().clone()
Use torch.nn.init for custom weight init.
import torch.nn.init as init w = torch.empty(3, 5) init.xavier_uniform_(w)
In-place ops modify tensor (`x.add_()`); out-of-place creates new one (`x + y`).
x = torch.tensor([1.0])
x.add_(1.0)   # In-place
x = x + 1.0   # Out-of-place
Move tensors to GPU using `.cuda()` or `.to("cuda")`.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu") x = torch.tensor([1.0]).to(device)
Save using `torch.save()`, load using `torch.load()`.
torch.save(x, "tensor.pt") x_loaded = torch.load("tensor.pt")
Use `.detach()` to break the graph, use `.to(device)` instead of `.cuda()` for flexibility, and prefer `reshape()` over `view()` when tensors may not be contiguous.
# Good habit: use `.to(device)` # Use `detach()` before saving intermediate values
Autograd is PyTorch’s automatic differentiation engine that dynamically builds a computational graph and computes gradients during backpropagation.
import torch x = torch.tensor(2.0, requires_grad=True) y = x ** 2 y.backward() print(x.grad) # ➜ Prints: tensor(4.)
Previously, `Variable` was used to wrap tensors, but now Tensors themselves support autograd through `requires_grad`.
# Old way (deprecated):
# from torch.autograd import Variable
# x = Variable(torch.tensor(1.0), requires_grad=True)

# Modern way:
x = torch.tensor(1.0, requires_grad=True)
This flag tells PyTorch to track operations on a tensor so gradients can be calculated.
x = torch.tensor(3.0, requires_grad=True) print(x.requires_grad) # ➜ True
Calling `.backward()` on a scalar result triggers gradient computation.
x = torch.tensor(5.0, requires_grad=True) y = x * 2 y.backward() print(x.grad) # ➜ tensor(2.)
Gradients are stored in the `.grad` attribute of leaf tensors.
w = torch.tensor(4.0, requires_grad=True) loss = w ** 3 loss.backward() print(w.grad) # ➜ tensor(48.)
By default, gradients accumulate. Call `.zero_()` to reset before new backward passes.
w.grad.zero_() # Important in training loops to prevent gradient buildup
Use `.detach()` to stop tracking a tensor in the computational graph.
y = x.detach() # y won’t require gradients even if x does
Jacobians (derivatives of vector-valued functions) and Hessians (second-order derivatives) can be computed using torch.autograd.functional.
from torch.autograd.functional import jacobian

def func(x):
    return x ** 2 + 3 * x

jac = jacobian(func, torch.tensor(2.0))
print(jac)  # ➜ tensor(7.)
Set `create_graph=True` to compute higher-order gradients.
x = torch.tensor(1.0, requires_grad=True)
y = x ** 3
dy_dx = torch.autograd.grad(y, x, create_graph=True)[0]
dy2_dx2 = torch.autograd.grad(dy_dx, x)[0]
print(dy2_dx2)  # ➜ tensor(6.)
Define custom forward/backward logic with `torch.autograd.Function`.
class Square(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input ** 2

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        return 2 * input * grad_output

x = torch.tensor(3.0, requires_grad=True)
y = Square.apply(x)
y.backward()
print(x.grad)  # ➜ tensor(6.)
Used to prevent exploding gradients by capping them at a maximum value.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
Autograd tracks all operations for training and allows optimizer steps using gradients.
loss.backward()        # compute gradients
optimizer.step()       # update parameters
optimizer.zero_grad()  # reset for next step
Use `.grad`, `backward(retain_graph=True)`, and hooks to inspect problems in gradient flow.
for name, param in model.named_parameters(): print(name, param.grad) # Check if any grad is None
Use tools like `torchviz` to visualize computational graphs for complex models.
from torchviz import make_dot x = torch.tensor(2.0, requires_grad=True) y = x ** 2 make_dot(y).render("graph", format="png")
Always manage `.grad`, use `.detach()` for inference, and validate gradients to ensure correct flow.
# Tip: use torch.no_grad() during evaluation to disable autograd
with torch.no_grad():
    output = model(input)
`torch.nn` is a module for building neural network layers, models, and loss functions.
import torch.nn as nn model = nn.Linear(10, 1) # A single-layer network
All custom models should subclass `nn.Module` and define layers in `__init__`, forward logic in `forward()`.
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 1)

    def forward(self, x):
        return self.fc(x)
For simple models, `nn.Sequential` stacks layers without needing a full class.
model = nn.Sequential( nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 1) )
Create flexible models by extending `nn.Module` and overriding `forward` behavior.
class CustomNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(4, 4)
        self.out = nn.Linear(4, 1)

    def forward(self, x):
        x = torch.relu(self.hidden(x))
        return self.out(x)
PyTorch offers many built-in layers including linear, convolutional, recurrent, and more.
nn.Linear(in_features, out_features) nn.Conv2d(in_channels, out_channels, kernel_size) nn.LSTM(input_size, hidden_size)
Activations are layers too. Common ones include ReLU, Sigmoid, and Tanh.
x = torch.relu(x) x = torch.sigmoid(x) x = torch.tanh(x)
Fully connected (dense) networks are built using stacked `nn.Linear` and activations.
model = nn.Sequential( nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1) )
`forward()` defines how data flows through your layers and is automatically used when calling `model(input)`.
def forward(self, x):
    x = self.layer1(x)
    x = self.activation(x)
    return self.layer2(x)
Manually initialize weights for control over training behavior.
def init_weights(m):
    if isinstance(m, nn.Linear):
        torch.nn.init.xavier_uniform_(m.weight)

model.apply(init_weights)
Use `state_dict()` to save, and `load_state_dict()` to restore models.
torch.save(model.state_dict(), "model.pth") model.load_state_dict(torch.load("model.pth")) model.eval()
Use `torchinfo.summary()` or print model for insights.
from torchinfo import summary summary(model, input_size=(1, 10))
Access weights and biases directly for inspection or adjustment.
for name, param in model.named_parameters(): print(name, param.shape)
Submodules can be grouped inside a parent model for structure.
self.subnet = nn.Sequential( nn.Linear(5, 10), nn.ReLU() )
Freeze layers during training for transfer learning or fine-tuning.
for param in model.layer.parameters(): param.requires_grad = False
Use pre-trained models and fine-tune the final layers for new tasks.
from torchvision import models

model = models.resnet18(pretrained=True)

# Freeze all layers
for param in model.parameters():
    param.requires_grad = False

# Replace final layer
model.fc = nn.Linear(512, 10)
Optimization adjusts model parameters to minimize loss using gradients computed via backpropagation.
# Simplified training loop with optimization
loss.backward()        # Compute gradients
optimizer.step()       # Update parameters
optimizer.zero_grad()  # Clear old gradients
PyTorch’s `torch.optim` module provides common optimization algorithms for training neural networks.
import torch.optim as optim optimizer = optim.SGD(model.parameters(), lr=0.01)
Stochastic Gradient Descent is a basic and widely-used optimizer.
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
Adam combines momentum and adaptive learning rates, often yielding faster convergence.
optimizer = optim.Adam(model.parameters(), lr=0.001)
Momentum helps accelerate SGD in the relevant direction; Nesterov momentum is a variant that looks ahead before updating.
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9, nesterov=True)
Schedulers adjust the learning rate over time to improve training stability and convergence.
from torch.optim.lr_scheduler import StepLR scheduler = StepLR(optimizer, step_size=10, gamma=0.1)
Gradient descent updates weights in the negative direction of the loss gradient.
# weight = weight - learning_rate * gradient
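A minimal sketch of that same update written directly in PyTorch (the toy loss and learning rate are illustrative):

# Minimal manual gradient-descent step (illustrative sketch)
import torch

w = torch.tensor(1.0, requires_grad=True)
loss = (w * 3.0 - 6.0) ** 2        # toy loss with minimum at w = 2
loss.backward()

lr = 0.1
with torch.no_grad():
    w -= lr * w.grad               # weight = weight - learning_rate * gradient
    w.grad.zero_()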
You can define your own optimizer by subclassing `torch.optim.Optimizer`.
class CustomOptimizer(torch.optim.Optimizer):
    def __init__(self, params, lr=0.01):
        super().__init__(params, defaults={'lr': lr})

    def step(self, closure=None):
        # Custom update logic goes here
        pass
Weight decay (L2 regularization) penalizes large weights to prevent overfitting.
optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)
Gradient clipping prevents exploding gradients in unstable models (e.g., RNNs).
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
You can access and modify optimizer parameters dynamically during training.
for param_group in optimizer.param_groups: param_group['lr'] = 0.001 # Update learning rate
Try different optimizers on the same task to find the most effective one.
optimizers = [optim.SGD, optim.Adam, optim.RMSprop] # Measure loss over epochs for each
To resume training, save and load the optimizer’s state_dict.
torch.save(optimizer.state_dict(), "opt.pth") optimizer.load_state_dict(torch.load("opt.pth"))
Combine optimizers and schedulers to dynamically adjust learning rates.
scheduler.step() # Usually after each epoch
Print gradients, learning rates, and losses to monitor training progress.
for name, param in model.named_parameters(): print(name, param.grad.norm()) # Print gradient norm
A loss function measures how far a model's output is from the target. Lower loss = better performance.
loss = loss_fn(predicted_output, true_output)
PyTorch provides many built-in loss classes in `torch.nn`.
import torch.nn as nn loss_fn = nn.MSELoss()
Common for regression: MSE penalizes squared error, L1 penalizes absolute error.
mse = nn.MSELoss() l1 = nn.L1Loss()
Used for classification; combines `LogSoftmax` and `NLLLoss`.
loss_fn = nn.CrossEntropyLoss()
output = model(input)            # raw logits
loss = loss_fn(output, target)
For binary classification with raw logits (sigmoid + binary cross-entropy).
loss_fn = nn.BCEWithLogitsLoss() loss = loss_fn(model(x), target)
Kullback–Leibler divergence measures difference between probability distributions.
kl = nn.KLDivLoss(reduction="batchmean")
Useful for learning to rank or distance-based training (e.g., embeddings).
triplet = nn.TripletMarginLoss(margin=1.0)
# anchor, positive, negative = ...
loss = triplet(anchor, positive, negative)
Encourages similar samples to be closer and dissimilar farther apart.
# Often custom or from libraries like `pytorch-metric-learning`
Balances sensitivity and robustness; combines MSE and MAE behavior.
loss_fn = nn.SmoothL1Loss() # Huber Loss
You can create custom losses by subclassing `nn.Module` or writing simple functions.
class CustomLoss(nn.Module):
    def forward(self, input, target):
        return torch.mean((input - target) ** 2)
Give more importance to certain classes or samples using `weight` argument.
loss_fn = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 2.0]))
You can combine losses with weights for multi-task learning.
loss = 0.7 * loss1 + 0.3 * loss2
Use MSE/L1 for regression, CrossEntropy/BCE for classification tasks.
# Regression
loss = nn.MSELoss()

# Classification
loss = nn.CrossEntropyLoss()
Plot loss values during training to monitor convergence or detect overfitting.
import matplotlib.pyplot as plt plt.plot(train_losses) plt.plot(val_losses) plt.legend(['Train', 'Validation']) plt.show()
Select a loss based on task type (regression, classification, ranking, etc.) and data distribution.
# Binary Classification → BCEWithLogitsLoss # Multiclass → CrossEntropyLoss # Regression → MSELoss / SmoothL1Loss
This is the base class for all datasets in PyTorch. You subclass it and override `__len__()` and `__getitem__()`.
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]
Custom datasets allow you to load your own data formats, apply logic, and return samples and labels.
class CustomImageDataset(Dataset):
    def __init__(self, image_paths, labels):
        self.image_paths = image_paths
        self.labels = labels

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image = read_image(self.image_paths[idx])  # custom function
        return image, self.labels[idx]
Prebuilt datasets like MNIST, CIFAR-10 are easily accessible using torchvision.
from torchvision.datasets import MNIST dataset = MNIST(root='data', train=True, download=True)
The DataLoader wraps a Dataset and provides mini-batches, shuffling, and multiprocessing.
from torch.utils.data import DataLoader loader = DataLoader(dataset, batch_size=32, shuffle=True)
Use `batch_size` to control memory, and `shuffle=True` to ensure training randomness.
DataLoader(dataset, batch_size=64, shuffle=True)
Custom collate functions allow complex batching logic, such as padding variable-length sequences.
def my_collate(batch):
    return torch.stack([item[0] for item in batch]), [item[1] for item in batch]

DataLoader(dataset, batch_size=32, collate_fn=my_collate)
Use torchvision.transforms to preprocess images — resize, convert to tensor, etc.
from torchvision import transforms transform = transforms.Compose([ transforms.Resize((128, 128)), transforms.ToTensor() ])
Normalizing inputs helps stabilize training. Typically done to mean=0, std=1 per channel.
transforms.Normalize(mean=[0.5], std=[0.5]) # for grayscale
Augmentation improves generalization by randomly transforming training samples.
transforms.Compose([ transforms.RandomHorizontalFlip(), transforms.RandomRotation(15), transforms.ToTensor() ])
torchtext helps with NLP datasets and tokenization.
from torchtext.datasets import IMDB train_iter = IMDB(split='train') # yields (label, text) pairs
torchaudio makes it easy to load and preprocess audio data.
import torchaudio waveform, sample_rate = torchaudio.load("file.wav")
Use lazy loading, memory mapping, or streaming for massive datasets to reduce memory load.
# Only load one sample at a time in __getitem__ to save memory
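A minimal sketch of a lazily loading Dataset; `load_sample` is a hypothetical per-file loader, not a PyTorch API:

from torch.utils.data import Dataset

class LazyFileDataset(Dataset):
    def __init__(self, file_paths):
        self.file_paths = file_paths   # only paths are kept in memory

    def __len__(self):
        return len(self.file_paths)

    def __getitem__(self, idx):
        # Each sample is read from disk only when requested
        return load_sample(self.file_paths[idx])  # hypothetical loader function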
Set `num_workers` > 0 to load data in parallel, improving performance.
DataLoader(dataset, batch_size=32, num_workers=4)
Split your dataset while keeping class distribution intact (e.g., using `train_test_split`).
from sklearn.model_selection import train_test_split X_train, X_val = train_test_split(data, stratify=labels)
Normalize inputs, augment only training data, shuffle during training, keep loaders fast and deterministic.
# Shuffle: True for train, False for val/test # Normalize and augment in transform pipeline
The training loop includes forward pass, loss calculation, backward pass, and optimizer step.
for epoch in range(num_epochs):
    for batch in dataloader:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()
Pass input through the model to get predictions.
outputs = model(images)
Compare predictions with ground truth to calculate error.
loss = loss_fn(outputs, labels)
Compute gradients with backward() and update weights with optimizer.
loss.backward() optimizer.step()
Epoch = 1 pass over all training data. Iteration = 1 batch update.
for epoch in range(10): # 10 epochs ...
Check performance on unseen data during training to avoid overfitting.
model.eval()
with torch.no_grad():
    val_output = model(val_images)
Monitor train/validation loss to detect overfitting or underfitting. Use regularization, dropout, or more data.
# Early stopping, dropout layers, or data augmentation help mitigate
Save models periodically during training.
torch.save(model.state_dict(), 'model_epoch3.pth')
Use accuracy, precision, recall, etc. to track performance.
correct += (predictions == labels).sum().item() accuracy = correct / total
Wrap loops with tqdm for visual training progress.
from tqdm import tqdm for batch in tqdm(dataloader): ...
Move model and tensors to GPU for faster computation.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu") model.to(device) inputs, labels = inputs.to(device), labels.to(device)
Visualize losses, metrics, graphs, etc. during training.
from torch.utils.tensorboard import SummaryWriter writer = SummaryWriter() writer.add_scalar("Loss/train", loss, step)
Stop training if validation loss stops improving.
if val_loss > best_loss:
    patience_counter += 1
    if patience_counter > patience:
        break
Trade-off between speed (large batches) and generalization (small batches).
# Try values: 32, 64, 128
DataLoader(..., batch_size=64)
Fix seeds and set deterministic options to reproduce results.
import torch
import random
import numpy as np

torch.manual_seed(0)
np.random.seed(0)
random.seed(0)
torch.backends.cudnn.deterministic = True
Convolutional Neural Networks (CNNs) are deep learning models designed to process grid-like data such as images.
# CNNs excel in visual tasks like image classification and object detection. # They use filters to extract spatial features.
Conv2d layers apply 2D convolutions over input images to extract features like edges and textures.
nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1) # Applies 16 filters on 3-channel input image
Pooling layers reduce spatial dimensions and help retain the most relevant information.
nn.MaxPool2d(kernel_size=2, stride=2) # Reduces size by selecting max value in 2x2 blocks
Normalizes activations in CNNs to stabilize and speed up training.
nn.BatchNorm2d(num_features=16) # Normalizes the 16 output channels from Conv2d
Dropout randomly disables neurons to prevent overfitting during training.
nn.Dropout(p=0.5) # 50% chance of zeroing each feature
Use PyTorch’s nn.Module to define a custom CNN.
import torch.nn.functional as F

class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, 3, 1, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc = nn.Linear(16 * 16 * 16, 10)  # assumes 32x32 input

    def forward(self, x):
        x = self.pool(F.relu(self.conv(x)))
        x = x.view(-1, 16 * 16 * 16)
        return self.fc(x)
Leverage powerful models like ResNet or VGG with pretrained weights from torchvision.
from torchvision.models import resnet18 model = resnet18(pretrained=True)
Fine-tune pretrained models on new tasks.
# Freeze pretrained layers
for param in model.parameters():
    param.requires_grad = False

# Replace classifier
model.fc = nn.Linear(512, num_classes)
Visualize intermediate outputs of CNNs to understand what the model sees.
# Forward hook example:
def hook_fn(module, input, output):
    print(output.shape)

model.layer1[0].register_forward_hook(hook_fn)
Use CNNs to assign labels to images.
# Output layer for classification
nn.Linear(512, num_classes)
Detect objects and their locations in images using bounding boxes.
# Use YOLO, SSD or Faster-RCNN for detection # Output: (x, y, width, height, class)
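A short inference sketch with torchvision's pretrained Faster R-CNN (the dummy image tensor is an assumption; real inputs are 3xHxW float tensors):

from torchvision.models.detection import fasterrcnn_resnet50_fpn
import torch

model = fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

images = [torch.rand(3, 224, 224)]    # dummy image batch (assumption)
with torch.no_grad():
    predictions = model(images)

# Each prediction dict contains 'boxes', 'labels', and 'scores'
print(predictions[0]["boxes"].shape)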
Assign a class to each pixel using CNN-based architectures like U-Net.
# U-Net uses encoder-decoder structure # Final layer: 1x1 conv for per-pixel classification
Design unique networks suited to specific datasets and tasks.
# Mix of conv, pooling, dropout, batch norm # Tailored depth and width
Lightweight CNNs for mobile and embedded systems.
# Use depthwise separable convolutions # MobileNet: less computation, fewer parameters
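A minimal sketch of the MobileNet-style depthwise separable convolution block mentioned above (channel sizes are illustrative):

import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Depthwise: one filter per input channel (groups=in_ch)
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch)
        # Pointwise: 1x1 conv mixes channels
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.pointwise(self.relu(self.depthwise(x))))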
Prevent overfitting using techniques like dropout, weight decay, and data augmentation.
# Weight decay in optimizer:
optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-5)
RNNs are neural networks designed for sequential data like text, time series, and audio.
# Maintains hidden state to process sequences step-by-step # Good for tasks where order matters
PyTorch’s nn.RNN processes sequences with a basic recurrent layer.
rnn = nn.RNN(input_size=10, hidden_size=20, num_layers=1) output, hidden = rnn(input_seq)
Hidden states carry contextual information from previous steps.
# hidden.shape = (num_layers, batch_size, hidden_size)
LSTM (Long Short-Term Memory) improves RNNs by remembering long-term dependencies.
lstm = nn.LSTM(input_size=10, hidden_size=20) output, (hn, cn) = lstm(input_seq)
GRUs are like LSTMs but with fewer gates, making them faster and simpler.
gru = nn.GRU(input_size=10, hidden_size=20) output, hn = gru(input_seq)
Handle variable-length sequences with `pad_sequence` and `pack_padded_sequence`.
from torch.nn.utils.rnn import pack_padded_sequence

packed = pack_padded_sequence(padded_seq, lengths, batch_first=True)
output, hidden = rnn(packed)
Convert word indices into dense vectors using nn.Embedding.
embedding = nn.Embedding(vocab_size, embedding_dim) embedded_seq = embedding(word_indices)
RNNs are used in translation, sentiment analysis, and more.
# Use embeddings → RNN → Fully Connected → Softmax
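A minimal sketch of that pipeline (vocab size, embedding size, and hidden size are illustrative assumptions):

import torch
import torch.nn as nn

class SentimentRNN(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=100, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)    # (batch, seq, embed_dim)
        _, (hidden, _) = self.rnn(embedded)     # final hidden state
        return self.fc(hidden[-1])              # class logits; apply softmax/CE outside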
Train RNNs to predict the next character or word given previous input.
# Feed previous output as next input during generation loop
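A hedged sketch of a greedy generation loop, assuming a trained `model` that takes a token-id tensor and hidden state and returns next-token logits plus the new hidden state:

import torch

def generate(model, start_id, steps=20):
    token = torch.tensor([[start_id]])
    hidden = None
    output_ids = [start_id]
    for _ in range(steps):
        logits, hidden = model(token, hidden)   # assumed model signature
        token = logits.argmax(dim=-1)[:, -1:]   # greedy pick of the next token
        output_ids.append(token.item())
    return output_ids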
Attention allows the model to focus on relevant parts of the input sequence.
# Used in transformers and advanced seq2seq models
Encoder-decoder RNNs transform input sequences into output sequences (e.g., translation).
# Encoder encodes input → decoder generates output
Chain an encoder RNN with a decoder RNN for tasks like translation or summarization.
encoder_outputs, hidden = encoder(input_seq) output = decoder(target_seq, hidden)
torchtext simplifies text preprocessing, tokenization, and batching for NLP models.
# Field/BucketIterator come from the torchtext.legacy API, which was removed in recent torchtext versions
# from torchtext.legacy.data import Field, BucketIterator
# TEXT = Field(tokenize='spacy', include_lengths=True)

# Modern torchtext uses functional utilities instead:
from torchtext.data.utils import get_tokenizer
tokenizer = get_tokenizer("basic_english")
Process input in both forward and backward directions to capture more context.
rnn = nn.RNN(input_size, hidden_size, bidirectional=True)
RNNs are used in forecasting, classification, tagging, and generation.
# Example: Predict stock prices using past prices as sequence input
Transformers are deep learning architectures designed for handling sequential data using attention mechanisms, rather than RNNs.
# Basic PyTorch import
import torch.nn as nn

# Transformer model skeleton
model = nn.Transformer()
Allows the model to weigh the importance of different words in a sequence relative to each other.
# Self-attention concept: each word attends to every other word # Built into nn.Transformer
Splits attention into multiple parts to capture diverse representations.
multi_head = nn.MultiheadAttention(embed_dim=512, num_heads=8)
Since transformers have no recurrence, positional encodings provide sequence order information.
# Use sinusoids or learnable embeddings
# PyTorch example:
pos_embedding = nn.Parameter(torch.zeros(1, seq_len, embed_dim))
High-level modules for building custom transformer architectures.
encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8) encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)
Applies transformer models to image patches rather than text tokens.
# Vision Transformer requires patching and linear embedding # Use `timm` or HuggingFace models
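As a hedged alternative, recent torchvision versions also ship a Vision Transformer directly:

from torchvision.models import vit_b_16
import torch

model = vit_b_16(weights="IMAGENET1K_V1")   # pretrained ViT-B/16 (recent torchvision)
model.eval()

image = torch.rand(1, 3, 224, 224)          # dummy input (assumption)
with torch.no_grad():
    logits = model(image)                   # (1, 1000) ImageNet logits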
Pretrained transformer models like BERT, GPT are available with ease of use.
from transformers import BertModel model = BertModel.from_pretrained('bert-base-uncased')
BERT is for encoding tasks, GPT for generation. Both supported by HuggingFace and PyTorch natively.
from transformers import GPT2Model gpt_model = GPT2Model.from_pretrained("gpt2")
Load pretrained models and train them on your specific task dataset.
# Add classification head and backpropagate loss
outputs = model(input_ids)
loss = loss_fn(outputs, labels)
loss.backward()
Tokenizers convert text into tokens/ids, often with padding and truncation.
from transformers import BertTokenizer tokenizer = BertTokenizer.from_pretrained('bert-base-uncased') inputs = tokenizer("Hello world!", return_tensors="pt")
Common NLP task using a transformer head for classification.
from transformers import BertForSequenceClassification model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
Use models trained on SQuAD-like datasets to answer questions from context.
from transformers import pipeline qa = pipeline("question-answering") qa(question="What is JSON?", context="JSON is a data format...")
Translate text from one language to another using pretrained models like Marian or T5.
from transformers import MarianMTModel, MarianTokenizer

# Load English-to-French model
tokenizer = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-fr")
model = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-en-fr")
Techniques include distillation, quantization, and efficient attention mechanisms like Longformer or FlashAttention.
# Use DistilBERT for lightweight applications
from transformers import DistilBertModel
model = DistilBertModel.from_pretrained('distilbert-base-uncased')
Transformers offer parallelism, better long-range dependencies, but require more memory.
# RNN: sequential, slower # Transformer: parallel, faster with large data
TorchVision is a PyTorch library for vision datasets, models, and image transformations.
import torchvision # Datasets, models, transforms available in torchvision
Includes CIFAR-10, ImageNet, MNIST, FashionMNIST, etc.
from torchvision.datasets import CIFAR10 dataset = CIFAR10(root='./data', download=True)
Transform raw images into tensors and apply normalization, resizing, cropping, etc.
from torchvision import transforms transform = transforms.Compose([ transforms.Resize((224, 224)), transforms.ToTensor() ])
Load commonly used datasets with transforms and data loaders.
train_data = CIFAR10(root='./data', train=True, transform=transform)
Transforms are chainable functions for image augmentation or preparation.
transform = transforms.Compose([ transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(), transforms.Normalize(mean=[0.5], std=[0.5]) ])
Use random crop, horizontal flip, rotation, and color jitter to enhance training data.
transform = transforms.Compose([ transforms.RandomHorizontalFlip(), transforms.RandomCrop(32), transforms.ToTensor() ])
TorchVision provides pretrained ResNet, VGG, AlexNet, etc.
from torchvision.models import resnet18 model = resnet18(pretrained=True)
Replace the final layer of pretrained models and train on custom data.
model.fc = nn.Linear(model.fc.in_features, 10) # for 10-class output
Use models like DeepLabV3 or FCN for pixel-wise segmentation.
from torchvision.models.segmentation import fcn_resnet50 model = fcn_resnet50(pretrained=True)
Detect and localize objects in images with bounding boxes.
from torchvision.models.detection import fasterrcnn_resnet50_fpn model = fasterrcnn_resnet50_fpn(pretrained=True)
Access internal layers and visualize outputs to understand model behavior.
# Hook layer outputs for inspection
activation = {}

def hook_fn(module, input, output):
    activation['conv1'] = output

model.conv1.register_forward_hook(hook_fn)
Load your own image datasets with ImageFolder or custom Dataset class.
from torchvision.datasets import ImageFolder dataset = ImageFolder(root='my_data', transform=transform)
Use torchvision.utils.save_image to write tensors to disk as image files.
from torchvision.utils import save_image save_image(tensor_image, 'output.png')
Use pretrained models to classify images from the 1000 ImageNet categories.
model = resnet18(pretrained=True) # Output is a 1000-dimensional tensor
torchvision.utils provides utilities like make_grid and save_image for batch visualization.
from torchvision.utils import make_grid grid = make_grid(batch_images)
TorchText is a PyTorch library used for NLP tasks including preprocessing, datasets, and vocab building.
# Import torchtext tokenizer utility
from torchtext.data.utils import get_tokenizer

# Get tokenizer
tokenizer = get_tokenizer("basic_english")

# Tokenize sentence
print(tokenizer("NLP with TorchText is fun!"))
# ['nlp', 'with', 'torchtext', 'is', 'fun', '!']
Preprocessing includes lowercasing, removing punctuation, and tokenization.
import re

def preprocess(text):
    text = text.lower()
    text = re.sub(r'[^\w\s]', '', text)
    return text

print(preprocess("Hello, World!"))  # hello world
Tokenization breaks text into words or subwords. TorchText offers multiple tokenizers.
tokenizer = get_tokenizer("basic_english") tokens = tokenizer("Deep learning with NLP") print(tokens) # ['deep', 'learning', 'with', 'nlp']
Build vocabulary and use pretrained embeddings like GloVe.
from torchtext.vocab import GloVe glove = GloVe(name='6B', dim=100) print(glove['language']) # Tensor vector
TorchText provides utilities for loading and creating datasets.
from torchtext.datasets import AG_NEWS

train_iter = AG_NEWS(split='train')
for label, line in train_iter:
    print(label, line)
    break
Use TorchText datasets for building sentiment and category classifiers.
# Classify AG_NEWS dataset using embeddings + linear model # Model includes: Embedding -> GlobalAvgPool -> Linear
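A minimal sketch of the Embedding → average pooling → Linear classifier described above (sizes are illustrative; AG_NEWS has 4 classes):

import torch.nn as nn

class TextClassifier(nn.Module):
    def __init__(self, vocab_size=20000, embed_dim=64, num_classes=4):
        super().__init__()
        # EmbeddingBag averages token embeddings (global average pooling)
        self.embedding = nn.EmbeddingBag(vocab_size, embed_dim, mode="mean")
        self.fc = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids, offsets):
        return self.fc(self.embedding(token_ids, offsets))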
Map words to dense vectors using torch.nn.Embedding or pretrained embeddings.
import torch.nn as nn embedding = nn.Embedding(num_embeddings=1000, embedding_dim=300) input = torch.LongTensor([1, 2, 3]) output = embedding(input)
Pad sequences to equal length for batching using `pad_sequence`.
from torch.nn.utils.rnn import pad_sequence seqs = [torch.tensor([1,2]), torch.tensor([3,4,5])] padded = pad_sequence(seqs, batch_first=True) print(padded)
Recurrent networks process sequences for tasks like sentiment analysis.
rnn = nn.LSTM(input_size=100, hidden_size=128, batch_first=True) input = torch.rand(4, 10, 100) output, _ = rnn(input)
Attention mechanisms allow the model to focus on important words.
# Attention weights used for context vector # Common in translation and summarization
Translate from one language to another using seq2seq or Transformer models.
# Encoder-Decoder with attention or transformer # Input: "je t’aime" → Output: "i love you"
Identify names, dates, places in text using sequence labeling.
# BiLSTM + CRF or Transformer for token classification # Output: "John lives in Canada" → [B-PER, O, O, B-LOC]
BERT provides contextual embeddings for classification, NER, QA.
from transformers import BertTokenizer, BertForTokenClassification tokenizer = BertTokenizer.from_pretrained("bert-base-cased") model = BertForTokenClassification.from_pretrained("bert-base-cased")
Save trained models with `torch.save()` and reload with `torch.load()`.
torch.save(model.state_dict(), 'nlp_model.pt')

# Later...
model.load_state_dict(torch.load('nlp_model.pt'))
Use accuracy, F1-score, BLEU (for translation), and ROUGE (for summarization).
from sklearn.metrics import f1_score f1_score(y_true, y_pred, average='macro')
Torchaudio is a PyTorch library for loading, transforming, and modeling audio data.
import torchaudio waveform, sample_rate = torchaudio.load('audio.wav') print(waveform.shape) # (channels, samples)
Use `torchaudio.load()` to load waveform and sample rate.
waveform, sr = torchaudio.load("speech.wav") print(sr) # 16000
Torchaudio provides transforms like Resample, Normalize, TimeStretch.
transform = torchaudio.transforms.Resample(orig_freq=48000, new_freq=16000) resampled = transform(waveform)
Convert waveforms to visual features using Spectrogram or MFCC.
spec = torchaudio.transforms.Spectrogram()(waveform) mfcc = torchaudio.transforms.MFCC()(waveform)
Add noise, time shift, or pitch changes to augment audio training data.
# Time masking
mask = torchaudio.transforms.TimeMasking(time_mask_param=30)
augmented = mask(spec)
Use custom datasets or built-ins like YESNO or SPEECHCOMMANDS.
from torchaudio.datasets import YESNO dataset = YESNO(root="./", download=True) print(len(dataset))
Classify sound type using CNN or RNN over MFCC/Spectrogram features.
# Input: MFCC features → CNN → Softmax → Class label (e.g., dog bark)
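A minimal sketch of a CNN classifier over MFCC/spectrogram features (input shape and class count are illustrative assumptions):

import torch
import torch.nn as nn

class AudioCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, num_classes)

    def forward(self, x):                  # x: (batch, 1, n_mfcc, time)
        x = self.features(x).flatten(1)
        return self.fc(x)

model = AudioCNN()
dummy = torch.rand(4, 1, 40, 100)          # e.g. 40 MFCC coefficients, 100 frames
print(model(dummy).shape)                  # (4, 10)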
Convert audio waveforms into text using models trained on audio transcripts.
# Example: wav2vec2 or DeepSpeech models for STT (Speech-to-Text)
Use models like HuBERT, wav2vec2 from torchaudio.pipelines.
import torchaudio

# Pretrained models are provided as bundles in torchaudio.pipelines
bundle = torchaudio.pipelines.WAV2VEC2_BASE
model = bundle.get_model()
Use transfer learning by fine-tuning models on domain-specific audio datasets.
# Freeze base layers, retrain final classifier on new dataset
for param in model.feature_extractor.parameters():
    param.requires_grad = False
Split audio into meaningful chunks like words or phonemes.
# Use timestamps or VAD (Voice Activity Detection) to split audio
Create your own transformation by subclassing `nn.Module`.
class CustomVolume(nn.Module):
    def forward(self, waveform):
        return waveform * 0.5  # reduce volume by 50%
RNNs can model audio as sequences, suitable for classification or recognition.
rnn = nn.LSTM(input_size=40, hidden_size=128, num_layers=2, batch_first=True) input = torch.randn(8, 100, 40) output, _ = rnn(input)
Use accuracy, Word Error Rate (WER), and Signal-to-Noise Ratio (SNR) for evaluation.
from jiwer import wer

truth = "hello world"
hypothesis = "helo wurld"
print(wer(truth, hypothesis))  # 1.0 (both words substituted)
Save model weights with PyTorch and export for inference.
torch.save(model.state_dict(), "audio_model.pt") # Load model.load_state_dict(torch.load("audio_model.pt"))
To move tensors or models to GPU, use `.cuda()` or `.to("cuda")`.
import torch

x = torch.randn(2, 2)
x = x.cuda()        # OR
x = x.to("cuda")    # Moves tensor to GPU
Use `torch.cuda.is_available()` to check if a GPU is accessible.
if torch.cuda.is_available():
    print("GPU available")
else:
    print("Using CPU")
Move both input and model to GPU for computation.
model = model.to("cuda") inputs = inputs.to("cuda") outputs = model(inputs)
Use `num_workers` and `pin_memory=True` in DataLoader to speed up GPU training.
train_loader = torch.utils.data.DataLoader( dataset, batch_size=64, shuffle=True, num_workers=4, pin_memory=True )
Wrap model with `DataParallel` for simple multi-GPU usage.
model = torch.nn.DataParallel(model) model = model.cuda()
This parallelizes model across available GPUs but is slower than DDP.
# Automatically splits and gathers inputs across GPUs outputs = model(inputs)
DDP is faster and more efficient for multi-GPU training across multiple processes.
# Requires torch.distributed and multiple scripts/processes # Use torchrun or mp.spawn to launch DDP training
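A hedged single-file DDP sketch, assuming it is launched with torchrun (e.g. torchrun --nproc_per_node=N script.py), which sets the environment variables read below:

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")                  # reads env vars set by torchrun
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(10, 1).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])      # gradients synced across processes

    # ... training loop as usual ...
    dist.destroy_process_group()

if __name__ == "__main__":
    main()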
Use `torch.cuda.amp` for faster training and reduced memory usage.
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()

for inputs, targets in dataloader:
    optimizer.zero_grad()
    with autocast():
        outputs = model(inputs)
        loss = criterion(outputs, targets)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
NVIDIA Apex provides older mixed precision training tools (less used now).
# apex.amp is deprecated in favor of torch.cuda.amp # Install: pip install -v --no-cache-dir ./apex
Use `torch.cuda.empty_cache()` and monitor usage with `torch.cuda.memory_allocated()`.
torch.cuda.empty_cache()  # Frees unused cached memory
print(torch.cuda.memory_allocated() / 1e6, "MB")
Use `nvidia-smi` in the terminal or libraries like `gpustat`.
# Terminal:
nvidia-smi

# Python / notebook:
!pip install gpustat
!gpustat
CUDA errors are often delayed — use `CUDA_LAUNCH_BLOCKING=1` to debug.
# Run your script with this env var
CUDA_LAUNCH_BLOCKING=1 python train.py
Use gradient checkpointing, mixed precision, and model parallelism for large models.
# Gradient checkpointing: reduce memory usage
from torch.utils.checkpoint import checkpoint
output = checkpoint(model_block, input)
Use `torch.profiler` to profile GPU usage, memory, and latency.
import torch.profiler as profiler

with profiler.profile(...):  # Add your config
    outputs = model(inputs)
Use `torch.save(model.state_dict())` — safe for CPU and GPU.
torch.save(model.state_dict(), "model.pth") # To load: model.load_state_dict(torch.load("model.pth")) model.to("cuda")
Use `torch.save()` to store model weights or full models.
torch.save(model.state_dict(), 'model.pth')
Load saved weights and reinitialize the model before inference.
model.load_state_dict(torch.load('model.pth')) model.eval()
Script mode captures full model logic; trace mode records a single forward pass.
scripted = torch.jit.script(model) traced = torch.jit.trace(model, sample_input)
TorchScript allows saving models in a serialized form for deployment.
torch.jit.save(scripted, 'scripted_model.pt')
Export models to ONNX format for cross-platform inference.
torch.onnx.export(model, input_tensor, "model.onnx")
Use `onnxruntime` for fast inference in production environments.
import onnxruntime as ort sess = ort.InferenceSession("model.onnx") outputs = sess.run(None, {"input": input_array})
Convert models with TorchScript and deploy on iOS or Android apps.
# Save for mobile
scripted_model = torch.jit.script(model)
scripted_model.save("mobile_model.pt")
Deploy models using Flask or FastAPI with JSON API endpoints.
@app.post("/predict/") def predict(data: Input): tensor = torch.tensor(data.features) result = model(tensor) return result.tolist()
Use TorchServe to serve models at scale with REST endpoints and monitoring.
# Define handler + model archive (.mar)
torch-model-archiver --model-name=my_model ...
torchserve --start --model-store model_store ...
Use EC2/GCP VM + Docker, or AWS SageMaker for scalable deployment.
# On AWS SageMaker: # 1. Package model + inference.py # 2. Create PyTorchModel() # 3. Deploy with model.deploy()
Convert ONNX to CoreML for native iOS inference using coremltools.
import coremltools as ct

# Note: the ONNX converter was removed in coremltools 6+;
# newer coremltools versions convert TorchScript models via ct.convert()
coreml_model = ct.converters.onnx.convert("model.onnx")
coreml_model.save("Model.mlmodel")
TensorRT speeds up inference by optimizing ONNX models for NVIDIA GPUs.
# Use ONNX + TensorRT or torch2trt # Example: torch2trt(model, [input])
Use ONNX.js or TensorFlow.js to deploy converted models in-browser.
# Convert to ONNX # Load with ONNX.js in JavaScript frontend
Use tools like Prometheus, Sentry, or TorchServe logs to track latency, throughput.
# TorchServe provides built-in logs and Prometheus metrics # You can also log API latency per request manually
Use Git, DVC, or TorchServe snapshots to manage different model versions.
# DVC:
dvc add model.pth
git commit model.pth.dvc

# TorchServe:
# model_name.mar → versioned in model-store/
PyTorch Lightning is a lightweight PyTorch wrapper that simplifies training, scaling, and readability without sacrificing flexibility.
# Lightning separates engineering from research # Improves reproducibility and model structure
Lightning abstracts training loops, logging, checkpointing, and device placement.
# You define only:
# - forward
# - training_step
# - validation_step
# - configure_optimizers
# Install from pip
pip install lightning
Encapsulates all the logic for training/validation/testing in one module.
import lightning as L
import torch
import torch.nn.functional as F

class MyModel(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(28 * 28, 10)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        return F.cross_entropy(y_hat, y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())
The `Trainer` class runs your entire training loop, supporting GPUs, TPUs, loggers, and callbacks.
trainer = L.Trainer(max_epochs=5, accelerator="gpu", devices=1) trainer.fit(model, train_loader, val_loader)
# Automatically log metrics using log()
self.log("train_loss", loss)

# Use callbacks for custom hooks like LearningRateMonitor or custom schedulers
Save the best model automatically during training.
from lightning.pytorch.callbacks import ModelCheckpoint checkpoint_cb = ModelCheckpoint(monitor="val_loss") trainer = L.Trainer(callbacks=[checkpoint_cb])
Trainer scales across hardware with no code change.
# Run on 8 GPUs
trainer = L.Trainer(devices=8, accelerator="gpu", strategy="ddp")
Encapsulate data loading, splitting, and transformation logic.
class MNISTDataModule(L.LightningDataModule):
    def prepare_data(self): ...
    def setup(self, stage=None): ...
    def train_dataloader(self): ...
    def val_dataloader(self): ...
Stop training when validation loss stops improving.
from lightning.pytorch.callbacks import EarlyStopping early_stop = EarlyStopping(monitor="val_loss", patience=3) trainer = L.Trainer(callbacks=[early_stop])
from lightning.pytorch.loggers import TensorBoardLogger logger = TensorBoardLogger("tb_logs", name="my_model") trainer = L.Trainer(logger=logger)
Integrate with Optuna or RayTune for hyperparameter sweeps.
# Use LightningCLI or Hydra for structured tuning
# auto_lr_find was removed in Lightning 2.x; use the Tuner API instead:
from lightning.pytorch.tuner import Tuner
tuner = Tuner(trainer)
tuner.lr_find(model)
Scale to multiple nodes or GPUs using DDP or FSDP.
trainer = L.Trainer(accelerator="gpu", strategy="ddp", devices=4)
Use `torch.jit`, export scripts, ONNX, and checkpoints for deployment.
scripted_model = torch.jit.script(model) scripted_model.save("model.pt")
Lightning removes boilerplate but uses the same PyTorch code under the hood.
# Lightning == PyTorch + Engineering Abstraction # You can still use native PyTorch anywhere
Reinforcement Learning is a learning paradigm where agents learn by interacting with environments to maximize rewards.
MDP defines state, action, transition probability, and reward structure for an RL problem.
# (S, A, P, R, γ)
# S = states, A = actions
# P = state transitions
# R = reward function
# γ = discount factor
Q-learning learns action values (Q-values) from experience to derive an optimal policy without a model of the environment.
# Tabular Q-learning update (alpha = learning rate)
Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])
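A minimal tabular Q-learning sketch on a discrete environment, assuming the classic Gym API (pre-0.26) and illustrative hyperparameters:

import gym
import numpy as np

env = gym.make("FrozenLake-v1")
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, epsilon = 0.1, 0.99, 0.1

state = env.reset()
for _ in range(1000):
    # epsilon-greedy action selection
    action = env.action_space.sample() if np.random.rand() < epsilon else Q[state].argmax()
    next_state, reward, done, info = env.step(action)
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = env.reset() if done else next_state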
Use neural networks to approximate Q-values for high-dimensional state spaces.
q_network = nn.Sequential( nn.Linear(state_dim, 128), nn.ReLU(), nn.Linear(128, action_dim) )
Directly optimize the policy using gradients of expected reward.
# Loss = -log(prob) * reward
loss = -torch.log(prob[action]) * reward
A2C uses a separate value network to stabilize training.
Advantage = reward - value(state)
PPO clips policy update ratio to prevent large updates.
ratio = new_prob / old_prob
clipped = torch.clamp(ratio, 1 - eps, 1 + eps)
loss = -torch.min(ratio * adv, clipped * adv).mean()
import gym env = gym.make("CartPole-v1") state = env.reset()
Define new Gym environments by subclassing `gym.Env`.
class MyEnv(gym.Env):
    def __init__(self): ...
    def step(self, action): ...
    def reset(self): ...
    def render(self): ...
Modify the reward function to encourage better learning behavior.
# Example: penalty for time or energy use
reward = base_reward - penalty
Use ε-greedy, Boltzmann exploration, or noise to explore action space.
if random() < epsilon:
    action = random_action()
else:
    action = best_action()
# Save model
torch.save(model.state_dict(), "agent.pth")

# Load model
model.load_state_dict(torch.load("agent.pth"))
Use target networks, experience replay, normalization, or learning rate schedules.
# DQN target network:
target_net.load_state_dict(q_net.state_dict())
Plot total reward per episode or use gym's render method.
import matplotlib.pyplot as plt plt.plot(rewards) plt.xlabel("Episode") plt.ylabel("Reward")
Train multiple agents in shared or competing environments with self-play, MADDPG, or PPO.
# Use PettingZoo or multi-agent Gym extensions # Each agent may have separate networks or shared policies
Generative Adversarial Networks consist of a generator and a discriminator competing to improve generation quality.
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(100, 256),
            nn.ReLU(),
            nn.Linear(256, 784),
            nn.Tanh()
        )

    def forward(self, x):
        return self.net(x)

# Generator creates fake images from noise
noise = torch.randn(16, 100)   # batch of 16 noise vectors
gen = Generator()
fake_images = gen(noise)       # generated images shaped (16, 784)
Attention lets models focus on important parts of input; key in Transformers.
import torch.nn.functional as F

def scaled_dot_product_attention(query, key, value):
    scores = torch.matmul(query, key.transpose(-2, -1)) / (query.size(-1) ** 0.5)
    weights = F.softmax(scores, dim=-1)
    output = torch.matmul(weights, value)
    return output, weights
Capsules encode spatial hierarchies; better at preserving pose information.
# Pseudo-code for capsule layer routing
def routing(u_hat, num_iterations=3):
    b = torch.zeros(...)
    for i in range(num_iterations):
        c = softmax(b)
        s = (c * u_hat).sum(dim=1)
        v = squash(s)
        b += (u_hat * v).sum(dim=-1)
    return v
Train models to generalize from very few examples using meta-learning techniques.
# Example: Use pretrained embeddings with a nearest-neighbor classifier
def classify(query, support_set, support_labels):
    distances = torch.cdist(query, support_set)
    nearest = distances.argmin(dim=1)
    return support_labels[nearest]
Learn representations by pulling similar examples closer and pushing dissimilar apart.
# Simple contrastive loss example
def contrastive_loss(z_i, z_j, temperature=0.5):
    sim = torch.matmul(z_i, z_j.T) / temperature
    labels = torch.arange(len(z_i))
    return nn.CrossEntropyLoss()(sim, labels)
Learn a distance metric to cluster similar data points together in embedding space.
# Triplet loss example
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=1.0):
    pos_dist = F.pairwise_distance(anchor, positive)
    neg_dist = F.pairwise_distance(anchor, negative)
    loss = F.relu(pos_dist - neg_dist + margin)
    return loss.mean()
Zero-shot: Generalize to unseen classes without training; Self-supervised: Learn from unlabeled data.
# Zero-shot: Use pretrained CLIP for text-image similarity # Self-supervised: Predict missing parts or augmentations
A convolutional model family scaling width, depth, and resolution efficiently.
from torchvision.models import efficientnet_b0 model = efficientnet_b0(pretrained=True) # Fine-tune on custom dataset
Automate design of neural network architectures using search algorithms.
# Example: Use NAS libraries to explore model variants # Pseudo-code: Define search space, evaluate candidates, select best
Reduce model size and increase inference speed by using lower precision (e.g., 8-bit).
import torch.quantization

model = ...  # define model
model.qconfig = torch.quantization.get_default_qconfig('fbgemm')
torch.quantization.prepare(model, inplace=True)
# Calibrate on data
torch.quantization.convert(model, inplace=True)
Remove less important weights to reduce size and improve speed without much accuracy loss.
import torch.nn.utils.prune as prune prune.l1_unstructured(model.layer1, name='weight', amount=0.4) # 40% weights pruned in layer1
Train smaller "student" models to mimic large "teacher" models' outputs.
# Pseudo-code for distillation loss
loss = alpha * student_loss + (1 - alpha) * distillation_loss(student_logits, teacher_logits)
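A hedged concrete version using softened logits and KL divergence (temperature and alpha are illustrative choices):

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Hard-label loss on ground truth
    hard = F.cross_entropy(student_logits, labels)
    # Soft-label loss: match the teacher's softened distribution
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * hard + (1 - alpha) * soft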
Combine content of one image with style of another using deep features.
# Use pretrained VGG to compute content & style loss # Optimize input image to minimize combined loss
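A minimal sketch of the style-loss ingredient, the Gram matrix of a feature map (the features themselves are assumed to come from a pretrained VGG layer):

import torch

def gram_matrix(features):
    # features: (batch, channels, height, width) from a VGG layer
    b, c, h, w = features.shape
    flat = features.view(b, c, h * w)
    gram = torch.bmm(flat, flat.transpose(1, 2))   # channel-by-channel correlations
    return gram / (c * h * w)

# Style loss = MSE between Gram matrices of the generated and style images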
Generate text descriptions for images using CNN + RNN or Transformer models.
# CNN encodes image features # RNN generates word sequences conditioned on features
Combine data from multiple modalities, like images and text, into joint models.
# Example: Fuse image and text embeddings for classification
Build a CNN to classify images into categories using PyTorch.
import torch
import torch.nn as nn
import torch.optim as optim

class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, 3)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc = nn.Linear(16 * 31 * 31, 10)  # for 64x64 input: (64 - 3 + 1) // 2 = 31

    def forward(self, x):
        x = self.pool(torch.relu(self.conv(x)))
        x = x.view(-1, 16 * 31 * 31)
        x = self.fc(x)
        return x
Use RNN or Transformer to classify sentiment from text data.
# Tokenize text, embed, feed into LSTM or Transformer encoder # Final layer: sigmoid for binary sentiment
Train a model to recognize spoken commands using audio features like MFCCs.
# Extract MFCC features from audio # Feed to CNN or RNN classifier
Build a model to classify chest X-ray images as pneumonia or normal.
# Use pretrained CNN (e.g. ResNet) # Fine-tune last layers on X-ray dataset
Detect fake news articles using NLP techniques and text classification.
# Preprocess text, vectorize # Train classifier (e.g., Transformer or LSTM)
Build a sequence-to-sequence model to translate between languages.
# Use encoder-decoder architecture with attention
Implement neural style transfer in an interactive app.
# Use pretrained VGG for style/content extraction # Optimize input image iteratively
Detect and localize objects in images using models like YOLO or Faster R-CNN.
# Use pretrained detection model # Fine-tune on custom dataset if needed
Build conversational AI using transformer models for dialogue generation.
# Use pretrained GPT or custom seq2seq transformer # Fine-tune on conversational datasets
Predict future values in time series data with RNNs or Temporal Convolutional Networks.
# Prepare sequences of historical data as input # Train model to predict next step(s)
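A minimal LSTM forecaster sketch (window length and feature size are illustrative assumptions):

import torch
import torch.nn as nn

class Forecaster(nn.Module):
    def __init__(self, n_features=1, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, 1)

    def forward(self, x):                  # x: (batch, window, n_features)
        _, (h, _) = self.lstm(x)
        return self.fc(h[-1])              # predict the next value

model = Forecaster()
window = torch.rand(8, 30, 1)              # 8 sequences of 30 past observations
print(model(window).shape)                 # (8, 1)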
Classify music audio clips into genres using CNNs on spectrograms.
# Convert audio to spectrogram images # Train CNN classifier
Recognize emotions from facial expressions or speech in real time.
# Use CNN or RNN on video frames or audio # Output emotion class probabilities
Extract text from images using CNN+RNN or Transformer OCR models.
# Detect text regions with CNN # Recognize characters using sequence models
Build a voice assistant using speech recognition, NLP, and TTS components.
# Integrate speech-to-text, intent recognition, and text-to-speech
Deploy models with web dashboard to monitor predictions, usage, and performance.
# Use Flask/FastAPI backend serving PyTorch models # Frontend dashboard for metrics and inputs