OpenCV Tutorial

What is OpenCV?
OpenCV (Open Source Computer Vision Library) is a popular open-source toolkit for real-time computer vision and image processing. It supports tasks like object tracking, face detection, and image transformations. Written in C++, it has Python bindings and is widely used in AI, robotics, medical imaging, and surveillance due to its speed and flexibility.

# Import the OpenCV library
import cv2

# Print the OpenCV version to confirm installation
print(cv2.__version__)  # Example output: 4.8.0
      

History and Evolution of OpenCV
OpenCV began as an Intel Research project in 1999 and was made open source in 2000. Its purpose was to accelerate CPU-based computer vision. Over time, it gained support for Python, GPU acceleration, mobile deployment, and AI features. The current OpenCV 4.x release is more modular and includes a DNN module for deep learning integration.

# Print history information
print("OpenCV started in 1999 at Intel and became open-source in 2000.")
      

Installing OpenCV in Python
Installing OpenCV is done through pip. The basic package is `opencv-python`; for the extra contrib modules (e.g., text detection or advanced tracking), install `opencv-contrib-python` instead. Both work on all major platforms. Run the command in a terminal, command prompt, or an IDE's integrated terminal.

# Basic OpenCV installation
# pip install opencv-python

# For full functionality including contrib modules
# pip install opencv-contrib-python
      

Setting Up Development Environment
To begin with OpenCV, use an IDE such as PyCharm, VS Code, or Jupyter. It's best to create a virtual environment to keep dependencies clean, and make sure OpenCV is installed inside that environment. This setup gives you easy debugging, code completion, and a clean place to run OpenCV code.

# Optional: Create a virtual environment (Linux/macOS)
# python3 -m venv opencv_env
# source opencv_env/bin/activate

# Windows version
# python -m venv opencv_env
# .\opencv_env\Scripts\activate

# Then install OpenCV
# pip install opencv-python
      

OpenCV vs. Other Vision Libraries
OpenCV is often compared with PIL, scikit-image, and TensorFlow. PIL is fine for basic image tasks, while OpenCV handles advanced vision and real-time processing. scikit-image offers a simpler API but is slower; TensorFlow targets deep learning, while OpenCV excels at traditional vision algorithms and has growing DNN support.

# Compare OpenCV and PIL image loading
import cv2
from PIL import Image

# Using OpenCV to read image
img_cv2 = cv2.imread("image.jpg")

# Using PIL to open image
img_pil = Image.open("image.jpg")
      

First OpenCV Program – Read and Display Image
The first step in OpenCV is reading and displaying an image. Use `cv2.imread()` to load the image and `cv2.imshow()` to show it. `cv2.waitKey(0)` waits for a keypress, and `cv2.destroyAllWindows()` closes the display window. This confirms OpenCV is installed and working correctly.

# Import OpenCV
import cv2

# Load an image (make sure sample.jpg exists)
img = cv2.imread("sample.jpg")

# Show image in a window
cv2.imshow("My First OpenCV Window", img)

# Wait for any key press
cv2.waitKey(0)

# Close all OpenCV windows
cv2.destroyAllWindows()
      

Reading Images (cv2.imread)
The `cv2.imread()` function reads an image from a file into a NumPy array. It accepts a second argument to specify the mode: `0` for grayscale, `1` for color, and `-1` for unchanged (including alpha channel). This is the first step in image processing workflows.

import cv2

# Read color image
img = cv2.imread("photo.jpg", 1)  # 1 = color

# Display the image
cv2.imshow("Color Image", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Writing Images (cv2.imwrite)
Use `cv2.imwrite()` to save an image to disk. This is useful after modifying an image. It takes the filename and the image object as arguments. The format is inferred from the filename extension.

import cv2

# Read the image
img = cv2.imread("photo.jpg")

# Save it as a new file
cv2.imwrite("copy_photo.png", img)  # Saves as PNG
      

Image Properties (Shape, Size, Channels)
Image properties such as shape, size, and data type are available as NumPy array attributes. `.shape` gives (height, width, channels), `.size` returns the total number of elements (height × width × channels), and `.dtype` shows the pixel data type.

import cv2

img = cv2.imread("photo.jpg")

print("Shape:", img.shape)      # (height, width, channels)
print("Size:", img.size)        # Total number of pixels
print("Data type:", img.dtype)  # Type of each pixel
      

Image Data Types and Conversion
OpenCV images are NumPy arrays with data types like `uint8`. You can change the data type with `astype()` and the color space with `cv2.cvtColor()`, for example converting BGR to grayscale.

import cv2

# Convert color image to grayscale
img = cv2.imread("photo.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

cv2.imshow("Gray Image", gray)
cv2.waitKey(0)
cv2.destroyAllWindows()
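
As a quick sketch of a plain dtype conversion (the file name is just an example), an image can be cast to `float32` and scaled into the 0–1 range, a common preprocessing step:

import cv2

img = cv2.imread("photo.jpg")

# Convert uint8 pixels (0-255) to float32 in the 0-1 range
img_float = img.astype("float32") / 255.0

print(img.dtype, "->", img_float.dtype)  # uint8 -> float32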
      

ROI (Region of Interest)
ROI is a selected portion of an image. You can extract it using array slicing and use it for operations like cropping, copying, or filtering that region.

import cv2

img = cv2.imread("photo.jpg")

# Define ROI coordinates: y1:y2, x1:x2
roi = img[100:200, 150:250]  # Crop region

# Show the cropped part
cv2.imshow("ROI", roi)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Image Resizing and Scaling
Use `cv2.resize()` to change the image size. You can specify the target dimensions or a scale factor. It’s useful for normalization, previews, and preprocessing before analysis.

import cv2

img = cv2.imread("photo.jpg")

# Resize to 300x300
resized = cv2.resize(img, (300, 300))

cv2.imshow("Resized", resized)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Drawing Lines and Circles
OpenCV provides `cv2.line()` to draw straight lines and `cv2.circle()` for circles. You can set start and end points, color (B, G, R), and thickness. Useful for marking or annotation.

import cv2
import numpy as np

img = np.zeros((400, 400, 3), dtype=np.uint8)

# Draw a blue line
cv2.line(img, (50, 50), (350, 50), (255, 0, 0), 3)

# Draw a green circle
cv2.circle(img, (200, 200), 50, (0, 255, 0), 2)

cv2.imshow("Shapes", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Drawing Rectangles and Ellipses
`cv2.rectangle()` is used for bounding boxes and layout markers, while `cv2.ellipse()` helps draw arcs or full ellipses. Both accept coordinates, color, and thickness.

import cv2
import numpy as np

img = np.ones((400, 400, 3), dtype=np.uint8) * 255

# Draw rectangle
cv2.rectangle(img, (100, 100), (300, 200), (0, 0, 255), 2)

# Draw ellipse
cv2.ellipse(img, (200, 300), (100, 50), 0, 0, 360, (0, 100, 200), 2)

cv2.imshow("Shapes", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Drawing Polygons
`cv2.polylines()` allows drawing of open or closed polygons using a list of points. You can define whether the shape is closed and the thickness/color.

import cv2
import numpy as np

img = np.ones((400, 400, 3), dtype=np.uint8) * 255

# Define polygon points
pts = np.array([[50, 300], [100, 200], [200, 250], [300, 300]], np.int32)
pts = pts.reshape((-1, 1, 2))

# Draw closed polygon
cv2.polylines(img, [pts], True, (255, 0, 255), 2)

cv2.imshow("Polygon", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Adding Text to Images
`cv2.putText()` overlays text on an image. You specify the text, position, font type, scale, color, and thickness. Useful for labeling, debug info, or UI overlays.

import cv2
import numpy as np

img = np.zeros((300, 600, 3), dtype=np.uint8)

# Put text
cv2.putText(img, "OpenCV Rocks!", (50, 150), cv2.FONT_HERSHEY_SIMPLEX, 1.5, (255, 255, 0), 2)

cv2.imshow("Text", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Custom Color and Thickness
You can define any BGR color tuple in OpenCV and control line thickness. Thickness of -1 fills shapes completely (useful for filled rectangles, circles, etc.).

import cv2
import numpy as np

img = np.ones((300, 300, 3), dtype=np.uint8) * 255

# Filled red rectangle
cv2.rectangle(img, (50, 50), (250, 150), (0, 0, 255), -1)

cv2.imshow("Filled", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Mouse Callback Functions for Drawing
OpenCV allows mouse interaction using `cv2.setMouseCallback()`. You can create functions that draw lines, rectangles, or capture coordinates based on mouse events like click, drag, or release.

import cv2
import numpy as np

drawing = False
ix, iy = -1, -1
img = np.ones((400, 400, 3), dtype=np.uint8) * 255

def draw_circle(event, x, y, flags, param):
    global ix, iy, drawing
    if event == cv2.EVENT_LBUTTONDOWN:
        drawing = True
        ix, iy = x, y
    elif event == cv2.EVENT_MOUSEMOVE and drawing:
        cv2.circle(img, (x, y), 5, (255, 0, 0), -1)
    elif event == cv2.EVENT_LBUTTONUP:
        drawing = False

cv2.namedWindow("Draw")
cv2.setMouseCallback("Draw", draw_circle)

while True:
    cv2.imshow("Draw", img)
    if cv2.waitKey(1) & 0xFF == 27:
        break

cv2.destroyAllWindows()
      

Image Translation
Image translation shifts the entire image along the X and Y axes by moving pixels. The operation is performed with an affine transformation matrix and is useful in object tracking and image registration.

import cv2
import numpy as np

img = cv2.imread("photo.jpg")

# Define translation matrix: move 100 right, 50 down
M = np.float32([[1, 0, 100], [0, 1, 50]])
translated = cv2.warpAffine(img, M, (img.shape[1], img.shape[0]))

cv2.imshow("Translated", translated)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Image Rotation
Rotation pivots the image around a specified point, usually the center. The rotation matrix is created by `cv2.getRotationMatrix2D()`. Angles are in degrees, positive values rotate counterclockwise.

import cv2

img = cv2.imread("photo.jpg")

(h, w) = img.shape[:2]
center = (w // 2, h // 2)

# Rotate by 45 degrees around center
M = cv2.getRotationMatrix2D(center, 45, 1.0)
rotated = cv2.warpAffine(img, M, (w, h))

cv2.imshow("Rotated", rotated)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Image Scaling
Scaling changes the size of an image by a scale factor or to fixed dimensions using `cv2.resize()`. This is essential for normalization and preparing inputs for models.

import cv2

img = cv2.imread("photo.jpg")

# Scale image by 50%
scaled = cv2.resize(img, None, fx=0.5, fy=0.5)

cv2.imshow("Scaled", scaled)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Image Flipping
Flipping mirrors the image horizontally, vertically, or both using `cv2.flip()`. It’s useful for data augmentation and image correction.

import cv2

img = cv2.imread("photo.jpg")

# Flip horizontally (flipCode=1)
flipped = cv2.flip(img, 1)

cv2.imshow("Flipped", flipped)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Image Cropping
Cropping extracts a sub-region (ROI) from an image using NumPy slicing, which allows selective processing or focusing.

import cv2

img = cv2.imread("photo.jpg")

# Crop rectangle: y:100-300, x:150-350
cropped = img[100:300, 150:350]

cv2.imshow("Cropped", cropped)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Affine and Perspective Transformations
Affine transforms preserve parallelism (rotation, translation, scaling), while perspective transforms mimic 3D effects, changing the viewpoint using a 3x3 matrix.

import cv2
import numpy as np

img = cv2.imread("photo.jpg")

# Affine transform points
pts1 = np.float32([[50,50], [200,50], [50,200]])
pts2 = np.float32([[10,100], [200,50], [100,250]])

M = cv2.getAffineTransform(pts1, pts2)
affine = cv2.warpAffine(img, M, (img.shape[1], img.shape[0]))

cv2.imshow("Affine", affine)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Introduction to Color Models
Color models define how colors are represented numerically. OpenCV uses BGR by default, but others like HSV, LAB, and grayscale are important for different applications such as segmentation or lighting correction.

# BGR is OpenCV default color space
# Different models suit different tasks like HSV for color filtering
      

BGR to Grayscale
Convert BGR images to grayscale to reduce complexity and focus on intensity using `cv2.cvtColor()` with `cv2.COLOR_BGR2GRAY`.

import cv2

img = cv2.imread("photo.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

cv2.imshow("Grayscale", gray)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

BGR to HSV
HSV separates color (hue), saturation, and brightness, making it easier for color-based segmentation and filtering.

import cv2

img = cv2.imread("photo.jpg")
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

cv2.imshow("HSV", hsv)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

BGR to LAB
LAB color space represents color in a way closer to human vision and is used for color correction and enhancement.

import cv2

img = cv2.imread("photo.jpg")
lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)

cv2.imshow("LAB", lab)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Channel Splitting and Merging
Split color channels into separate arrays for individual processing and merge them back when done using `cv2.split()` and `cv2.merge()`.

import cv2

img = cv2.imread("photo.jpg")
b, g, r = cv2.split(img)

# Merge channels back
merged = cv2.merge([b, g, r])
      

Color Filtering and Thresholding
Filter images by color ranges in HSV or other color spaces to segment objects based on color using `cv2.inRange()`.

import cv2
import numpy as np

img = cv2.imread("photo.jpg")
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# Define lower and upper bounds for blue color
lower_blue = np.array([100, 150, 0])
upper_blue = np.array([140, 255, 255])

mask = cv2.inRange(hsv, lower_blue, upper_blue)
result = cv2.bitwise_and(img, img, mask=mask)

cv2.imshow("Filtered Blue", result)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Global Thresholding
Global thresholding applies a fixed threshold to convert grayscale images to binary. Pixels above the threshold become white, below become black.

import cv2

img = cv2.imread("photo.jpg", 0)  # Grayscale

_, thresh = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)

cv2.imshow("Global Threshold", thresh)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Adaptive Thresholding
Adaptive thresholding computes thresholds for small regions, adapting to lighting variations, making it good for uneven illumination.

import cv2

img = cv2.imread("photo.jpg", 0)

adaptive = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                 cv2.THRESH_BINARY, 11, 2)

cv2.imshow("Adaptive Threshold", adaptive)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Otsu’s Binarization
Otsu’s method automatically finds the optimal threshold to separate foreground and background.

import cv2

img = cv2.imread("photo.jpg", 0)

_, otsu = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

cv2.imshow("Otsu Threshold", otsu)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Inverted Thresholding
Inverts the binary image so that pixels above the threshold become black, and below become white.

import cv2

img = cv2.imread("photo.jpg", 0)

_, inv_thresh = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY_INV)

cv2.imshow("Inverted Threshold", inv_thresh)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Trunc and ToZero Thresholds
Trunc threshold caps pixel values at the threshold, while ToZero sets pixels below the threshold to zero.

import cv2

img = cv2.imread("photo.jpg", 0)

_, trunc = cv2.threshold(img, 127, 255, cv2.THRESH_TRUNC)
_, tozero = cv2.threshold(img, 127, 255, cv2.THRESH_TOZERO)

cv2.imshow("Trunc Threshold", trunc)
cv2.imshow("ToZero Threshold", tozero)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Visual Comparison of Thresholding Methods
Comparing different thresholding methods side by side helps in selecting the right one depending on image and lighting conditions.

# Typically done by stacking thresholded images and showing them
# (Implementation is similar to above, combining outputs)
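
As one possible implementation, the threshold types shown above can be computed on the same grayscale image and stacked horizontally with NumPy for a quick visual comparison:

import cv2
import numpy as np

img = cv2.imread("photo.jpg", 0)

_, binary = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
_, inverted = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY_INV)
_, trunc = cv2.threshold(img, 127, 255, cv2.THRESH_TRUNC)
_, tozero = cv2.threshold(img, 127, 255, cv2.THRESH_TOZERO)

# Stack the results side by side in one window
combined = np.hstack([binary, inverted, trunc, tozero])

cv2.imshow("Threshold Comparison", combined)
cv2.waitKey(0)
cv2.destroyAllWindows()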
      

Averaging Filters
Averaging filters smooth images by replacing each pixel with the average of its neighbors, reducing noise at the cost of blurring edges.

import cv2

img = cv2.imread("photo.jpg")

# Apply averaging filter with 5x5 kernel
blur = cv2.blur(img, (5, 5))

cv2.imshow("Averaging", blur)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Gaussian Blur
Gaussian blur uses a weighted average in which pixels closer to the center have more influence, giving smoother, more natural results than plain averaging.

import cv2

img = cv2.imread("photo.jpg")

# Apply Gaussian blur with 5x5 kernel
gauss = cv2.GaussianBlur(img, (5, 5), 0)

cv2.imshow("Gaussian Blur", gauss)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Median Blur
Median blur replaces each pixel by the median of its neighbors, effectively removing salt-and-pepper noise.

import cv2

img = cv2.imread("photo.jpg")

# Apply median blur with kernel size 5
median = cv2.medianBlur(img, 5)

cv2.imshow("Median Blur", median)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Bilateral Filtering
Bilateral filter smooths images while preserving edges by considering both spatial closeness and color similarity.

import cv2

img = cv2.imread("photo.jpg")

# Apply bilateral filter
bilateral = cv2.bilateralFilter(img, 9, 75, 75)

cv2.imshow("Bilateral Filter", bilateral)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Sharpening Techniques
Sharpening enhances edges by emphasizing pixel intensity differences, often done via convolution with specific kernels.

import cv2
import numpy as np

img = cv2.imread("photo.jpg")

# Kernel for sharpening
kernel = np.array([[0, -1, 0],
                   [-1, 5,-1],
                   [0, -1, 0]])

sharpened = cv2.filter2D(img, -1, kernel)

cv2.imshow("Sharpened", sharpened)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Noise Reduction Techniques
Noise reduction removes unwanted variations in pixel values using filters like Gaussian, median, or bilateral, improving image quality.

# Noise reduction often combines blurring and filtering
# Use above blurring functions depending on noise type
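
For heavier noise, OpenCV also provides non-local means denoising; a minimal sketch (the parameter values below are common defaults, not tuned settings):

import cv2

img = cv2.imread("photo.jpg")

# Non-local means denoising for color images
# 10, 10 control filter strength; 7 and 21 are the template/search window sizes
denoised = cv2.fastNlMeansDenoisingColored(img, None, 10, 10, 7, 21)

cv2.imshow("Denoised", denoised)
cv2.waitKey(0)
cv2.destroyAllWindows()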
      

Erosion and Dilation
Erosion removes pixels on object boundaries, shrinking the foreground. Dilation adds pixels to the boundaries, expanding objects. They are fundamental for noise removal and object separation in binary images.

import cv2
import numpy as np

img = cv2.imread("binary_image.png", 0)

kernel = np.ones((5,5), np.uint8)

# Erode image
erosion = cv2.erode(img, kernel, iterations=1)

# Dilate image
dilation = cv2.dilate(img, kernel, iterations=1)

cv2.imshow("Erosion", erosion)
cv2.imshow("Dilation", dilation)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Opening and Closing
Opening is erosion followed by dilation, useful for removing small noise. Closing is dilation followed by erosion, used to close small holes inside objects.

import cv2
import numpy as np

img = cv2.imread("binary_image.png", 0)
kernel = np.ones((5,5), np.uint8)

# Opening removes noise
opening = cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel)

# Closing fills holes
closing = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)

cv2.imshow("Opening", opening)
cv2.imshow("Closing", closing)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Morphological Gradient
The morphological gradient is the difference between dilation and erosion of an image. It highlights the edges of objects.

gradient = cv2.morphologyEx(img, cv2.MORPH_GRADIENT, kernel)

cv2.imshow("Morphological Gradient", gradient)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Top Hat and Black Hat
Top Hat extracts small elements and details brighter than their surroundings. Black Hat reveals small dark regions on a bright background.

tophat = cv2.morphologyEx(img, cv2.MORPH_TOPHAT, kernel)
blackhat = cv2.morphologyEx(img, cv2.MORPH_BLACKHAT, kernel)

cv2.imshow("Top Hat", tophat)
cv2.imshow("Black Hat", blackhat)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Structuring Elements
Structuring elements define the neighborhood used for morphological operations. Common shapes include rectangles, ellipses, and crosses.

rect_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5,5))
ellipse_kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5,5))
cross_kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (5,5))
      

Practical Use Cases
Morphological transformations are used in noise removal, shape analysis, image pre-processing for OCR, and extracting meaningful structures in images.

# Example: Clean noisy text regions before OCR
cleaned = cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel)
      

Introduction to Edges
Edges represent boundaries where pixel intensity changes sharply, indicating object outlines or texture changes. Detecting edges is crucial for object recognition, segmentation, and tracking in computer vision.

Sobel Operators
Sobel operators compute the gradient of the image intensity in horizontal and vertical directions. This helps highlight edges by emphasizing intensity changes using convolution kernels.

import cv2

img = cv2.imread("photo.jpg", 0)  # Load grayscale

# Sobel edge detection in X and Y directions
sobelx = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)
sobely = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)

cv2.imshow("Sobel X", sobelx)
cv2.imshow("Sobel Y", sobely)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Scharr Operators
Scharr operators are an improved version of Sobel, designed to give better rotational symmetry and reduce noise when detecting edges.

import cv2

img = cv2.imread("photo.jpg", 0)

scharrx = cv2.Scharr(img, cv2.CV_64F, 1, 0)
scharry = cv2.Scharr(img, cv2.CV_64F, 0, 1)

cv2.imshow("Scharr X", scharrx)
cv2.imshow("Scharr Y", scharry)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Laplacian of Image
The Laplacian operator detects edges by calculating the second derivative of the image intensity, capturing areas where intensity changes abruptly in all directions.

import cv2

img = cv2.imread("photo.jpg", 0)

laplacian = cv2.Laplacian(img, cv2.CV_64F)

cv2.imshow("Laplacian", laplacian)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Canny Edge Detection
The Canny method is a multi-stage edge detector that includes noise reduction, gradient calculation, non-maximum suppression, and hysteresis thresholding for precise edge detection.

import cv2

img = cv2.imread("photo.jpg", 0)

edges = cv2.Canny(img, 100, 200)

cv2.imshow("Canny Edges", edges)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Edge Maps Comparison
Comparing different edge detection methods helps choose the right one based on image content and noise levels. Sobel is simpler; Canny is more precise but more complex.

# Compare Sobel, Laplacian, and Canny by displaying results side-by-side
# (Combine above examples visually in an app)
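
A minimal side-by-side sketch: convert each edge map to 8-bit (Sobel and Laplacian produce signed floats) and stack them with NumPy:

import cv2
import numpy as np

img = cv2.imread("photo.jpg", 0)

sobelx = cv2.convertScaleAbs(cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3))
laplacian = cv2.convertScaleAbs(cv2.Laplacian(img, cv2.CV_64F))
canny = cv2.Canny(img, 100, 200)

# All three are now uint8 single-channel images of the same size
comparison = np.hstack([sobelx, laplacian, canny])

cv2.imshow("Sobel | Laplacian | Canny", comparison)
cv2.waitKey(0)
cv2.destroyAllWindows()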
      

Introduction to Contours
Contours are curves joining continuous points with the same intensity, useful for shape analysis and object detection. They simplify complex shapes into manageable outlines.

Finding Contours
`cv2.findContours()` extracts contours from binary images. It returns contour points and hierarchy for nested contours.

import cv2

img = cv2.imread("shapes.png", 0)
ret, thresh = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)

contours, hierarchy = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
      

Drawing and Labeling Contours
Use `cv2.drawContours()` to visualize contours on images. Contours can be labeled or numbered for identification.

img_color = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)

for i, contour in enumerate(contours):
    cv2.drawContours(img_color, [contour], -1, (0,255,0), 2)
    # Label contour
    x, y = contour[0][0]
    cv2.putText(img_color, f'#{i+1}', (x,y), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0,0,255), 1)

cv2.imshow("Contours", img_color)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Contour Properties (Area, Perimeter)
Contour area and perimeter can be calculated using `cv2.contourArea()` and `cv2.arcLength()`. These properties help filter or classify contours.

for contour in contours:
    area = cv2.contourArea(contour)
    perimeter = cv2.arcLength(contour, True)
    print(f"Area: {area}, Perimeter: {perimeter}")
      

Contour Approximation
Approximate contours to simpler polygons with `cv2.approxPolyDP()`, reducing points while maintaining shape.

for contour in contours:
    epsilon = 0.02 * cv2.arcLength(contour, True)
    approx = cv2.approxPolyDP(contour, epsilon, True)
    print(f"Approximated points: {len(approx)}")
      

Convex Hull and Defects
Convex hulls wrap contours tightly. Convexity defects reveal inward dents useful for shape analysis.

for contour in contours:
    hull = cv2.convexHull(contour)
    defects = cv2.convexityDefects(contour, cv2.convexHull(contour, returnPoints=False))
    print(f"Convex Hull points: {len(hull)}")
    if defects is not None:
        print(f"Defects count: {defects.shape[0]}")
      

Understanding Histograms
A histogram represents the distribution of pixel intensities in an image. It helps analyze contrast, brightness, and intensity spread, useful for enhancement and segmentation.

Plotting Histogram with OpenCV
Use `cv2.calcHist()` to calculate histograms and libraries like Matplotlib to visualize them.

import cv2
from matplotlib import pyplot as plt

img = cv2.imread("photo.jpg", 0)  # Grayscale

hist = cv2.calcHist([img], [0], None, [256], [0,256])

plt.plot(hist)
plt.title("Grayscale Histogram")
plt.xlabel("Pixel Intensity")
plt.ylabel("Frequency")
plt.show()
      

Histogram Equalization
Histogram equalization enhances image contrast by redistributing intensity values to use the full spectrum.

img = cv2.imread("photo.jpg", 0)

equalized = cv2.equalizeHist(img)

cv2.imshow("Equalized", equalized)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Adaptive Histogram Equalization (CLAHE)
CLAHE improves contrast locally in small regions, preventing noise amplification common in global equalization.

clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8,8))
clahe_img = clahe.apply(img)

cv2.imshow("CLAHE", clahe_img)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

2D Color Histograms
Compute histograms over two color channels simultaneously, useful for color-based segmentation and analysis.

img = cv2.imread("photo.jpg")
hist = cv2.calcHist([img], [0, 1], None, [32, 32], [0,256, 0,256])
print(hist.shape)  # 2D histogram shape
      

Histogram Backprojection
Backprojection uses a histogram to find regions in an image matching a particular color distribution, aiding object tracking.

hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
# In practice the histogram comes from a target ROI; the whole image is used here for brevity
roi_hist = cv2.calcHist([hsv], [0], None, [180], [0,180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

back_proj = cv2.calcBackProject([hsv], [0], roi_hist, [0,180], 1)

cv2.imshow("Backprojection", back_proj)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Introduction to Gradients
Image gradients measure the change in intensity or color at each pixel, highlighting edges and texture. They are key for edge detection, shape analysis, and feature extraction.

Computing Gradients with Sobel
Sobel operators calculate gradients in x and y directions by convolving the image with specific kernels to reveal directional edges.

import cv2

img = cv2.imread("photo.jpg", 0)

grad_x = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)
grad_y = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)

cv2.imshow("Gradient X", grad_x)
cv2.imshow("Gradient Y", grad_y)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Gradient Magnitude and Direction
Combining x and y gradients gives the gradient magnitude (strength) and direction (angle), useful for edge orientation analysis.

import numpy as np

magnitude = cv2.magnitude(grad_x, grad_y)
angle = cv2.phase(grad_x, grad_y, angleInDegrees=True)

cv2.imshow("Gradient Magnitude", magnitude)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Laplacian for Edges
The Laplacian operator calculates the second derivative to find areas with rapid intensity change, emphasizing edges.

laplacian = cv2.Laplacian(img, cv2.CV_64F)

cv2.imshow("Laplacian", laplacian)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Gradient Visualization
Visualizing gradients can be done by normalizing the magnitude or thresholding to highlight strong edges.

grad_display = cv2.convertScaleAbs(magnitude)
cv2.imshow("Gradient Visualization", grad_display)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Gradient Filters in Applications
Gradient information is widely used in edge detection, texture analysis, image sharpening, and object detection.

# Example: Use gradient magnitude to create edge mask
_, edge_mask = cv2.threshold(grad_display, 50, 255, cv2.THRESH_BINARY)
cv2.imshow("Edge Mask", edge_mask)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Affine Transformation Basics
Affine transformations map points using rotation, translation, scaling, and shearing, preserving lines and parallelism but not angles.

import cv2
import numpy as np

img = cv2.imread("photo.jpg")

pts1 = np.float32([[50,50],[200,50],[50,200]])
pts2 = np.float32([[10,100],[200,50],[100,250]])

M = cv2.getAffineTransform(pts1, pts2)
affine = cv2.warpAffine(img, M, (img.shape[1], img.shape[0]))

cv2.imshow("Affine Transform", affine)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Perspective Transformation
Perspective transforms simulate viewpoint changes, mapping quadrilaterals between images using a 3x3 matrix.

pts1 = np.float32([[56,65],[368,52],[28,387],[389,390]])
pts2 = np.float32([[0,0],[300,0],[0,300],[300,300]])

M = cv2.getPerspectiveTransform(pts1, pts2)
perspective = cv2.warpPerspective(img, M, (300,300))

cv2.imshow("Perspective Transform", perspective)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Image Warping
Warping applies geometric transformations to distort or correct images, useful for image stitching or correction.

# Image warping example uses warpAffine or warpPerspective functions
# (see above examples)
      

Rotation with Interpolation
Rotation turns the image around a chosen point, using interpolation to avoid quality loss.

(h, w) = img.shape[:2]
center = (w//2, h//2)

M = cv2.getRotationMatrix2D(center, 45, 1.0)
rotated = cv2.warpAffine(img, M, (w, h))

cv2.imshow("Rotated Image", rotated)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Skew Correction
Skew correction straightens tilted images, often by detecting edges and applying perspective transforms.

# Skew correction involves detecting edges and applying perspective transform
# Example code is similar to perspective transform example
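
One common recipe (a sketch, not the only approach) estimates the skew angle from the minimum-area rectangle around the foreground pixels and rotates the image back; the file name is an example:

import cv2
import numpy as np

img = cv2.imread("document.jpg", 0)
_, thresh = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Angle of the minimum-area rectangle around all foreground pixels
coords = np.column_stack(np.where(thresh > 0)).astype(np.float32)
angle = cv2.minAreaRect(coords)[-1]
# Note: minAreaRect's angle convention changed in OpenCV 4.5; adjust for your version
if angle < -45:
    angle = -(90 + angle)
else:
    angle = -angle

# Rotate around the center to undo the skew
(h, w) = img.shape[:2]
M = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0)
deskewed = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_CUBIC)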
      

Real-world Perspective Correction
Correct perspective distortion in photos, especially for documents and signage, to improve readability.

# Real-world correction often combines edge detection and perspective warp
# See perspective transform example above
      

Bitwise AND, OR, XOR
Bitwise operations combine images pixel-wise: AND keeps overlapping parts, OR merges all parts, XOR highlights non-overlapping areas.

import cv2

img1 = cv2.imread("mask1.png", 0)
img2 = cv2.imread("mask2.png", 0)

bitwise_and = cv2.bitwise_and(img1, img2)
bitwise_or = cv2.bitwise_or(img1, img2)
bitwise_xor = cv2.bitwise_xor(img1, img2)

cv2.imshow("AND", bitwise_and)
cv2.imshow("OR", bitwise_or)
cv2.imshow("XOR", bitwise_xor)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Image Masking
Masks isolate image regions for processing or analysis; the mask is applied with a bitwise operation.

img = cv2.imread("photo.jpg")
mask = cv2.imread("mask.png", 0)

masked = cv2.bitwise_and(img, img, mask=mask)

cv2.imshow("Masked Image", masked)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Image Blending with Bitwise Ops
Blend images selectively by combining masked regions using bitwise operations.

# Blend example:
# result = bitwise_and + bitwise_or or use addWeighted for smooth blend
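
A minimal sketch, assuming two same-sized images and a binary mask (file names are examples): keep the masked region from one image, take the rest from the other, and combine; `cv2.addWeighted()` gives a smooth global blend instead:

import cv2

img1 = cv2.imread("photo.jpg")
img2 = cv2.imread("background.jpg")  # Assumed same size as photo.jpg
mask = cv2.imread("mask.png", 0)     # White where img1 should show

# Region of img1 selected by the mask
fg = cv2.bitwise_and(img1, img1, mask=mask)
# Region of img2 outside the mask
bg = cv2.bitwise_and(img2, img2, mask=cv2.bitwise_not(mask))

combined = cv2.add(fg, bg)

# Alternative: smooth 50/50 blend of the whole images
blended = cv2.addWeighted(img1, 0.5, img2, 0.5, 0)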
      

Creating Image Overlays
Use masks and bitwise operations to overlay graphics or text on images without affecting the background.

# Overlaying uses masks with bitwise_and and bitwise_or
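
A sketch of the classic logo-overlay pattern (file names are examples; the logo is assumed smaller than the base image):

import cv2

img = cv2.imread("photo.jpg")
logo = cv2.imread("logo.png")

rows, cols = logo.shape[:2]
roi = img[0:rows, 0:cols]  # Top-left corner of the base image

# Build a mask of the logo and its inverse
logo_gray = cv2.cvtColor(logo, cv2.COLOR_BGR2GRAY)
_, mask = cv2.threshold(logo_gray, 10, 255, cv2.THRESH_BINARY)
mask_inv = cv2.bitwise_not(mask)

# Black out the logo area in the ROI, then add the logo pixels
bg = cv2.bitwise_and(roi, roi, mask=mask_inv)
fg = cv2.bitwise_and(logo, logo, mask=mask)
img[0:rows, 0:cols] = cv2.add(bg, fg)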
      

Combining Images Dynamically
Dynamically combine images for effects like transparency or compositing using bitwise functions.

# Dynamic combination handled with bitwise and alpha blending
      

Application in Image Segmentation
Bitwise operations separate foreground from background in segmentation pipelines.

# Example: Extract segmented object using mask
segmented = cv2.bitwise_and(img, img, mask=mask)
      

Introduction to Template Matching
Template matching locates small parts of an image matching a template pattern. Useful for object detection and recognition.

Matching with cv2.matchTemplate
This function slides the template over the image and compares patches using similarity metrics.

import cv2
import numpy as np

img = cv2.imread("scene.jpg", 0)
template = cv2.imread("template.jpg", 0)

res = cv2.matchTemplate(img, template, cv2.TM_CCOEFF_NORMED)
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(res)

top_left = max_loc
h, w = template.shape

bottom_right = (top_left[0] + w, top_left[1] + h)
cv2.rectangle(img, top_left, bottom_right, 255, 2)

cv2.imshow("Detected Template", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Template Matching with Threshold
Thresholding similarity results helps find multiple matching locations above a certain confidence.

threshold = 0.8
loc = np.where(res >= threshold)

for pt in zip(*loc[::-1]):
    cv2.rectangle(img, pt, (pt[0]+w, pt[1]+h), 255, 2)  # White boxes (img is grayscale)

cv2.imshow("Multiple Matches", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Multi-Template Matching
Match multiple templates sequentially or in parallel to detect various objects.

# Loop over several templates and apply matchTemplate each time
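
A minimal sketch, assuming a list of template files:

import cv2

img = cv2.imread("scene.jpg", 0)

for path in ["template1.jpg", "template2.jpg"]:  # Example file names
    template = cv2.imread(path, 0)
    h, w = template.shape

    res = cv2.matchTemplate(img, template, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(res)

    if max_val > 0.8:  # Only accept confident matches
        cv2.rectangle(img, max_loc, (max_loc[0] + w, max_loc[1] + h), 255, 2)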
      

Rotated Template Matching
Handle rotated versions of templates by rotating the template or image and applying matching.

# Rotate template with cv2.getRotationMatrix2D and match again
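
A sketch that tries the template at several angles and keeps the best score (coarse 30-degree steps; rotating in place clips the template's corners, so padding it first gives better results):

import cv2

img = cv2.imread("scene.jpg", 0)
template = cv2.imread("template.jpg", 0)
h, w = template.shape

best_val, best_loc, best_angle = -1.0, None, 0
for angle in range(0, 360, 30):
    M = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0)
    rotated = cv2.warpAffine(template, M, (w, h))

    res = cv2.matchTemplate(img, rotated, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(res)
    if max_val > best_val:
        best_val, best_loc, best_angle = max_val, max_loc, angle

print("Best angle:", best_angle, "score:", best_val)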
      

Real-time Template Tracking
Use template matching in video frames to track objects dynamically.

# Capture video, apply matchTemplate on each frame, draw rectangle around match
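
A minimal sketch matching a fixed template against every frame of a webcam stream:

import cv2

cap = cv2.VideoCapture(0)
template = cv2.imread("template.jpg", 0)
h, w = template.shape

while True:
    ret, frame = cap.read()
    if not ret:
        break

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    res = cv2.matchTemplate(gray, template, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(res)

    if max_val > 0.7:  # Only draw confident matches
        cv2.rectangle(frame, max_loc, (max_loc[0] + w, max_loc[1] + h), (0, 255, 0), 2)

    cv2.imshow("Template Tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()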
      

Reading Video from File
OpenCV allows reading video files frame by frame using `cv2.VideoCapture()`. You can process or analyze each frame in a loop until the video ends.

import cv2

cap = cv2.VideoCapture("video.mp4")  # Open video file

while cap.isOpened():
    ret, frame = cap.read()  # Read frame
    if not ret:
        break  # End of video

    cv2.imshow("Video Frame", frame)

    if cv2.waitKey(25) & 0xFF == ord('q'):  # Press 'q' to quit
        break

cap.release()
cv2.destroyAllWindows()
      

Capturing Video from Camera
Capture live video from a webcam or camera device by passing `0` or device index to `VideoCapture`. Useful for real-time applications.

cap = cv2.VideoCapture(0)  # Open default camera

while True:
    ret, frame = cap.read()
    if not ret:
        break

    cv2.imshow("Webcam", frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
      

Writing Video to File
You can save processed video frames to a file by creating a `cv2.VideoWriter` object with codec, frame rate, and frame size.

cap = cv2.VideoCapture(0)

# Define codec and create VideoWriter object
fourcc = cv2.VideoWriter_fourcc(*'XVID')
out = cv2.VideoWriter('output.avi', fourcc, 20.0, (640,480))  # Size must match the frames written

while True:
    ret, frame = cap.read()
    if not ret:
        break

    out.write(frame)  # Write frame to file

    cv2.imshow("Recording", frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
out.release()
cv2.destroyAllWindows()
      

Frame-by-Frame Processing
Each frame can be individually processed, such as applying filters, detecting objects, or manipulating pixels before displaying or saving.

cap = cv2.VideoCapture("video.mp4")

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # Convert frame to grayscale

    cv2.imshow("Gray Video", gray)

    if cv2.waitKey(25) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
      

Real-Time Effects
You can apply effects such as edge detection or blurring on each frame live, allowing creative video manipulation.

cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    edges = cv2.Canny(frame, 100, 200)  # Apply Canny edge detector

    cv2.imshow("Edges", edges)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
      

Keyboard Event Handling
Handle keyboard input during video playback or capture to control flow or trigger actions, using `cv2.waitKey()`.

# Example: Quit on 'q', pause on 'p'
paused = False
cap = cv2.VideoCapture("video.mp4")

while cap.isOpened():
    if not paused:
        ret, frame = cap.read()
        if not ret:
            break
        cv2.imshow("Video", frame)

    key = cv2.waitKey(30) & 0xFF

    if key == ord('q'):
        break
    elif key == ord('p'):
        paused = not paused  # Toggle pause

cap.release()
cv2.destroyAllWindows()
      

Introduction to Motion Detection
Motion detection identifies changes between video frames. A common tool is optical flow, which estimates the motion of objects across consecutive frames by analyzing pixel intensity changes; it is essential for tracking and motion analysis.

Dense vs Sparse Optical Flow
Dense optical flow computes motion for every pixel, producing detailed flow maps. Sparse flow tracks selected key points, reducing computation and focusing on important features.

import cv2
import numpy as np

cap = cv2.VideoCapture('video.mp4')
ret, frame1 = cap.read()
prev_gray = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)

# Parameters for ShiTomasi corner detection (for sparse flow)
feature_params = dict(maxCorners=100, qualityLevel=0.3, minDistance=7, blockSize=7)

# Parameters for Lucas-Kanade optical flow
lk_params = dict(winSize=(15,15), maxLevel=2, criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03))

p0 = cv2.goodFeaturesToTrack(prev_gray, mask=None, **feature_params)

while True:
    ret, frame2 = cap.read()
    if not ret:
        break
    frame_gray = cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY)

    # Calculate optical flow (Sparse)
    p1, st, err = cv2.calcOpticalFlowPyrLK(prev_gray, frame_gray, p0, None, **lk_params)

    # Select good points
    good_new = p1[st==1]
    good_old = p0[st==1]

    # Visualization skipped for brevity

    prev_gray = frame_gray.copy()
    p0 = good_new.reshape(-1,1,2)

cap.release()
cv2.destroyAllWindows()
      

Lucas-Kanade Optical Flow
This method tracks sparse points between frames using pyramidal implementation for better accuracy on small motions.

Farneback Optical Flow
Farneback computes dense optical flow using polynomial expansion, producing motion vectors for all pixels.

cap = cv2.VideoCapture('video.mp4')
ret, frame1 = cap.read()
prev_gray = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)

while True:
    ret, frame2 = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY)

    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None, 
                                        0.5, 3, 15, 3, 5, 1.2, 0)

    # Visualization skipped for brevity

    prev_gray = gray

cap.release()
cv2.destroyAllWindows()
      

Drawing Motion Vectors
Motion vectors show direction and speed of pixel movement, visualized as arrows or color-coded maps.

# Example: Draw lines for sparse flow points
for new, old in zip(good_new, good_old):
    a, b = new.ravel().astype(int)  # Drawing functions need integer coordinates
    c, d = old.ravel().astype(int)
    cv2.line(frame2, (a, b), (c, d), (0,255,0), 2)
    cv2.circle(frame2, (a, b), 5, (0,0,255), -1)
      

Applications in Video Analysis
Optical flow is used in video stabilization, object tracking, activity recognition, and autonomous driving.

Introduction to Object Tracking
Object tracking follows an object’s location through consecutive video frames. It’s essential for surveillance, robotics, and human-computer interaction.

Using cv2.Tracker API
OpenCV provides several built-in trackers via `cv2.Tracker` API, which simplifies tracking initialization and update.

import cv2

cap = cv2.VideoCapture(0)  # Open camera

# Create tracker object (example: KCF)
tracker = cv2.TrackerKCF_create()

ret, frame = cap.read()
bbox = cv2.selectROI("Frame", frame, False)  # Select object to track
tracker.init(frame, bbox)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    success, bbox = tracker.update(frame)
    if success:
        # Draw bounding box
        x, y, w, h = map(int, bbox)
        cv2.rectangle(frame, (x,y), (x+w, y+h), (0,255,0), 2)
    else:
        cv2.putText(frame, "Tracking failure", (50,80), cv2.FONT_HERSHEY_SIMPLEX, 0.75, (0,0,255), 2)

    cv2.imshow("Tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
      

MIL and KCF Tracking
MIL (Multiple Instance Learning) and KCF (Kernelized Correlation Filters) trackers offer a good balance between speed and accuracy for many scenarios.

CSRT Tracker
CSRT tracker provides higher accuracy especially with object scale changes, though slower than KCF.
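
Construction follows the same pattern as the KCF example above; a minimal sketch (CSRT ships with opencv-contrib-python, and in some 4.x builds these constructors live under cv2.legacy):

import cv2

# Same init/update workflow as the KCF example above
tracker_mil = cv2.TrackerMIL_create()
tracker_csrt = cv2.TrackerCSRT_create()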

Multi-Object Tracking
Track multiple objects by creating multiple tracker instances and updating them independently in each frame.

# Note: in OpenCV 4.5.1+ the MultiTracker API lives under cv2.legacy
trackers = cv2.MultiTracker_create()

# Add multiple objects
bbox1 = cv2.selectROI("Frame", frame, False)
tracker1 = cv2.TrackerKCF_create()
trackers.add(tracker1, frame, bbox1)

# Add more trackers as needed...

while True:
    ret, frame = cap.read()
    if not ret:
        break

    success, boxes = trackers.update(frame)
    for box in boxes:
        x, y, w, h = map(int, box)
        cv2.rectangle(frame, (x,y), (x+w, y+h), (255,0,0), 2)

    cv2.imshow("Multi Tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
      

Performance Comparison
KCF is fast and accurate, MIL is robust to occlusion, and CSRT is best for accuracy but slower. Choose based on your needs.

Haar Cascade Classifiers
Haar Cascades are machine learning classifiers that detect objects using simple Haar-like features. They work well for face detection and run quickly.

Loading Pretrained Face Classifier
OpenCV includes pretrained XML classifiers for frontal faces, which can be loaded easily.

import cv2

face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
      

Detecting Multiple Faces
Detect multiple faces by running the classifier on the grayscale image, which returns bounding boxes for each face.

img = cv2.imread("group.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    cv2.rectangle(img, (x,y), (x+w, y+h), (255,0,0), 2)

cv2.imshow("Faces", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Face Detection in Video
Detect faces in live video by applying the same detection logic frame by frame.

cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.1, 5)

    for (x,y,w,h) in faces:
        cv2.rectangle(frame, (x,y), (x+w,y+h), (0,255,0), 2)

    cv2.imshow("Face Detection", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
      

Tuning Parameters
Adjust `scaleFactor` (image scale step), `minNeighbors` (filtering false positives), and `minSize` to improve detection accuracy and speed.
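
For example, a stricter configuration might look like this (the values are illustrative starting points, reusing face_cascade and gray from the examples above):

faces = face_cascade.detectMultiScale(
    gray,
    scaleFactor=1.05,   # Smaller scale step: slower, but checks more sizes
    minNeighbors=6,     # More required neighbors: fewer false positives
    minSize=(30, 30)    # Ignore detections smaller than 30x30 pixels
)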

Limitations of Haar Cascades
Haar Cascades can struggle with faces at extreme angles, varying lighting, or occlusion. Modern methods use deep learning for better accuracy.

Introduction to DNN Module
OpenCV’s DNN module allows loading and running pre-trained deep learning models for tasks like object detection, classification, and segmentation efficiently.

Loading Pretrained Caffe Models
You can load models from frameworks like Caffe by providing the model architecture (.prototxt) and weights (.caffemodel) files.

import cv2

net = cv2.dnn.readNetFromCaffe("deploy.prototxt", "model.caffemodel")
      

Object Detection using MobileNet SSD
MobileNet SSD is a lightweight model for real-time object detection. Input images are preprocessed and fed to the network for predictions.

image = cv2.imread("image.jpg")
(h, w) = image.shape[:2]

blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 0.007843, (300,300), 127.5)
net.setInput(blob)
detections = net.forward()

for i in range(detections.shape[2]):
    confidence = detections[0, 0, i, 2]
    if confidence > 0.5:
        box = detections[0, 0, i, 3:7] * [w, h, w, h]
        (startX, startY, endX, endY) = box.astype("int")
        cv2.rectangle(image, (startX, startY), (endX, endY), (0,255,0), 2)

cv2.imshow("Detections", image)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Running YOLO in OpenCV
YOLO models can be loaded using OpenCV DNN to perform fast, accurate real-time detection on images and video.

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
output_layers = net.getUnconnectedOutLayersNames()  # Names of the YOLO output layers

blob = cv2.dnn.blobFromImage(image, 1/255.0, (416,416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(output_layers)

# Process outputs to extract bounding boxes (code omitted for brevity)
      

Pose Estimation
OpenCV DNN supports pose estimation models that detect body keypoints for applications like fitness and animation.

# Load pose estimation model and forward pass similar to detection
# Extract and visualize keypoints on the image
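
A rough sketch of the usual workflow, assuming a Caffe pose model (the file names, input size, and number of keypoints depend on the specific model, e.g. the OpenPose MPI/COCO releases):

import cv2

# Assumed model files; substitute the ones for your pose model
net = cv2.dnn.readNetFromCaffe("pose_deploy.prototxt", "pose_iter.caffemodel")

frame = cv2.imread("person.jpg")
h, w = frame.shape[:2]

blob = cv2.dnn.blobFromImage(frame, 1.0 / 255, (368, 368), (0, 0, 0), swapRB=False, crop=False)
net.setInput(blob)
output = net.forward()  # Shape: (1, num_parts, out_h, out_w)

# Take the most confident location in each keypoint heatmap
for i in range(output.shape[1]):
    heatmap = output[0, i, :, :]
    _, conf, _, point = cv2.minMaxLoc(heatmap)
    if conf > 0.1:
        x = int(w * point[0] / output.shape[3])
        y = int(h * point[1] / output.shape[2])
        cv2.circle(frame, (x, y), 5, (0, 255, 255), -1)

cv2.imshow("Pose Keypoints", frame)
cv2.waitKey(0)
cv2.destroyAllWindows()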
      

Real-time DNN on Webcam
You can capture webcam frames and run DNN detection in real-time, showing bounding boxes live.

cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300,300)), 0.007843, (300,300), 127.5)
    net.setInput(blob)
    detections = net.forward()

    for i in range(detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        if confidence > 0.5:
            box = detections[0, 0, i, 3:7] * [frame.shape[1], frame.shape[0], frame.shape[1], frame.shape[0]]
            (startX, startY, endX, endY) = box.astype("int")
            cv2.rectangle(frame, (startX, startY), (endX, endY), (0,255,0), 2)

    cv2.imshow("Real-time DNN", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
      

Introduction to Image Segmentation
Image segmentation divides an image into meaningful parts or regions, such as separating foreground objects from the background. It is useful in medical imaging, object detection, and more.

Thresholding Techniques
Thresholding converts grayscale images into binary images by choosing a threshold value to separate pixels.

import cv2

img = cv2.imread('coins.jpg', 0)  # Read image in grayscale

# Simple binary threshold
_, thresh = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)

cv2.imshow('Threshold', thresh)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Adaptive Thresholding
Adaptive thresholding calculates threshold for smaller regions, handling varying lighting conditions better.

thresh_adapt = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, 
                                     cv2.THRESH_BINARY, 11, 2)

cv2.imshow('Adaptive Threshold', thresh_adapt)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Watershed Algorithm
Watershed is a segmentation technique treating grayscale images as topographic surfaces and separating touching objects.

import numpy as np

# Read image and convert to grayscale
img = cv2.imread('coins.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Threshold and noise removal
_, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
kernel = np.ones((3,3), np.uint8)
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=2)

# Sure background area
sure_bg = cv2.dilate(opening, kernel, iterations=3)

# Finding sure foreground area
dist_transform = cv2.distanceTransform(opening, cv2.DIST_L2, 5)
_, sure_fg = cv2.threshold(dist_transform, 0.7 * dist_transform.max(), 255, 0)

# Finding unknown region
sure_fg = np.uint8(sure_fg)
unknown = cv2.subtract(sure_bg, sure_fg)

# Marker labelling
_, markers = cv2.connectedComponents(sure_fg)
markers = markers + 1
markers[unknown == 255] = 0

markers = cv2.watershed(img, markers)
img[markers == -1] = [0, 0, 255]  # Mark boundaries in red

cv2.imshow('Watershed Segmentation', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

GrabCut Algorithm
GrabCut is an iterative algorithm for foreground extraction using a bounding box as initialization.

img = cv2.imread('person.jpg')
mask = np.zeros(img.shape[:2], np.uint8)

bgdModel = np.zeros((1,65), np.float64)
fgdModel = np.zeros((1,65), np.float64)

rect = (50, 50, 450, 290)  # ROI for foreground

cv2.grabCut(img, mask, rect, bgdModel, fgdModel, 5, cv2.GC_INIT_WITH_RECT)
mask2 = np.where((mask==2)|(mask==0), 0, 1).astype('uint8')
img_cut = img * mask2[:, :, np.newaxis]

cv2.imshow('GrabCut Segmentation', img_cut)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Introduction to Feature Detection
Feature detection identifies distinctive points or regions in images, such as corners, blobs, or edges. These features help in object recognition, tracking, and image matching.

Using Harris Corner Detector
Harris detector finds corners by looking at changes in intensity in multiple directions.

import cv2
import numpy as np

img = cv2.imread('chessboard.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

gray = np.float32(gray)
dst = cv2.cornerHarris(gray, 2, 3, 0.04)

dst = cv2.dilate(dst, None)
img[dst > 0.01 * dst.max()] = [0, 0, 255]

cv2.imshow('Harris Corners', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

SIFT (Scale-Invariant Feature Transform)
SIFT detects keypoints and computes descriptors invariant to scale and rotation, useful for robust matching.

# Recompute 8-bit grayscale (the Harris example above converted gray to float32)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(gray, None)

img_sift = cv2.drawKeypoints(img, keypoints, None)
cv2.imshow('SIFT Features', img_sift)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

ORB (Oriented FAST and Rotated BRIEF)
ORB is a fast alternative to SIFT/SURF, free to use, combining FAST keypoint detector and BRIEF descriptor.

orb = cv2.ORB_create()
keypoints, descriptors = orb.detectAndCompute(gray, None)

img_orb = cv2.drawKeypoints(img, keypoints, None)
cv2.imshow('ORB Features', img_orb)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Feature Matching with BFMatcher
Matches descriptors between two images to find corresponding points.

img2 = cv2.imread('scene.jpg')
gray2 = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)

kp2, des2 = sift.detectAndCompute(gray2, None)
bf = cv2.BFMatcher()

matches = bf.knnMatch(descriptors, des2, k=2)

# Apply ratio test
good_matches = []
for m,n in matches:
    if m.distance < 0.75 * n.distance:
        good_matches.append(m)

img_matches = cv2.drawMatches(img, keypoints, img2, kp2, good_matches, None, flags=2)
cv2.imshow('Matches', img_matches)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Introduction to Camera Calibration
Calibration finds intrinsic and extrinsic camera parameters to correct lens distortion and relate 3D world points to 2D image points.

Using a Chessboard Pattern
Images of a known pattern (like a chessboard) are used to detect corners and estimate camera parameters.

import cv2
import numpy as np
import glob

# Prepare object points for a 9x6 chessboard pattern
objp = np.zeros((6*9,3), np.float32)
objp[:,:2] = np.mgrid[0:9,0:6].T.reshape(-1,2)

objpoints = []  # 3D points in real world space
imgpoints = []  # 2D points in image plane

images = glob.glob('calib_images/*.jpg')

for fname in images:
    img = cv2.imread(fname)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    ret, corners = cv2.findChessboardCorners(gray, (9,6), None)
    if ret:
        objpoints.append(objp)
        imgpoints.append(corners)

        cv2.drawChessboardCorners(img, (9,6), corners, ret)
        cv2.imshow('Chessboard', img)
        cv2.waitKey(100)

cv2.destroyAllWindows()
      

Calibrating the Camera
Use collected points to calculate camera matrix and distortion coefficients.

ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None)

print("Camera matrix:\n", mtx)
print("Distortion coefficients:\n", dist)
      

Undistorting Images
Correct distorted images using calibration results.

img = cv2.imread('test.jpg')
h, w = img.shape[:2]
newcameramtx, roi = cv2.getOptimalNewCameraMatrix(mtx, dist, (w,h), 1, (w,h))

dst = cv2.undistort(img, mtx, dist, None, newcameramtx)

x, y, w, h = roi
dst = dst[y:y+h, x:x+w]

cv2.imshow('Undistorted Image', dst)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

3D Reconstruction Basics
By using stereo images and camera parameters, depth and 3D coordinates of points can be computed.

# Example of stereo calibration and reconstruction is more complex and typically requires stereo image pairs.
# This is a high-level overview; full implementation involves stereoRectify, compute disparity, and reproject to 3D.
      

Introduction to Image Stitching
Image stitching combines overlapping images to create a wide-view panorama, used in photography and mapping.

Feature Detection and Matching
Detect keypoints and match features between images to find correspondences for alignment.

import cv2

img1 = cv2.imread('left.jpg')
img2 = cv2.imread('right.jpg')

# Initialize ORB detector
orb = cv2.ORB_create()

# Find keypoints and descriptors
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Match descriptors using BFMatcher
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = bf.match(des1, des2)

matches = sorted(matches, key=lambda x: x.distance)

img_matches = cv2.drawMatches(img1, kp1, img2, kp2, matches[:20], None, flags=2)
cv2.imshow('Matches', img_matches)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Finding Homography
Estimate the transformation matrix (homography) that warps one image to align with the other.

import numpy as np

src_pts = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1,1,2)
dst_pts = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1,1,2)

H, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)
      

Warping and Stitching Images
Use the homography matrix to warp images and blend them into a panorama.

height, width, _ = img2.shape
result = cv2.warpPerspective(img1, H, (width * 2, height))

result[0:height, 0:width] = img2

cv2.imshow('Panorama', result)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Using OpenCV Stitcher Class
OpenCV provides a Stitcher class that simplifies panorama creation with automatic detection, alignment, and blending.

stitcher = cv2.Stitcher_create()
status, pano = stitcher.stitch([img1, img2])

if status == cv2.STITCHER_OK:
    cv2.imshow('Panorama', pano)
    cv2.waitKey(0)
else:
    print('Error during stitching')

cv2.destroyAllWindows()
      

Introduction to Camera Models
Camera models describe how 3D scenes are projected onto 2D images. The pinhole camera model is the most common, simulating projection through a single point.

Pinhole Camera Model
This model maps 3D world points onto a 2D image plane using intrinsic and extrinsic parameters.

# Formula: s * [u v 1]^T = K * [R | t] * [X Y Z 1]^T
# where:
# s = scale factor
# [u v 1] = pixel coordinates (homogeneous)
# K = intrinsic matrix
# R, t = rotation and translation (extrinsic)
# [X Y Z 1] = 3D world coordinates (homogeneous)
      

Intrinsic Parameters
These include focal length, optical center, and skew, forming matrix K which maps camera coordinates to image pixels.

import numpy as np

# Example values: focal lengths (fx, fy) and principal point (cx, cy), in pixels
fx, fy = 800.0, 800.0
cx, cy = 320.0, 240.0

K = np.array([[fx, 0, cx],
              [0, fy, cy],
              [0,  0,  1]])
      

Extrinsic Parameters
Represent camera orientation and position in the world via rotation matrix R and translation vector t.

tx, ty, tz = 0.1, 0.0, 1.5  # Example translation components

R = np.eye(3)  # Example rotation matrix (identity = no rotation)
t = np.array([[tx], [ty], [tz]])  # Translation vector
      

Projection Equation
Combine intrinsic and extrinsic parameters to project 3D points onto the image plane.

# Example function to project 3D point
def project_point(X, K, R, t):
    X_homog = np.append(X, 1)
    RT = np.hstack((R, t))
    x_cam = RT @ X_homog
    x_img = K @ x_cam
    x_img /= x_img[2]
    return x_img[:2]
      

Distortion Models
Real lenses cause distortions such as radial and tangential; these are modeled and corrected during calibration.
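
For reference, the radial/tangential model OpenCV uses, in the same comment style as the projection formula above (k1-k3 are radial and p1, p2 tangential coefficients):

# For normalized coordinates (x, y) with r^2 = x^2 + y^2:
# x_distorted = x*(1 + k1*r^2 + k2*r^4 + k3*r^6) + 2*p1*x*y + p2*(r^2 + 2*x^2)
# y_distorted = y*(1 + k1*r^2 + k2*r^4 + k3*r^6) + p1*(r^2 + 2*y^2) + 2*p2*x*y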

Purpose of Camera Calibration
Calibration estimates camera parameters to remove lens distortion and map 3D points accurately onto images.

Checkerboard Calibration Method
Uses images of a checkerboard pattern to find corner points and compute camera matrix and distortion coefficients.

import cv2
import numpy as np
import glob

# Prepare object points (0,0,0), (1,0,0), ..., (8,5,0)
objp = np.zeros((6*9,3), np.float32)
objp[:,:2] = np.mgrid[0:9,0:6].T.reshape(-1,2)

objpoints = []
imgpoints = []

images = glob.glob('calib_images/*.jpg')

for fname in images:
    img = cv2.imread(fname)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    ret, corners = cv2.findChessboardCorners(gray, (9,6), None)
    if ret:
        objpoints.append(objp)
        imgpoints.append(corners)

        cv2.drawChessboardCorners(img, (9,6), corners, ret)
        cv2.imshow('Corners', img)
        cv2.waitKey(100)

cv2.destroyAllWindows()
      

Estimating Camera Parameters
Compute intrinsic matrix, distortion coefficients, rotation and translation vectors using collected points.

ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None)

print("Camera Matrix:\n", mtx)
print("Distortion Coefficients:\n", dist)
      

Refining Calibration
Use more images and different views of the checkerboard for better accuracy.

Undistorting Images
Correct distorted images using the obtained parameters.

img = cv2.imread('test.jpg')
h, w = img.shape[:2]

newcameramtx, roi = cv2.getOptimalNewCameraMatrix(mtx, dist, (w,h), 1, (w,h))
dst = cv2.undistort(img, mtx, dist, None, newcameramtx)

x, y, w, h = roi
dst = dst[y:y+h, x:x+w]

cv2.imshow('Undistorted', dst)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Introduction to Stereo Vision
Stereo vision uses two cameras placed at a distance to perceive depth by comparing the difference (disparity) between the two images.

Capturing Stereo Images
Obtain synchronized images from left and right cameras of the same scene for disparity calculation.

Rectification of Stereo Images
Align images so that corresponding points lie on the same horizontal line, simplifying disparity search.

# Assuming calibration done and matrices obtained
# Use cv2.stereoRectify(), cv2.initUndistortRectifyMap() to rectify images
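
# A minimal sketch of those calls, assuming stereo calibration produced
# camera matrices mtx1/mtx2, distortion coefficients dist1/dist2, the
# inter-camera rotation R and translation T, and images of size (w, h):
# R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(mtx1, dist1, mtx2, dist2, (w, h), R, T)
# map1x, map1y = cv2.initUndistortRectifyMap(mtx1, dist1, R1, P1, (w, h), cv2.CV_32FC1)
# map2x, map2y = cv2.initUndistortRectifyMap(mtx2, dist2, R2, P2, (w, h), cv2.CV_32FC1)
# rect_left = cv2.remap(left_img, map1x, map1y, cv2.INTER_LINEAR)
# rect_right = cv2.remap(right_img, map2x, map2y, cv2.INTER_LINEAR)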
      

Computing Disparity Map
Use block matching algorithms like StereoBM or StereoSGBM to compute disparity (difference in pixel locations).

import cv2
import numpy as np

left_img = cv2.imread('left.jpg', 0)
right_img = cv2.imread('right.jpg', 0)

stereo = cv2.StereoBM_create(numDisparities=16*5, blockSize=15)
disparity = stereo.compute(left_img, right_img)

# compute() returns 16-bit fixed-point values; normalize before display
disp_vis = cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

cv2.imshow('Disparity Map', disp_vis)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Depth Map Calculation
Convert disparity to depth using camera parameters and baseline distance.

# depth = (focal_length * baseline) / disparity
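
# A numeric sketch of this formula, assuming the `disparity` map from the
# previous example and hypothetical camera parameters:
import numpy as np

focal_length = 700.0  # Hypothetical focal length in pixels
baseline = 0.06       # Hypothetical camera separation in metres

# StereoBM returns disparities as 16-bit fixed point, scaled by 16
disp = disparity.astype(np.float32) / 16.0
disp[disp <= 0] = 0.1  # Avoid division by zero in invalid regions

depth = (focal_length * baseline) / disp  # Depth in metres per pixel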
      

Applications
Stereo vision is used in autonomous vehicles, robotics, and 3D reconstruction.

Introduction to OCR
OCR converts images of text into machine-encoded text. OpenCV can preprocess images to improve OCR accuracy using tools like Tesseract.

Preprocessing for OCR
Convert images to grayscale, threshold to binary, and remove noise for better text recognition.

import cv2

img = cv2.imread('text_image.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, thresh = cv2.threshold(gray, 150, 255, cv2.THRESH_BINARY_INV)

cv2.imshow('Preprocessed Image', thresh)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Using Tesseract with OpenCV
Integrate Tesseract OCR engine with OpenCV to extract text.

import pytesseract

# Ensure pytesseract is installed and Tesseract OCR engine is set up

text = pytesseract.image_to_string(thresh)
print("Recognized Text:")
print(text)
      

Improving OCR Accuracy
Apply techniques like dilation, erosion, and contour detection to isolate text regions.

kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3,3))
dilated = cv2.dilate(thresh, kernel, iterations=1)

cv2.imshow('Dilated Image', dilated)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Applications of OCR
OCR is widely used for digitizing printed documents, license plate recognition, and automated data entry.

Introduction to Real-time Object Detection
Real-time detection identifies objects instantly in video streams, essential for surveillance, autonomous vehicles, and robotics.

Using Pretrained YOLO Models
YOLO (You Only Look Once) models provide fast and accurate object detection.

import cv2

# Load YOLO model
net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
# getUnconnectedOutLayersNames() works across OpenCV versions (the index-based
# variant breaks in OpenCV >= 4.5.4, where getUnconnectedOutLayers returns a flat array)
output_layers = net.getUnconnectedOutLayersNames()
      

Processing Webcam Video
Capture video frames and process each frame for detection.

cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    blob = cv2.dnn.blobFromImage(frame, 1/255.0, (416,416), swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(output_layers)

    # Process outputs to extract boxes, confidences, and class IDs
    # Draw bounding boxes on frame (code omitted for brevity)

    cv2.imshow("Real-time Object Detection", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
      

Post-processing and Non-Max Suppression
Filter overlapping boxes to improve detection results.

# Use cv2.dnn.NMSBoxes to perform Non-Maximum Suppression
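
# A minimal sketch, assuming `boxes` ([x, y, w, h] lists), `confidences`,
# and the current `frame` were collected while parsing the YOLO outputs above:
import numpy as np

indices = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)  # score and IoU thresholds

for i in np.array(indices).flatten():  # flatten() copes with shape differences across versions
    x, y, w, h = boxes[i]
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)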
      

Applications
Used in traffic monitoring, security systems, and robotics for live environment understanding.

Introduction to Advanced Techniques
Advanced OpenCV techniques include real-time video analytics, GPU acceleration, and integration with deep learning frameworks.

Using GPU Acceleration
OpenCV’s CUDA modules enable fast processing by leveraging GPU hardware.

import cv2
# Check if CUDA is available
print(cv2.cuda.getCudaEnabledDeviceCount())

# Upload image to GPU memory
img = cv2.imread('image.jpg')
gpu_img = cv2.cuda_GpuMat()
gpu_img.upload(img)

# Perform Gaussian blur on GPU
gpu_blur = cv2.cuda.createGaussianFilter(gpu_img.type(), -1, (15, 15), 0)
blurred = gpu_blur.apply(gpu_img)

# Download result back to CPU memory
result = blurred.download()
cv2.imshow('GPU Blur', result)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Real-time Video Analytics
Combine background subtraction, object detection, and tracking for analytics in video streams.

# Example: Background subtraction with MOG2
cap = cv2.VideoCapture('video.mp4')
fgbg = cv2.createBackgroundSubtractorMOG2()

while True:
    ret, frame = cap.read()
    if not ret:
        break

    fgmask = fgbg.apply(frame)
    cv2.imshow('Foreground Mask', fgmask)

    if cv2.waitKey(30) & 0xFF == 27:
        break

cap.release()
cv2.destroyAllWindows()
      

Integration with Deep Learning Frameworks
OpenCV supports importing models from TensorFlow, PyTorch, and others for complex tasks.

# Load TensorFlow model
net = cv2.dnn.readNetFromTensorflow('frozen_inference_graph.pb', 'graph.pbtxt')

# Use net as usual for forward pass
      

Custom OpenCV Functions and Optimization
Write custom filters and profile your pipelines to balance speed and accuracy; OpenCV's optimization switches and timing utilities help here, as sketched below.
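
A small sketch using OpenCV's own utilities (the 15×15 blur is just a stand-in workload):

import cv2

# Check and enable OpenCV's internal optimizations (SSE/AVX, IPP, etc.)
print("Optimizations enabled:", cv2.useOptimized())
cv2.setUseOptimized(True)

img = cv2.imread('image.jpg')

# Time a pipeline stage with TickMeter
tm = cv2.TickMeter()
tm.start()
blurred = cv2.GaussianBlur(img, (15, 15), 0)
tm.stop()
print("Elapsed:", tm.getTimeMilli(), "ms")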

Introduction to Machine Learning in OpenCV
OpenCV includes machine learning algorithms such as SVM, k-NN, and Decision Trees, useful for classification and regression.

Training a k-Nearest Neighbors Classifier
k-NN is a simple algorithm that classifies data points based on nearest neighbors.

import cv2
import numpy as np

# Prepare training data: features and labels
trainData = np.random.randint(0, 100, (25, 2)).astype(np.float32)
responses = np.random.randint(0, 2, (25, 1)).astype(np.float32)

# Create and train kNN
knn = cv2.ml.KNearest_create()
knn.train(trainData, cv2.ml.ROW_SAMPLE, responses)

# Predict for new sample
newcomer = np.array([[50, 50]], dtype=np.float32)
ret, results, neighbours, dist = knn.findNearest(newcomer, k=3)

print("Predicted class:", results[0][0])
      

Support Vector Machines (SVM)
SVM finds the best separating hyperplane between classes.

svm = cv2.ml.SVM_create()
svm.setType(cv2.ml.SVM_C_SVC)
svm.setKernel(cv2.ml.SVM_LINEAR)

# C-SVC classification expects integer class labels
svm.train(trainData, cv2.ml.ROW_SAMPLE, responses.astype(np.int32))
pred = svm.predict(newcomer)
print("SVM Prediction:", pred[1][0][0])
      

Decision Trees
Decision Trees split data based on feature values to classify.

dtree = cv2.ml.DTrees_create()
dtree.setMaxDepth(10)  # An explicit depth is needed in some OpenCV builds
dtree.setCVFolds(0)    # Cross-validation pruning is not implemented in some builds
dtree.train(trainData, cv2.ml.ROW_SAMPLE, responses.astype(np.int32))
pred_dt = dtree.predict(newcomer)
print("Decision Tree Prediction:", pred_dt[1][0][0])
      

Applications
OpenCV ML is used in handwriting recognition, face recognition, and image classification.

Introduction to Deep Learning for Image Classification
Deep learning uses neural networks to classify images into categories automatically, outperforming traditional methods.

Using Pretrained Models in OpenCV
OpenCV supports loading pretrained models like MobileNet, ResNet for fast classification.

import cv2
import numpy as np

# Load pretrained MobileNet model files
net = cv2.dnn.readNetFromCaffe('mobilenet_deploy.prototxt', 'mobilenet.caffemodel')

img = cv2.imread('dog.jpg')
blob = cv2.dnn.blobFromImage(img, 1/127.5, (224, 224), (127.5, 127.5, 127.5), swapRB=True)

net.setInput(blob)
preds = net.forward()

class_id = np.argmax(preds[0])
confidence = preds[0][class_id]
print(f"Class ID: {class_id}, Confidence: {confidence}")
      

Preprocessing Input Images
Resize, normalize, and convert images to blobs to feed into deep models.
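
For reference, the main `blobFromImage` parameters, assuming `img` is a loaded BGR image as in the example above:

blob = cv2.dnn.blobFromImage(img,
                             scalefactor=1/127.5,        # Scale pixel values
                             size=(224, 224),            # Resize to the network input
                             mean=(127.5, 127.5, 127.5), # Subtracted per channel
                             swapRB=True,                # BGR -> RGB
                             crop=False)                 # Resize without cropping
print(blob.shape)  # NCHW layout, e.g. (1, 3, 224, 224)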

Interpreting Output
The output is usually a vector of class probabilities; take the class with the highest probability as the prediction.

Applications
Used in photo tagging, medical diagnosis, and real-time video classification.

Object Recognition
Detecting known objects in images by matching features or using learned models.

Localization
Determining the precise location (bounding box) of the recognized object.

Using Template Matching
Match a smaller template image inside a larger image.

import cv2
import numpy as np

img = cv2.imread('scene.jpg')
template = cv2.imread('template.jpg', 0)
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

res = cv2.matchTemplate(img_gray, template, cv2.TM_CCOEFF_NORMED)
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(res)

top_left = max_loc
h, w = template.shape
bottom_right = (top_left[0] + w, top_left[1] + h)

cv2.rectangle(img, top_left, bottom_right, (0,255,0), 2)
cv2.imshow('Detected', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Applications
Used in quality control, inventory management, and augmented reality.

Gesture Recognition Overview
Detect hand gestures from images or video for interactive applications.

Detecting Hand Contours
Segment hand using color or background subtraction and find contours.

import cv2

cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    lower_skin = (0, 20, 70)
    upper_skin = (20, 255, 255)
    mask = cv2.inRange(hsv, lower_skin, upper_skin)

    contours, _ = cv2.findContours(mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        cnt = max(contours, key=cv2.contourArea)
        cv2.drawContours(frame, [cnt], -1, (0,255,0), 3)

    cv2.imshow('Hand Detection', frame)
    if cv2.waitKey(1) & 0xFF == 27:
        break

cap.release()
cv2.destroyAllWindows()
      

Applications
Controls for games, sign language interpretation, and touchless interfaces.

Face Recognition vs Detection
Detection locates faces; recognition identifies the person.

Using LBPH Face Recognizer
Local Binary Patterns Histograms (LBPH) recognize faces by encoding local texture patterns into histograms.

import cv2

# The cv2.face module requires the contrib build: pip install opencv-contrib-python
recognizer = cv2.face.LBPHFaceRecognizer_create()
recognizer.read('trainer.yml')

face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')

img = cv2.imread('test.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = face_cascade.detectMultiScale(gray)

for (x,y,w,h) in faces:
    roi_gray = gray[y:y+h, x:x+w]
    id_, conf = recognizer.predict(roi_gray)
    cv2.rectangle(img, (x,y), (x+w,y+h), (255,0,0), 2)
    cv2.putText(img, str(id_), (x,y-10), cv2.FONT_HERSHEY_SIMPLEX, 1, (255,255,255), 2)

cv2.imshow('Face Recognition', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Applications
Security systems, attendance, and personalized user experiences.

AR Basics
AR overlays virtual objects on the real world using camera feed.

Marker-based AR
Detect predefined markers to position virtual objects.

# Using OpenCV ArUco markers
import cv2
import cv2.aruco as aruco

cap = cv2.VideoCapture(0)
# Legacy ArUco API (OpenCV <= 4.6); newer releases use
# aruco.getPredefinedDictionary() and the aruco.ArucoDetector class
dictionary = aruco.Dictionary_get(aruco.DICT_6X6_250)
parameters = aruco.DetectorParameters_create()

while True:
    ret, frame = cap.read()
    if not ret:
        break

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    corners, ids, _ = aruco.detectMarkers(gray, dictionary, parameters=parameters)

    if ids is not None:
        aruco.drawDetectedMarkers(frame, corners, ids)

    cv2.imshow('AR Marker Detection', frame)
    if cv2.waitKey(1) & 0xFF == 27:
        break

cap.release()
cv2.destroyAllWindows()
      

Applications
Games, education, industrial maintenance, and navigation.

Segmentation Overview
Dividing an image into meaningful regions or objects.

Thresholding
Simple method based on pixel intensity.

import cv2

img = cv2.imread('coins.jpg', 0)
_, thresh = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)

cv2.imshow('Thresholded', thresh)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Watershed Algorithm
Advanced segmentation based on markers and flooding concept.

import numpy as np

# Assume binary image thresh is obtained
dist_transform = cv2.distanceTransform(thresh, cv2.DIST_L2, 5)
_, sure_fg = cv2.threshold(dist_transform, 0.7*dist_transform.max(), 255, 0)
sure_fg = np.uint8(sure_fg)
unknown = cv2.subtract(thresh, sure_fg)

# Marker labeling
_, markers = cv2.connectedComponents(sure_fg)
markers = markers + 1
markers[unknown==255] = 0

# Watershed needs a 3-channel image; convert the grayscale input first
img_color = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
markers = cv2.watershed(img_color, markers)
img_color[markers == -1] = [0, 0, 255]  # Mark watershed boundaries in red

cv2.imshow('Watershed', img_color)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Applications
Medical imaging, object detection, and scene understanding.

Image Enhancement Basics
Improve visual quality by adjusting contrast, brightness, and removing noise.

Histogram Equalization
Equalizes image intensity to enhance contrast.

import cv2

img = cv2.imread('low_contrast.jpg', 0)
equ = cv2.equalizeHist(img)

cv2.imshow('Original', img)
cv2.imshow('Equalized', equ)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Denoising
Remove noise using filters like Gaussian or median.

denoised = cv2.medianBlur(img, 5)
cv2.imshow('Denoised', denoised)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Applications
Prepare images for analysis, medical imaging, and photography.

Camera Calibration Recap
Use chessboard patterns to estimate camera parameters for distortion correction.

Forensic Applications
Analyze image authenticity, source, and manipulation using calibration data.

Using EXIF and Metadata
Extract metadata to understand image capture details.

from PIL import Image
from PIL.ExifTags import TAGS

img = Image.open('image.jpg')
exif_data = img.getexif()  # Public API; returns an empty mapping if no EXIF data

for tag_id, value in exif_data.items():
    tag = TAGS.get(tag_id, tag_id)
    print(f"{tag}: {value}")
      

Applications
Law enforcement, digital forensics, and media verification.

Overview of Video Analytics
Extract meaningful information from video streams, such as motion patterns, crowd counts, and anomalies.

Background Subtraction Techniques
Use MOG2 or KNN algorithms to isolate moving objects.

import cv2

cap = cv2.VideoCapture('video.mp4')
fgbg = cv2.createBackgroundSubtractorMOG2()

while True:
    ret, frame = cap.read()
    if not ret:
        break

    fgmask = fgbg.apply(frame)
    cv2.imshow('Foreground Mask', fgmask)

    if cv2.waitKey(30) & 0xFF == 27:
        break

cap.release()
cv2.destroyAllWindows()
      

Object Tracking
Track objects frame to frame using algorithms like CSRT, KCF.

Applications
Security surveillance, traffic monitoring, and sports analysis.

Integrating ML Models in OpenCV
OpenCV supports loading and running machine learning models like SVMs, Decision Trees, and deep learning frameworks, enabling image classification and detection tasks.

Example: Loading an SVM Model
import cv2
import numpy as np

# Load pre-trained SVM model from file
svm = cv2.ml.SVM_load('svm_model.xml')

# Prepare sample input data (2D feature vector)
sample = np.array([[12.5, 3.7]], dtype=np.float32)

# Predict class label
_, result = svm.predict(sample)
print("Predicted class:", result[0][0])
      

Image Stitching Overview
Combine multiple images with overlapping fields of view into a single panoramic image using feature detection and homography.

Basic Stitching Example
import cv2

images = [cv2.imread('img1.jpg'), cv2.imread('img2.jpg')]

stitcher = cv2.Stitcher_create()
status, pano = stitcher.stitch(images)

if status == cv2.STITCHER_OK:
    cv2.imshow('Panorama', pano)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
else:
    print("Stitching failed:", status)
      

3D Reconstruction Introduction
Create 3D models from multiple 2D images by extracting depth information and building point clouds.

Basic Depth Map Example
import cv2
import numpy as np

# Load stereo images
imgL = cv2.imread('left.jpg', 0)
imgR = cv2.imread('right.jpg', 0)

stereo = cv2.StereoBM_create(numDisparities=16, blockSize=15)
disparity = stereo.compute(imgL, imgR)

# Normalize the 16-bit fixed-point disparity for display
disp_vis = cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

cv2.imshow('Disparity', disp_vis)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Image Compression Basics
Compress images to save storage using lossy (JPEG) and lossless (PNG) formats, balancing quality and size.

Save Image with Compression
import cv2

img = cv2.imread('input.png')

# Save as JPEG with quality = 90 (out of 100)
cv2.imwrite('output.jpg', img, [int(cv2.IMWRITE_JPEG_QUALITY), 90])
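
# For the lossless case, PNG takes a compression-effort level (0-9) instead of
# a quality value; higher levels give smaller files but slower writes
cv2.imwrite('output.png', img, [int(cv2.IMWRITE_PNG_COMPRESSION), 9])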
      

Refining Calibration
Improve calibration accuracy by adding more sample images and monitoring the reprojection error.

Check Reprojection Error
def reprojection_error(objpoints, imgpoints, rvecs, tvecs, mtx, dist):
    total_error = 0
    for i in range(len(objpoints)):
        imgpoints2, _ = cv2.projectPoints(objpoints[i], rvecs[i], tvecs[i], mtx, dist)
        error = cv2.norm(imgpoints[i], imgpoints2, cv2.NORM_L2) / len(imgpoints2)
        total_error += error
    return total_error / len(objpoints)

# After calibration:
error = reprojection_error(objpoints, imgpoints, rvecs, tvecs, mtx, dist)
print("Mean Reprojection Error:", error)
      

Feature Matching Overview
Match keypoints between images using descriptors like SIFT, SURF, or ORB for object recognition and stitching.

Basic ORB Matching Example
import cv2

img1 = cv2.imread('img1.jpg', 0)
img2 = cv2.imread('img2.jpg', 0)

orb = cv2.ORB_create()

kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = bf.match(des1, des2)

matches = sorted(matches, key=lambda x: x.distance)

img_matches = cv2.drawMatches(img1, kp1, img2, kp2, matches[:10], None, flags=2)
cv2.imshow('Matches', img_matches)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Real-Time Tracking Overview
Track objects frame-by-frame using algorithms like KCF, CSRT for applications like surveillance and robotics.

Using CSRT Tracker Example
import cv2

cap = cv2.VideoCapture(0)
ret, frame = cap.read()

bbox = cv2.selectROI("Frame", frame, False)
tracker = cv2.TrackerCSRT_create()
tracker.init(frame, bbox)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    success, bbox = tracker.update(frame)
    if success:
        x,y,w,h = [int(v) for v in bbox]
        cv2.rectangle(frame, (x,y), (x+w,y+h), (0,255,0), 2)
    else:
        cv2.putText(frame, "Tracking failure", (50,80), cv2.FONT_HERSHEY_SIMPLEX, 0.75, (0,0,255),2)

    cv2.imshow('Tracking', frame)
    if cv2.waitKey(1) & 0xFF == 27:
        break

cap.release()
cv2.destroyAllWindows()
      

Color Spaces Overview
Images can be represented in different color spaces (RGB, HSV, LAB) for various processing tasks.

Convert BGR to HSV Example
import cv2

img = cv2.imread('image.jpg')
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

cv2.imshow('HSV Image', hsv)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Video Stabilization Basics
Remove unwanted camera shake to produce smoother videos using frame transformations.

Simple Stabilization Example
import cv2
import numpy as np

cap = cv2.VideoCapture('input_video.mp4')
_, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

transforms = []

while True:
    ret, curr = cap.read()
    if not ret:
        break
    curr_gray = cv2.cvtColor(curr, cv2.COLOR_BGR2GRAY)
    # Track corner features from the previous frame into the current one
    # (calcOpticalFlowPyrLK needs explicit points to track)
    prev_pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                       qualityLevel=0.01, minDistance=30)
    curr_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, prev_pts, None)

    # Estimate inter-frame motion from the successfully tracked points
    good = status.flatten() == 1
    m, _ = cv2.estimateAffinePartial2D(prev_pts[good], curr_pts[good])
    transforms.append(m)
    prev_gray = curr_gray

cap.release()
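
# A sketch of a possible next step: accumulate per-frame translations into a
# camera trajectory, smooth it, and warp frames on a second pass
valid = [m for m in transforms if m is not None]
trajectory = np.cumsum([[m[0, 2], m[1, 2]] for m in valid], axis=0)
# Smooth `trajectory` (e.g., moving average), derive per-frame corrections,
# and apply them with cv2.warpAffine() on a second pass over the video.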
      

Combining OpenCV with Other Libraries
Integrate OpenCV with NumPy, TensorFlow, and PyTorch to enhance image analysis and deep learning workflows.

Example: Using OpenCV with NumPy
import cv2
import numpy as np

img = cv2.imread('image.jpg')
# Invert colors using NumPy
inverted = 255 - img

cv2.imshow('Inverted Image', inverted)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Real-Time Video Analytics Overview
Apply AI techniques such as object detection and tracking in real-time video streams for surveillance and automation.

Example: Real-Time Object Detection with YOLO
import cv2

net = cv2.dnn.readNet('yolov3.weights', 'yolov3.cfg')
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    blob = cv2.dnn.blobFromImage(frame, 1/255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    layer_outputs = net.forward(net.getUnconnectedOutLayersNames())

    # Post-processing omitted for brevity...

    cv2.imshow('Real-Time Detection', frame)
    if cv2.waitKey(1) & 0xFF == 27:
        break

cap.release()
cv2.destroyAllWindows()
      

OpenCV & TensorFlow Integration
Use TensorFlow models for image classification and object detection, with OpenCV handling image pre/post-processing.

Example: Loading TensorFlow Model and Running Inference
import cv2

net = cv2.dnn.readNetFromTensorflow('frozen_inference_graph.pb', 'graph.pbtxt')
image = cv2.imread('image.jpg')
blob = cv2.dnn.blobFromImage(image, size=(300, 300), swapRB=True)
net.setInput(blob)
output = net.forward()

# Process detections...

cv2.imshow('Output', image)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Optimizing Deep Learning Models
Techniques such as quantization, pruning, and model conversion speed up inference in OpenCV’s DNN module.

Example: Using OpenVINO for Acceleration
# Sample command to convert and optimize a model with OpenVINO CLI (outside Python)
# mo.py --input_model model.pb --output_dir optimized_model

# In Python, load optimized model as usual with cv2.dnn.readNet()
      

Preparing Custom Datasets
Collect and annotate images with bounding boxes or masks to train custom detection or segmentation models.

Example: Creating Annotations with LabelImg
# Install LabelImg via pip:
# pip install labelImg

# Run labelImg to annotate images and save XML files in Pascal VOC format
      

Transfer Learning Overview
Use pretrained deep learning models and fine-tune them on smaller custom datasets for faster training.

Example: Using MobileNet as Feature Extractor
# Transfer learning often done in frameworks like TensorFlow or PyTorch,
# but OpenCV can load the resulting models for inference.

# Load fine-tuned model in OpenCV for inference:
net = cv2.dnn.readNet('fine_tuned_model.pb')
      

GPU Acceleration Overview
Speed up OpenCV operations using CUDA or OpenCL-enabled devices for real-time performance.

Example: Using CUDA with OpenCV
import cv2

# Check if CUDA is available
print(cv2.cuda.getCudaEnabledDeviceCount())

# Upload image to GPU memory
img = cv2.imread('image.jpg')
gpu_img = cv2.cuda_GpuMat()
gpu_img.upload(img)

# Perform Gaussian blur on GPU
gpu_blurred = cv2.cuda.createGaussianFilter(gpu_img.type(), -1, (15,15), 0).apply(gpu_img)
blurred = gpu_blurred.download()

cv2.imshow('Blurred with CUDA', blurred)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Using OpenCV on Raspberry Pi
Deploy computer vision projects on Raspberry Pi for embedded applications.

Example: Capture Image from Pi Camera
import cv2

cap = cv2.VideoCapture(0)  # Pi camera module
ret, frame = cap.read()

if ret:
    cv2.imwrite('capture.jpg', frame)

cap.release()
      

Security and Privacy Concerns
Protect sensitive visual data, use anonymization and encryption in computer vision systems.

Example: Blurring Faces for Privacy
import cv2

face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
img = cv2.imread('group_photo.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

faces = face_cascade.detectMultiScale(gray, 1.3, 5)
for (x,y,w,h) in faces:
    roi = img[y:y+h, x:x+w]
    roi = cv2.GaussianBlur(roi, (99, 99), 30)
    img[y:y+h, x:x+w] = roi

cv2.imshow('Blurred Faces', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Cloud Services for CV
Leverage cloud platforms to run heavy CV workloads, store data, and serve applications remotely.

Example: Upload Image to AWS S3
import boto3

s3 = boto3.client('s3')
filename = 'image.jpg'
bucket_name = 'mybucket'

s3.upload_file(filename, bucket_name, filename)
      

Emerging Trends
AI advancements, edge computing, self-supervised learning, and multimodal vision systems shape the future.

Sample: Exploring Vision Transformers
# Vision Transformers (ViT) implementation usually in PyTorch or TensorFlow
# Example only conceptual:
# model = ViT(...)
# output = model(input_image_tensor)
      

Augmented Reality Basics
Overlay virtual objects on live camera feed using feature detection, pose estimation, and 3D rendering.

Example: Simple AR Marker Detection
import cv2
import numpy as np

cap = cv2.VideoCapture(0)
# Legacy ArUco API (OpenCV <= 4.6); newer releases use
# cv2.aruco.getPredefinedDictionary() and the cv2.aruco.ArucoDetector class
aruco_dict = cv2.aruco.Dictionary_get(cv2.aruco.DICT_6X6_250)
parameters = cv2.aruco.DetectorParameters_create()

while True:
    ret, frame = cap.read()
    if not ret:
        break

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    corners, ids, rejected = cv2.aruco.detectMarkers(gray, aruco_dict, parameters=parameters)

    if ids is not None:
        cv2.aruco.drawDetectedMarkers(frame, corners, ids)

    cv2.imshow('AR Markers', frame)
    if cv2.waitKey(1) & 0xFF == 27:
        break

cap.release()
cv2.destroyAllWindows()
      

Image Segmentation Overview
Separate image into meaningful regions using thresholding, clustering, or deep learning-based methods.

Example: Simple Thresholding
import cv2

img = cv2.imread('coins.jpg', 0)
ret, thresh = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)

cv2.imshow('Thresholded Image', thresh)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Gesture Recognition Introduction
Recognize hand or body gestures in video streams for interactive applications using contour and keypoint detection.

Example: Detect Hand Contours
import cv2

cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    _, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV+cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

    if contours:
        cv2.drawContours(frame, [max(contours, key=cv2.contourArea)], -1, (0,255,0), 3)

    cv2.imshow('Gesture Recognition', frame)
    if cv2.waitKey(1) & 0xFF == 27:
        break

cap.release()
cv2.destroyAllWindows()
      

OCR with OpenCV and Tesseract
Extract text from images using OpenCV for preprocessing and Tesseract for recognition.

Example: Basic OCR
import cv2
import pytesseract

img = cv2.imread('text_image.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 150, 255, cv2.THRESH_BINARY)[1]

text = pytesseract.image_to_string(thresh)
print("Extracted Text:", text)
      

Advanced Camera Models Overview
Understand fisheye, omnidirectional, and other specialized cameras for unique imaging needs.

Example: Fisheye Undistortion
import cv2
import numpy as np

img = cv2.imread('fisheye.jpg')
DIM = img.shape[:2][::-1]

K = np.array([[300.0, 0.0, DIM[0]/2],
              [0.0, 300.0, DIM[1]/2],
              [0.0, 0.0, 1.0]])
D = np.array([-0.1, 0.01, 0.0, 0.0])

map1, map2 = cv2.fisheye.initUndistortRectifyMap(K, D, np.eye(3), K, DIM, cv2.CV_16SC2)
undistorted = cv2.remap(img, map1, map2, interpolation=cv2.INTER_LINEAR)

cv2.imshow('Undistorted Image', undistorted)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Multi-Camera Systems Overview
Use multiple synchronized cameras to capture scenes from different angles for 3D reconstruction or enhanced perception.

Example: Capture from Two Cameras
import cv2

cap1 = cv2.VideoCapture(0)  # First camera
cap2 = cv2.VideoCapture(1)  # Second camera

while True:
    ret1, frame1 = cap1.read()
    ret2, frame2 = cap2.read()

    if not ret1 or not ret2:
        break

    cv2.imshow('Camera 1', frame1)
    cv2.imshow('Camera 2', frame2)

    if cv2.waitKey(1) & 0xFF == 27:  # ESC to quit
        break

cap1.release()
cap2.release()
cv2.destroyAllWindows()
      

OpenCV in Robotics
Use OpenCV for visual perception in robots such as object detection, navigation, and SLAM.

Example: Line Following Robot Vision
import cv2
import numpy as np

cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    _, thresh = cv2.threshold(gray, 60, 255, cv2.THRESH_BINARY_INV)

    contours, _ = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

    if contours:
        c = max(contours, key=cv2.contourArea)
        cv2.drawContours(frame, [c], -1, (0,255,0), 3)

    cv2.imshow('Line Following', frame)
    if cv2.waitKey(1) & 0xFF == 27:
        break

cap.release()
cv2.destroyAllWindows()
      

Anomaly Detection Overview
Identify unusual patterns or defects in images using statistical or ML-based methods.

Example: Simple Threshold-Based Anomaly
import cv2
import numpy as np

img = cv2.imread('image.jpg', 0)
mean, stddev = cv2.meanStdDev(img)

# meanStdDev returns 1x1 arrays; extract scalar values before thresholding
thresh_val = float(mean[0][0] + 2 * stddev[0][0])

# Threshold anomalies above mean + 2*stddev
_, anomaly = cv2.threshold(img, thresh_val, 255, cv2.THRESH_BINARY)

cv2.imshow('Anomalies', anomaly)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

OpenCV in Drone Applications
Utilize OpenCV for aerial imagery analysis, navigation, and obstacle avoidance in drones.

Example: Video Feed from Drone Camera
import cv2

# Connect to drone camera stream (example URL)
cap = cv2.VideoCapture('http://192.168.1.1:8080/video')

while True:
    ret, frame = cap.read()
    if not ret:
        break

    cv2.imshow('Drone Camera', frame)
    if cv2.waitKey(1) & 0xFF == 27:
        break

cap.release()
cv2.destroyAllWindows()
      

Explainable AI Overview
Make AI vision models interpretable by visualizing attention maps, feature importance, or saliency.

Example: Grad-CAM Visualization Concept
# Grad-CAM typically implemented with deep learning frameworks (e.g., PyTorch)
# Basic conceptual steps:
# 1. Forward pass input through CNN
# 2. Get gradients of target class output w.r.t. convolutional feature maps
# 3. Compute weighted sum of feature maps
# 4. Overlay heatmap on input image
      

3D Reconstruction Overview
Rebuild 3D models of scenes or objects from multiple 2D images using stereo vision and structure from motion.

Example: Stereo Disparity Computation
import cv2
import numpy as np

imgL = cv2.imread('left.jpg', 0)
imgR = cv2.imread('right.jpg', 0)

stereo = cv2.StereoBM_create(numDisparities=16, blockSize=15)
disparity = stereo.compute(imgL, imgR)

# Normalize the 16-bit fixed-point disparity for display
disp_vis = cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

cv2.imshow('Disparity', disp_vis)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Visual SLAM Introduction
Simultaneous Localization and Mapping (SLAM) uses camera input to build a map and localize the device.

Example: ORB Feature Detection for SLAM
import cv2

cap = cv2.VideoCapture(0)
orb = cv2.ORB_create()

while True:
    ret, frame = cap.read()
    if not ret:
        break

    keypoints = orb.detect(frame, None)
    frame = cv2.drawKeypoints(frame, keypoints, None, color=(0,255,0))

    cv2.imshow('ORB Keypoints', frame)
    if cv2.waitKey(1) & 0xFF == 27:
        break

cap.release()
cv2.destroyAllWindows()
      

Visual Question Answering (VQA)
Combine CV and NLP to answer questions about images using deep learning models.

Example: VQA Concept
# Typically involves a pre-trained CNN for image features
# and an LSTM or Transformer for question encoding,
# combined to generate answers.

# Pseudocode:
# img_features = CNN(image)
# question_embedding = LSTM(question)
# answer = decoder(img_features, question_embedding)
      

Image Captioning Overview
Automatically generate descriptive text for images using CNNs combined with language models.

Example: Captioning Workflow
# Extract features with CNN (e.g., ResNet)
# Feed features into RNN or Transformer for sequence generation
# Output caption describes image content

# This is usually implemented in deep learning frameworks.
      

Few-Shot Learning Concepts
Learn to recognize new classes from very few examples using meta-learning and similarity-based models.

Example: Siamese Network Concept
# Two identical CNNs process image pairs
# Output measures similarity
# Used for one-shot/few-shot classification

# Typically implemented with PyTorch or TensorFlow.
      

GANs for Image Generation
Use adversarial networks to generate realistic images by training a generator and discriminator.

Example: GAN Training Loop Concept
# Generator creates fake images
# Discriminator classifies real vs fake
# Both train in competition to improve quality

# Implementation typically in deep learning frameworks.
      

Video Summarization Introduction
Extract key frames or segments to create short summaries of videos using CV and ML techniques.

Example: Frame Difference Method
import cv2

cap = cv2.VideoCapture('video.mp4')
ret, prev_frame = cap.read()
prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)

while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(prev_gray, gray)
    _, thresh = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)
    cv2.imshow('Frame Difference', thresh)

    prev_gray = gray
    if cv2.waitKey(30) & 0xFF == 27:
        break

cap.release()
cv2.destroyAllWindows()
      

Visual Servoing Overview
Control robot motion using visual feedback to achieve precise positioning.

Example: Visual Servoing Concept
# Acquire image
# Detect target features
# Calculate error between current and desired feature position
# Command robot actuators to minimize error

# Implementation varies widely by robot and system.
      

Camera Calibration Overview
Correct lens distortion and estimate intrinsic/extrinsic parameters using calibration patterns.

Example: Chessboard Calibration
import cv2
import numpy as np

criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
objp = np.zeros((6*9,3), np.float32)
objp[:,:2] = np.mgrid[0:9,0:6].T.reshape(-1,2)

objpoints = []
imgpoints = []

images = [...]  # List of calibration images

for fname in images:
    img = cv2.imread(fname)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    ret, corners = cv2.findChessboardCorners(gray, (9,6), None)
    if ret:
        objpoints.append(objp)
        corners2 = cv2.cornerSubPix(gray,corners,(11,11),(-1,-1),criteria)
        imgpoints.append(corners2)

ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None)
      

Image Quality Metrics
Measure sharpness, noise, and contrast to assess and improve image quality.

Example: Calculate Sharpness Using Laplacian
import cv2
import numpy as np

img = cv2.imread('image.jpg', 0)
laplacian_var = cv2.Laplacian(img, cv2.CV_64F).var()
print("Sharpness (variance of Laplacian):", laplacian_var)
      

Image Stitching Overview
Combine multiple overlapping images to create a panoramic image by detecting keypoints and estimating homography.

Example: Simple Stitching Using OpenCV
import cv2

img1 = cv2.imread('left.jpg')
img2 = cv2.imread('right.jpg')

stitcher = cv2.Stitcher_create()
status, pano = stitcher.stitch([img1, img2])

if status == cv2.STITCHER_OK:
    cv2.imshow('Panorama', pano)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
else:
    print('Error during stitching')
      

Camera Pose Estimation
Estimate position and orientation of the camera relative to the scene using feature correspondences.

Example: SolvePnP for Pose Estimation
import cv2
import numpy as np

# 3D points in world coordinates
obj_points = np.array([[0,0,0],[1,0,0],[1,1,0],[0,1,0]], dtype=np.float32)

# Corresponding 2D points in image
img_points = np.array([[320,240],[400,240],[400,320],[320,320]], dtype=np.float32)

camera_matrix = np.array([[800,0,320],[0,800,240],[0,0,1]], dtype=np.float32)
dist_coeffs = np.zeros(5)

ret, rvec, tvec = cv2.solvePnP(obj_points, img_points, camera_matrix, dist_coeffs)
print("Rotation Vector:\n", rvec)
print("Translation Vector:\n", tvec)
      

Image Compression Overview
Reduce image file size using lossy and lossless compression while maintaining quality.

Example: Save JPEG with Compression Quality
import cv2

img = cv2.imread('input.png')
cv2.imwrite('output.jpg', img, [int(cv2.IMWRITE_JPEG_QUALITY), 90])
      

Image Filtering Overview
Enhance or extract features using filters such as Gaussian blur, median filter, and bilateral filter.

Example: Applying Gaussian Blur
import cv2

img = cv2.imread('image.jpg')
blurred = cv2.GaussianBlur(img, (5,5), 0)

cv2.imshow('Blurred Image', blurred)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Image Restoration Overview
Recover degraded images using techniques like deblurring, denoising, and inpainting.

Example: Image Inpainting
import cv2
import numpy as np

img = cv2.imread('damaged.jpg')
mask = cv2.imread('mask.png', 0)

restored = cv2.inpaint(img, mask, 3, cv2.INPAINT_TELEA)

cv2.imshow('Restored Image', restored)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Image Super-Resolution
Enhance low-resolution images to higher resolution using deep learning or interpolation methods.

Example: Upscale Image Using INTER_CUBIC
import cv2

img = cv2.imread('low_res.jpg')
upscaled = cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)

cv2.imshow('Upscaled Image', upscaled)
cv2.waitKey(0)
cv2.destroyAllWindows()
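
# With opencv-contrib-python, a DNN-based upscaler is also available; a sketch
# assuming a pretrained EDSR model file (hypothetical path) was downloaded:
# sr = cv2.dnn_superres.DnnSuperResImpl_create()
# sr.readModel('EDSR_x2.pb')
# sr.setModel('edsr', 2)  # Algorithm name and scale factor
# upscaled_dnn = sr.upsample(img)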
      

Image Registration Overview
Align two or more images of the same scene taken at different times or viewpoints.

Example: Feature-Based Registration
import cv2

img1 = cv2.imread('img1.jpg', 0)
img2 = cv2.imread('img2.jpg', 0)

orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = bf.match(des1, des2)
matches = sorted(matches, key=lambda x:x.distance)

img_matches = cv2.drawMatches(img1, kp1, img2, kp2, matches[:10], None, flags=2)

cv2.imshow('Matches', img_matches)
cv2.waitKey(0)
cv2.destroyAllWindows()
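
# To complete the registration (a sketch, needing at least 4 good matches):
# estimate a homography from the matched points and warp img2 into img1's frame
import numpy as np

src_pts = np.float32([kp2[m.trainIdx].pt for m in matches[:50]]).reshape(-1, 1, 2)
dst_pts = np.float32([kp1[m.queryIdx].pt for m in matches[:50]]).reshape(-1, 1, 2)

H, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)
aligned = cv2.warpPerspective(img2, H, (img1.shape[1], img1.shape[0]))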
      

Scene Text Detection Overview
Detect and localize text in natural images for OCR or understanding.

Example: Using EAST Text Detector
# Requires OpenCV DNN module and EAST model files

import cv2
import numpy as np

net = cv2.dnn.readNet('frozen_east_text_detection.pb')
img = cv2.imread('scene.jpg')
(h, w) = img.shape[:2]

blob = cv2.dnn.blobFromImage(img, 1.0, (320,320), (123.68,116.78,103.94), True, False)
net.setInput(blob)
scores, geometry = net.forward(['feature_fusion/Conv_7/Sigmoid','feature_fusion/concat_3'])

# Postprocessing needed to extract boxes (complex, not shown here)
      

Style Transfer Overview
Apply artistic style from one image onto another using neural networks.

Example: Neural Style Transfer Concept
# Typically uses pretrained CNNs (e.g., VGG)
# Optimize output image to minimize content & style loss
# Requires deep learning frameworks

# Conceptual code:
# content_features = CNN(content_image)
# style_features = CNN(style_image)
# output = optimize(content_features, style_features)
      

Visual Data Augmentation
Enhance datasets by applying transformations like rotation, scaling, flipping to improve model robustness.

Example: Image Flipping and Rotation
import cv2

img = cv2.imread('image.jpg')
flip = cv2.flip(img, 1)  # Horizontal flip
rotated = cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE)

cv2.imshow('Original', img)
cv2.imshow('Flipped', flip)
cv2.imshow('Rotated', rotated)
cv2.waitKey(0)
cv2.destroyAllWindows()
      

Advanced Segmentation
Employ deep learning and graph-based methods for precise pixel-level segmentation.

Example: U-Net Architecture Concept
# U-Net uses encoder-decoder CNN architecture
# Input image → downsampling → upsampling → pixel classification
# Requires deep learning frameworks for implementation
      

Video Object Segmentation
Track and segment objects throughout a video sequence, combining spatial and temporal data.

Example: Frame-wise Background Subtraction
import cv2

cap = cv2.VideoCapture('video.mp4')
fgbg = cv2.createBackgroundSubtractorMOG2()

while True:
    ret, frame = cap.read()
    if not ret:
        break

    fgmask = fgbg.apply(frame)
    cv2.imshow('FG Mask', fgmask)

    if cv2.waitKey(30) & 0xFF == 27:
        break

cap.release()
cv2.destroyAllWindows()
      

Visual Question Generation (VQG)
Generate meaningful questions from images for interactive AI and education.

Example: Conceptual Pipeline
# Extract image features with CNN
# Use language models to generate questions
# Combine multimodal features for coherent question generation
      

Attention Mechanisms Overview
Dynamically focus a model’s processing on the most relevant image regions to improve accuracy.

Example: Self-Attention Concept
# Calculate attention scores between image features
# Weight features based on relevance
# Integrate weighted features into prediction pipeline
      

Explainability Tools
Use saliency maps, LIME, SHAP to interpret vision AI decisions.

Example: Saliency Map Generation
# Compute gradient of output w.r.t input image pixels
# Visualize gradients to show influential regions
      

Domain Adaptation
Transfer vision models trained on one domain to work well on another without full retraining.

Example: Feature Alignment Concept
# Align feature distributions of source and target domains
# Use adversarial training or discrepancy minimization
      

Self-Supervised Learning
Learn useful image representations without labeled data using pretext tasks.

Example: Contrastive Learning Concept
# Generate augmented image pairs
# Train model to maximize agreement between pairs
# Useful for feature extraction
      

Neural Architecture Search
Automate design of neural networks for vision tasks using search algorithms.

Example: Search Space and Optimization
# Define search space of possible architectures
# Use reinforcement learning or evolutionary methods
# Evaluate performance on validation set
      

Vision Transformers Overview
Use transformer architectures adapted from NLP for image recognition and classification.

Example: Patch Embedding Concept
# Split image into fixed-size patches
# Flatten and embed patches as tokens
# Use transformer layers for classification
      

Future Trends
Expect growth in self-supervised learning, multimodal AI, real-time edge vision, and AI ethics.

Example: Emerging Applications
# Autonomous vehicles, AR/VR, medical imaging, robotics, surveillance
# Advances in hardware and algorithms will enable new possibilities