CS 180 Project 4 - Image Warping and Mosaicing by Eshani Jha

Overview

In this project, I explored the process of image mosaicing: warping and stitching multiple photographs into seamless mosaics. Using manually selected correspondence points, I aligned images through homography transformations. This technique not only allows for the creation of captivating panoramas, but also demonstrates how projective warping can rectify images to fit alternate perspectives.

Shoot the Pictures

For this project, I used my Samsung Galaxy S22+ to capture high-quality photographs under consistent camera settings and lighting conditions. To create variety, I also incorporated video game screenshots by manually panning across different sections of the in-game map with my mouse, capturing individual screenshots to later stitch into a cohesive mosaic. The goal was to ensure that the transformations between images were projective (i.e., perspective transformations), which is typically achieved by fixing the center of projection and rotating the camera during capture. I took care to follow these principles to ensure smooth image alignment and warping.

Recover Homographies

To align the images in the mosaic, I used the same correspondence tool from Project 3 to manually label matching features between each pair of images. This process generated two sets of corresponding points: src_pts for the source image and dst_pts for the target image. Using these points, I computed the homography matrix following the Singular Value Decomposition (SVD) approach, as outlined in the referenced paper.

For each pair of corresponding points \( (x_1, y_1) \) in the source image and \( (x_2, y_2) \) in the target image, two constraint vectors are created to form part of the system of equations:

            Vector 1: \( [-x_1, -y_1, -1, 0, 0, 0, x_1 x_2, y_1 x_2, x_2] \)
            Vector 2: \( [0, 0, 0, -x_1, -y_1, -1, x_1 y_2, y_1 y_2, y_2] \)

These vectors are derived by eliminating the scale factor w' in the target homogeneous coordinates, ensuring consistency across the transformation. The resulting vectors are stacked into a matrix A, which captures the relationship between the source and target points. Next, I passed matrix A through the SVD function np.linalg.svd(A), obtaining the matrices U, S, and VT. The row of VT (or the column of V) corresponding to the smallest singular value gives the solution vector v. This vector contains the elements of the flattened homography matrix.

Finally, I reshaped v into a 3 × 3 matrix H. To ensure the homography matrix is properly scaled, I divided each element of H by the value in its bottom-right corner. This normalized matrix H provides the transformation needed to warp the source image into the perspective of the target image.
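
A minimal NumPy sketch of this procedure (the function name and structure here are illustrative, not my exact project code):

```python
import numpy as np

def compute_homography(src_pts, dst_pts):
    """Estimate the 3x3 homography mapping src_pts onto dst_pts via SVD."""
    A = []
    for (x1, y1), (x2, y2) in zip(src_pts, dst_pts):
        # The two constraint rows derived above, one pair per correspondence.
        A.append([-x1, -y1, -1, 0, 0, 0, x1 * x2, y1 * x2, x2])
        A.append([0, 0, 0, -x1, -y1, -1, x1 * y2, y1 * y2, y2])

    # The solution is the right singular vector with the smallest singular value.
    _, _, VT = np.linalg.svd(np.array(A))
    H = VT[-1].reshape(3, 3)

    # Normalize so the bottom-right entry equals 1.
    return H / H[2, 2]
```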

Points

Warp the Images

To align images into a seamless mosaic, I implemented a custom warping function that leverages a homography matrix to map the source image into the target perspective. The homography matrix transforms the corners of the source image, defining the new bounds and creating a pixel grid for the warped image. Using inverse mapping, each pixel in the target image is then traced back to its corresponding location in the source image. At these locations, I applied bilinear interpolation to compute smooth pixel values by blending the intensities of neighboring pixels. This ensures that the transitions between images are visually smooth and free of harsh boundaries.

The final warped image preserves spatial relationships and aligns accurately with the target image, setting the stage for blending multiple images into a unified mosaic.
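
A condensed sketch of the warp, assuming a color image of shape (H, W, 3); the helper name warp_image and the returned offset are conventions for this writeup rather than my verbatim code:

```python
import numpy as np

def warp_image(img, H):
    """Inverse-warp img by homography H with bilinear interpolation."""
    h, w = img.shape[:2]

    # Forward-map the source corners to find the bounds of the warped image.
    corners = np.array([[0, 0, 1], [w - 1, 0, 1],
                        [w - 1, h - 1, 1], [0, h - 1, 1]], dtype=float).T
    mapped = H @ corners
    mapped /= mapped[2]
    x_min, x_max = int(np.floor(mapped[0].min())), int(np.ceil(mapped[0].max()))
    y_min, y_max = int(np.floor(mapped[1].min())), int(np.ceil(mapped[1].max()))

    # Build the output pixel grid and trace each pixel back into the source.
    xs, ys = np.meshgrid(np.arange(x_min, x_max), np.arange(y_min, y_max))
    grid = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = np.linalg.inv(H) @ grid
    sx = (src[0] / src[2]).reshape(xs.shape)
    sy = (src[1] / src[2]).reshape(xs.shape)

    # Bilinear interpolation: blend the four neighboring source pixels.
    x0 = np.clip(np.floor(sx).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(sy).astype(int), 0, h - 2)
    fx, fy = (sx - x0)[..., None], (sy - y0)[..., None]
    out = (img[y0, x0] * (1 - fx) * (1 - fy) + img[y0, x0 + 1] * fx * (1 - fy)
           + img[y0 + 1, x0] * (1 - fx) * fy + img[y0 + 1, x0 + 1] * fx * fy)

    # Zero out pixels that map outside the source image.
    valid = (sx >= 0) & (sx <= w - 1) & (sy >= 0) & (sy <= h - 1)
    out[~valid] = 0
    return out.astype(img.dtype), (x_min, y_min)
```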

Image Rectification

As part of testing the homography and warping functions, I performed image rectification on objects with known shapes. This process ensures that the code works correctly by transforming tilted or distorted objects back into their proper rectangular form. I captured images with objects such as my laptop screen and LED remote at an angle and used the homography transformation to "flatten" their appearance.

Interestingly, when rectifying the remote control, the transformation not only corrected the perspective but also revealed finer details that were initially hard to see: symbols like the AUTO button and the W symbol (for white light) became clearly visible after rectification. This demonstrates the power of perspective correction in uncovering subtle features within an image.

Results

Laptop Remote

Blending into a Mosaic

First, I computed a homography matrix to warp one image onto the plane of another. Then, I calculated the bounding box and necessary shifts to align the warped image with the target image. Finally, both images were stitched together within the mosaic by placing them at their computed positions in the combined frame.
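
The placement step can be sketched as follows, assuming warp_image from above returns the warped image together with its top-left offset (x_min, y_min) in target coordinates. This simplified stitch helper just overwrites overlapping pixels; the actual mosaics below use blending to soften the seam:

```python
import numpy as np

def stitch(warped, offset, target):
    """Place the warped image (at its offset) and the target (at the origin) on one canvas."""
    x_min, y_min = offset
    # The canvas bounding box must cover both images.
    x_lo, y_lo = min(x_min, 0), min(y_min, 0)
    x_hi = max(x_min + warped.shape[1], target.shape[1])
    y_hi = max(y_min + warped.shape[0], target.shape[0])
    canvas = np.zeros((y_hi - y_lo, x_hi - x_lo, 3), dtype=target.dtype)

    # Shift both images so every canvas coordinate is non-negative.
    canvas[y_min - y_lo:y_min - y_lo + warped.shape[0],
           x_min - x_lo:x_min - x_lo + warped.shape[1]] = warped
    region = canvas[-y_lo:-y_lo + target.shape[0], -x_lo:-x_lo + target.shape[1]]
    mask = target.sum(axis=2) > 0  # overwrite only where the target has content
    region[mask] = target[mask]
    return canvas
```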

I applied this process to create a mosaic of stuffed animals (real-world example) and used blending techniques to eliminate sharp edges. I also experimented with Dota 2 screenshots to explore how parts of a video game map look on a larger scale, as the in-game view is limited. I tested the process on different landscapes, including the “lane,” which contains mostly similar textures, and the “trees,” which have intricate leaf details.

Real-Life Results

Toy Panorama

Game Results (Lane)

Dota Dota

Game Results (Trees)

Tree Tree

Detecting Corner Features

In Part A, I manually selected correspondence points to compute the homography matrix. Later, I automated this process by implementing the Harris Corner Detector, which identifies interest points, or corners, within an image. A corner is a point where two edges meet, so the intensity changes significantly in every direction; Harris corners are reliable for feature detection because they are invariant to translation and rotation and robust to changes in illumination.

The detector calculates the structure tensor \( M \) using image derivatives in the x and y directions. Each window is scored using \( R = \det(M) - k(\text{trace}(M))^2 \), where the eigenvalues of \( M \) determine if the region contains a corner, edge, or flat area. I applied the Harris detector to both the left and right images to extract key corner features automatically.
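
A short sketch of the response computation, using SciPy's Gaussian filter as the window for the structure tensor (the constant k = 0.05 is a typical choice from the literature, not necessarily my exact setting):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_response(gray, sigma=1.0, k=0.05):
    """Compute the Harris corner response R = det(M) - k * trace(M)^2."""
    # Image derivatives in the x and y directions.
    Iy, Ix = np.gradient(gray.astype(float))

    # Entries of the structure tensor M, averaged over a Gaussian window.
    Ixx = gaussian_filter(Ix * Ix, sigma)
    Iyy = gaussian_filter(Iy * Iy, sigma)
    Ixy = gaussian_filter(Ix * Iy, sigma)

    det_M = Ixx * Iyy - Ixy ** 2
    trace_M = Ixx + Iyy
    return det_M - k * trace_M ** 2
```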

Results

Toy Panorama

Adaptive Non-Maximal Suppression (ANMS)

While the Harris Corner Detector identifies many potential feature points, it often produces far too many. To address this, Adaptive Non-Maximal Suppression (ANMS) filters the points so that the retained features are both strong and evenly distributed across the image. ANMS calculates a suppression radius for each point, defined as the smallest distance to a significantly stronger point, where "significantly stronger" is controlled by a robustness parameter \(c = 0.9\).

For each point, the algorithm computes the Euclidean distance and strength difference to every other point; candidates with non-positive strength differences are discarded, and the nearest remaining distance becomes the point's suppression radius. A fixed-size min-heap then retains the points with the largest suppression radii. While ANMS may not return the points with the highest strengths, it ensures that the selected features are spatially well-distributed, providing strong, dominant points across the image. The final set of Harris points is shown below.
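
An O(n²) sketch of this selection; for brevity it sorts the radii directly instead of maintaining a heap, which picks the same points (n_keep = 500 is an illustrative budget):

```python
import numpy as np

def anms(points, strengths, n_keep=500, c=0.9):
    """Keep the n_keep points with the largest suppression radii."""
    pts = np.asarray(points, dtype=float)
    f = np.asarray(strengths, dtype=float)
    radii = np.full(len(pts), np.inf)  # the globally strongest point keeps radius inf
    for i in range(len(pts)):
        stronger = f[i] < c * f  # points significantly stronger than point i
        if stronger.any():
            # Suppression radius: distance to the nearest dominating point.
            radii[i] = np.linalg.norm(pts[stronger] - pts[i], axis=1).min()
    keep = np.argsort(radii)[::-1][:n_keep]
    return pts[keep], radii[keep]
```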

Toy Panorama

Extracting Feature Descriptors

With a reduced set of Harris points, the next step is to match points between images using feature descriptors. Each descriptor is created by extracting a 40x40 window centered on a Harris point. This patch captures a 20-pixel radius around the point in all directions. To generate a smooth, blurred descriptor, the patch is downsampled to 8x8 pixels instead of extracting a smaller patch directly.

The descriptors are then normalized to have a mean of 0 and a standard deviation of 1, making them robust to affine changes in intensity (e.g. bias and gain). These normalized feature descriptors allow consistent and reliable matching between images.
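
A sketch of the extraction step; here each 5x5 block of the 40x40 window is averaged, which blurs and downsamples to 8x8 in one pass (my actual pipeline may differ in how the smoothing is applied):

```python
import numpy as np

def extract_descriptors(gray, points, patch=40, out=8):
    """Cut a 40x40 window around each point and reduce it to a normalized 8x8 descriptor."""
    half, step = patch // 2, patch // out  # 20-pixel radius, 5x5 averaging blocks
    descriptors, kept = [], []
    for x, y in np.asarray(points, dtype=int):
        # Skip points whose window would fall outside the image.
        if x - half < 0 or y - half < 0 or x + half > gray.shape[1] or y + half > gray.shape[0]:
            continue
        window = gray[y - half:y + half, x - half:x + half].astype(float)
        # Averaging each 5x5 block blurs and downsamples the patch to 8x8.
        d = window.reshape(out, step, out, step).mean(axis=(1, 3))
        # Bias/gain normalization: zero mean, unit standard deviation.
        d = (d - d.mean()) / (d.std() + 1e-8)
        descriptors.append(d.ravel())
        kept.append((x, y))
    return np.array(descriptors), np.array(kept)
```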

Matching Feature Descriptors

To match feature descriptors, we compare all descriptor pairs between the two images using the Sum of Squared Differences (SSD), where lower scores indicate better matches. We first select each feature's nearest neighbor, then compute the ratio between the nearest-neighbor score and the second-nearest-neighbor score (Lowe's ratio test). If this ratio is below a chosen threshold (0.4 worked well in my testing), we accept the match. The resulting matches are visualized with randomly colored points to highlight correspondences between the images.
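
The matching loop is short; this sketch computes all pairwise SSDs at once and applies the ratio test per feature:

```python
import numpy as np

def match_descriptors(desc1, desc2, ratio=0.4):
    """Match descriptors by SSD with the nearest/second-nearest ratio test."""
    # Pairwise SSD between every descriptor in image 1 and image 2.
    ssd = ((desc1[:, None, :] - desc2[None, :, :]) ** 2).sum(axis=2)
    matches = []
    for i, row in enumerate(ssd):
        nn1, nn2 = np.argsort(row)[:2]  # indices of the two closest descriptors
        if row[nn1] < ratio * row[nn2]:  # accept only clearly unambiguous matches
            matches.append((i, nn1))
    return matches
```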

Results

Even though some mismatches remain, the majority of the matches are accurate and reliable.

Toy Panorama

RANSAC

To eliminate false positives from the feature matching process, we apply RANSAC. In each iteration, a random set of 4 matches is selected to compute a homography. We then evaluate how well this homography aligns the remaining matches, counting the number of inliers it produces. The homography with the highest number of inliers is selected as the final result. While some true matches may be lost, most correct matches are retained and nearly all incorrect matches are removed, which leads to more reliable stitching.
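
A sketch of the loop, reusing compute_homography from the earlier section (the iteration count and 3-pixel inlier tolerance are illustrative):

```python
import numpy as np

def ransac_homography(src_pts, dst_pts, n_iters=1000, tol=3.0):
    """4-point RANSAC: keep the homography that yields the most inliers."""
    src, dst = np.asarray(src_pts, float), np.asarray(dst_pts, float)
    src_h = np.hstack([src, np.ones((len(src), 1))]).T  # homogeneous, shape (3, N)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        sample = np.random.choice(len(src), 4, replace=False)
        H = compute_homography(src[sample], dst[sample])
        # Reproject all source points and measure the error in the target image.
        proj = H @ src_h
        proj = (proj[:2] / proj[2]).T
        inliers = np.linalg.norm(proj - dst, axis=1) < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on all inliers of the best model for a more stable final estimate.
    return compute_homography(src[best_inliers], dst[best_inliers]), best_inliers
```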

Results

Toy Panorama

Blending into a Mosaic

I used the same techniques as in Part A: compute a homography to warp one image onto the plane of the other, calculate the bounding box and shifts needed to align the warped image with the target, and place both images at their computed positions in the combined frame.

As before, I applied this process to the stuffed-animal mosaic (with blending to eliminate sharp edges) and to the Dota 2 screenshots of the “lane,” which contains mostly similar textures, and the “trees,” which have intricate leaf details.

Real-Life Results

Below are my manually and automatically stitched results side by side; the two are nearly indistinguishable.

Toy Panorama

Game Results (Lane)

Below are my manually and automatically stitched results side by side; the two are nearly indistinguishable.

Dota

Game Results (Trees)

Below are my manually and automatically stitched results side by side. The automatic stitching noticeably outperformed the manual approach, as evidenced by the absence of a shadow artifact along the bottom edge in the automatically stitched image (around 650 on the horizontal axis). This comparison highlights the precision and effectiveness of the automated process.

Tree

What I learned

The most mind-blowing part of this project has been discovering the power of feature matching based solely on pixel intensities. I had always assumed that recognizing similar points across images required a neural network or some form of “smart” AI to make sense of it all. But seeing how pixel intensities alone can accurately pinpoint matching features between images was eye-opening! It felt like unlocking a new layer of understanding in computer vision—realizing that the simplest data can sometimes yield the most precise results.