A superior edge detection gradient operator

John P. Costella

Ashwood, Melbourne, Australia

(February 26, 2011)

Abstract

Many edge detection algorithms use the Sobel, Scharr, Prewitt, or Roberts-cross operator for estimating the components of the gradient of the image. I introduce a new gradient operator that has resolution and fidelity that is superior to these operators.

Introduction and motivation

Estimating the gradient of an signal is a fundamental task in any field of numerical computing. If the signal is sampled on a regular lattice, as it is in pixmaps or voxmaps, then the first-principles definition of differentiation leads to the naive solution of simply taking the first-difference of the signal in the direction desired. For example, if s(i) is a signal on a one-dimensional lattice, where i = 0, 1, 2, ... , and c is the lattice spacing, then [s(i+1) - s(i)] / c is a naive estimate of the derivative (gradient).

The problem arises when there is more than one dimension, as in the practically important case of pixmap images. For example, if we want to combine the two components of the gradient of a two-dimensional image—say, to find the magnitude of the gradient, or its direction—we need to know precisely where those gradient estimates are located. Consider: using the naive definition, the gradient in the x-direction can be defined as Gx = [s(i+1, j) - s(i, j)] / c, and the gradient in the y-direction can be defined as Gy = [s(i, j+1) - s(i, j)] / c. (Let’s assume that the lattice spacing is the same, c, in each direction.) The problem is that Gx is estimating the x-gradient half-way between columns i and i+1, and Gy is estimating the y-gradient half-way between rows j and j+1. In other words, the naive Gx and Gy are on different spatial lattices, and so cannot be directly combined.

The standard solution to this problem is to use the “central difference” operator, [s(i+1) - s(i-1)] / 2c, in each direction, for which the gradient estimate is located right on the central lattice site i. We can then combine x and y gradients, as they are defined on the same lattice. (It is understood that one row or column is omitted from each edge of the pixmap, as no signal is available for pixels outside the pixmap.)

The problem now, however, is that the response of this operator to sharp edges has been smudged out across three lattice sites. If a similar blurring is not applied in the orthogonal direction, then the x and y components of sharp corners will fail to be correctly co-located.

The standard solution to this problem is to use Sobel blurring kernel, which is a kernel of {1/4, 1/2, 1/4} in the orthogonal direction. (The Sobel operator is the combination of the Sobel blurring kernel and the central difference operator.) More recent variants are the Scharr blurring kernel, {3/16, 5/8, 3/16}, which improves rotational invariance, and the Prewitt blurring kernel of {1/3, 1/3, 1/3}.

A more ingenious method is the Roberts cross, which effectively looks at the pixmap on a 45-degree angle, and uses the naive difference operator. Defining G+ = [s(i+1, j+1) - s(i, j)] / c√2 and G- = [s(i+1, j) - s(i+1, j)] / c√2, we see that G+ measures the gradient in the x = y direction, and G- measures it in the x = -y direction. More importantly, both are defined on lattices that are half-way between the original lattice positions in both, and so they can be combined together.

These historical solutions suffer from the fact that they have lower inherent resolution than the pixmap they are differencing: the Sobel, Scharr, and Prewitt operators have half the resolution in each direction, and the Roberts cross has 1/√2 of the resolution in each direction. Fourier theory tells us that components of the original image that have a higher frequency than these cutoffs will be aliased in the gradient image. The net result is that they have lower resolution than the original image, as well as aliasing artefacts.

The new gradient operator

The gradient operator that I will now put forward returns to the original naive first-difference gradient operator, and solves its half-lattice-site offset problem in a different way.

The key to the solution is the magic kernel upsampling operator, which yields results that are superior to conventional kernels such as bicubic or Lanczos, yet is lightning fast. Most importantly, the magic kernel upsampler tiles exactly the same lattice as the original lattice, with twice as many tiles in each direction (i.e. the centers of the upsampled lattice tiles are not coincident with the centers of the original lattice tiles).

Thus, if we use the naive differencing operator in the x-direction, upsample once (in both directions) using the magic kernel, and discard the top and bottom rows of the result, we obtain an x-gradient pixmap of twice the resolution of the original image, in which the lattice positions are identical to those of the y-gradient pixmap obtained in the same manner.

The results are best appreciated with a practical example.

Practical example

Consider the following image, taken from the Wikipedia pages on edge detection:

original image

For visual purposes only, we can upsample this original image by a factor of 2 using the magic kernel:

original image, upsampled

The magnitude of the gradient of the original image estimated by the new operator is:

magnitude of gradient

The Sobel operator yields (upsampled to the same scale for visual purposes):

Sobel magnitude of gradient, upsampled

The Scharr operator yields:

Scharr magnitude of gradient, upsampled

The Prewitt operator yields:

Prewitt magnitude of gradient, upsampled

Finally, the Roberts-cross operator yields:

Roberts-cross magnitude of gradient, upsampled

A more direct visual comparison can be made by focusing on the portion of this image with the most detail:

Original crop   Original
New operator crop   Proposed
Sobel crop   Sobel
Scharr crop   Scharr
Prewitt crop   Prewitt
Roberts-cross crop   Roberts-cross

Zooming in to the “Uptown” label (800% original size):

Original blowup   Original
New operator blowup   Proposed
Sobel blowup   Sobel
Scharr blowup   Scharr
Prewitt blowup   Prewitt
Roberts-cross blowup   Roberts-cross

It can be seen that the Sobel, Scharr, and Prewitt operators suffer significantly, both from the orthogonal blurring kernel that they employ, yielding both poorer resolution, as well as significant aliasing artifacts between closely adjacent edges (particularly the cross-hatching artifact) due to the deficient Fourier properties of the central difference operator. The performance of the Roberts-cross operator is slightly superior to these operators but inferior to the new operator, reflecting its intermediate resolution.

Source code

The full C# source code for the above analysis is contained here (25.1 KiB).