A superior edge detection gradient operator

John P. Costella

Ashwood, Melbourne, Australia (2010–11); Washington, DC, United States (2021–22)

(posted February 26, 2011; updated April 6, 2021)

Abstract

Many edge detection algorithms use the Sobel, Scharr, Prewitt, or Roberts-cross operator for estimating the components of the gradient of the image. I introduce a new gradient operator that has resolution and fidelity that is superior to these operators.

Introduction and motivation

Estimating the gradient of an signal is a fundamental task in any field of numerical computing. If the signal is sampled on a regular lattice, as it is in pixmaps or voxmaps, then the first-principles definition of differentiation leads to the naive solution of simply taking the first-difference of the signal in the direction desired. For example, if s(i) is a signal on a one-dimensional lattice, where i = 0, 1, 2, ... , and c is the lattice spacing, then [s(i+1) - s(i)] / c is a naive estimate of the derivative (gradient).

The problem arises when there is more than one dimension, as in the practically important case of pixmap images. For example, if we want to combine the two components of the gradient of a two-dimensional image—say, to find the magnitude of the gradient, or its direction—we need to know precisely where those gradient estimates are located. Consider: using the naive definition, the gradient in the x-direction can be defined as Gx = [s(i+1, j) - s(i, j)] / c, and the gradient in the y-direction can be defined as Gy = [s(i, j+1) - s(i, j)] / c. (Let’s assume that the lattice spacing is the same, c, in each direction.) The problem is that Gx is estimating the x-gradient half-way between columns i and i+1, and Gy is estimating the y-gradient half-way between rows j and j+1. In other words, the naive Gx and Gy are on different spatial lattices, and so cannot be directly combined.

The standard solution to this problem is to use the “central difference” operator, [s(i+1) - s(i-1)] / 2c, in each direction, for which the gradient estimate is located right on the central lattice site i. We can then combine x and y gradients, as they are defined on the same lattice. (It is understood that one row or column is omitted from each edge of the pixmap, as no signal is available for pixels outside the pixmap.)

The problem now, however, is that the response of this operator to sharp edges has been smudged out across three lattice sites. If a similar blurring is not applied in the orthogonal direction, then the x and y components of sharp corners will fail to be correctly co-located.

The standard solution to this problem is to use Sobel blurring kernel, which is a kernel of {1/4, 1/2, 1/4} in the orthogonal direction. (The Sobel operator is the combination of the Sobel blurring kernel and the central difference operator.) More recent variants are the Scharr blurring kernel, {3/16, 5/8, 3/16}, which improves rotational invariance, and the Prewitt blurring kernel of {1/3, 1/3, 1/3}.

A more ingenious method is the Roberts cross, which effectively looks at the pixmap on a 45-degree angle, and uses the naive difference operator. Defining G+ = [s(i+1, j+1) - s(i, j)] / c√2 and G- = [s(i+1, j) - s(i+1, j)] / c√2, we see that G+ measures the gradient in the x = y direction, and G- measures it in the x = -y direction. More importantly, both are defined on lattices that are half-way between the original lattice positions in both, and so they can be combined together.

These historical solutions suffer from the fact that they have lower inherent resolution than the pixmap they are differencing: the Sobel, Scharr, and Prewitt operators have half the resolution in each direction, and the Roberts cross has 1/√2 of the resolution in each direction. Fourier theory tells us that components of the original image that have a higher frequency than these cutoffs will be aliased in the gradient image. The net result is that they have lower resolution than the original image, as well as aliasing artifacts.

The new gradient operator

The gradient operator that I will now put forward returns to the original naive first-difference gradient operator, and solves its half-lattice-site offset problem in a different way.

The key to the solution is to use the gold standard Magic Kernel Sharp algorithm to “double” the image, i.e. upsample it with twice as many pixel tiles in each dimension as the original, with the default behavior that the left edge of the first upsampled pixel tile aligns with that of the first original pixel tile, etc.; i.e. the upsampled pixel centers are not coincident with the original pixel centers. It is straightforward to see that if we use the naive differencing operator in the x-direction, double once (in both directions) using Magic Kernel Sharp, and then discard the top and bottom rows of the result, we obtain an x-gradient pixmap of twice the resolution of the original image, in which the lattice positions are identical to those of the y-gradient pixmap obtained in the same manner.

The results are best appreciated with a practical example.

Practical example

Consider the following image, taken from the Wikipedia pages on edge detection:

Original image

For visual comparison purposes only, we can upsample this original image by a factor of 2 using Magic Kernel Sharp:

Upsampled by a factor of 2

Original image upsampled by a factor of 2

The magnitude of the gradient of the original image estimated by the proposed operator is:

Result of the proposed operator

Proposed operator

The Sobel operator yields (upsampled to the same scale for visual purposes):

Result of the Sobel operator

Sobel operator

The Scharr operator yields:

Result of the Scharr operator

Scharr operator

The Prewitt operator yields:

Result of the Prewitt operator

Prewitt operator

Finally, the Roberts-cross operator yields:

Result of the Roberts-cross operator

Roberts-cross operator

A more direct visual comparison can be made by focusing on the portion of this image with the most detail:

Detailed area of original image

Original

Detailed area of the proposed operator result

Proposed

Detailed area of the Sobel operator result

Sobel

Detailed area of the Scharr operator result

Scharr

Detailed area of the Prewitt operator result

Prewitt

Detailed area of the Roberts-cross operator result

Roberts-cross

Zooming in on the “Uptown” label (800% original size):

Zoom of the Uptown label in the original

Original

Zoom of the Uptown label in the proposed operator result

Proposed

Zoom of the Uptown label in the Sobel operator result

Sobel

Zoom of the Uptown label in the Scharr operator result

Scharr

Zoom of the Uptown label in the Prewitt operator result

Prewitt

Zoom of the Uptown label in the Roberts-cross operator result

Roberts-cross

It can be seen that the Sobel, Scharr, and Prewitt operators suffer significantly, both from the orthogonal blurring kernel that they employ, yielding both poorer resolution, as well as significant aliasing artifacts between closely adjacent edges (particularly the cross-hatching artifact) due to the deficient Fourier properties of the central difference operator. The performance of the Roberts-cross operator is slightly superior to these operators but inferior to the new operator, reflecting its intermediate resolution.

Reference ANSI C implementation of the Magic Edge Detector

The reference implementation of the Magic Edge Detector described above is contained in the following ANSI C code:

edge-detect.tar.gz (232 KB; MD5: ef6a986563b0b000bc79b9365ee488bc)

There is one sample program included:

detect_edges: Detect edges in a JPEG or PNG image, and write the magnitudes of the edges out as an image file.

There are also 80 executables of unit tests and death tests provided.

Instructions for building the code are contained in the _README file within the archive.

Note that all code provided here is from my personal codebase, and is supplied under the MIT License.

Other image processing resources

If you like the Magic Edge Detector, then you may also find useful the following image processing resources that I have put into the public domain over the decades:

The Magic Kernel: World-beating resizing algorithm, superior to the popular Lanczos kernels, that helps power Facebook and Instagram (2006–2021).
JPEG-Clear: A top-down progressive image (mipmapping) algorithm, leveraging the power of Magic Kernel Sharp (2006–2021).
3D JPEG-Clear: The same algorithm as JPEG-Clear, but applied to 3D volumetric data (2010–2021).
UnBlock: Algorithm for removing the “blockies” from images and videos without the need for any parameter tuning (2005–2021).
UnBlur: Position-space deblurring algorithm (1999–2021).