Last week I was playing with a photomosaic composer toy project and needed a simple search-by-image engine, i.e. a way to search a database for images similar to a given one. In this tutorial I will show you how to implement this functionality (with some obvious limitations) in an extremely simple way, just by looking at an image's color distribution.
If you are looking for feature similarity (shapes, patterns, etc.) you most likely need edge-detection algorithms (linear filters or other similar methods), which give excellent results but are usually quite complicated. I suppose that's the way most image search engines work. Alternatively, this paper describes the sophisticated color-based approach used by Google's skin-detection engine.
In many cases, however, finding images with a perceptually similar color distribution can be enough.
If you are in this situation, you may get away with a very simple technique that still gives pretty good results with a minimal implementation effort. The technique is long known and widely used, but if you have no experience in image processing this step-by-step guide may be a fun and painless warm-up to the topic.
I’ll show the concept with the help of F# code, but the approach is so straightforward that you should understand it even without prior knowledge of the language.
TL;DR:
This is the high level outline of the process.
Just once, to build a database “index”:
- Create a normalized 8-bit color distribution histogram of each image in the database.
For every query:
- Create a normalized 8-bit color distribution histogram of the query image.
- Search the database for the histogram closest to the query, using a distance function between probability distributions.
If you are still interested in the details of each step, please read on.
Extracting an image’s color signature
Given that we want to compare images, we’ll have to transform them into something that can be easily compared. We could just compute the average color of all pixels in an image, but this is not very useful in practice. Instead, we will use a color histogram, i.e. we will count the number of pixels of each possible color.
A color histogram is created in four steps:
- Load/decode the image into an array of pixels.
- Downsample the pixels to 8-bit “truecolor” in order to reduce the color space to 256 distinct colors.
- Count the number of pixels for each given color.
- Normalize the histogram (to allow the comparison of images with different size).
1. Loading the image
This is almost trivial in most languages/frameworks; in F# we can lean on the System.Windows.Media APIs.
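Here's a minimal sketch (it needs a reference to the PresentationCore assembly; loadPixels is my naming, and converting everything to Bgra32 is just one convenient way to get a flat, 4-bytes-per-pixel array):

```fsharp
open System
open System.Windows.Media
open System.Windows.Media.Imaging

// Decode an image file into a flat byte array of BGRA pixels.
let loadPixels (path: string) =
    let uri = Uri(System.IO.Path.GetFullPath path)
    let decoder = BitmapDecoder.Create(uri,
                                       BitmapCreateOptions.None,
                                       BitmapCacheOption.OnLoad)
    // Force a known pixel format: every pixel becomes exactly
    // 4 bytes in B, G, R, A order.
    let frame = FormatConvertedBitmap(decoder.Frames.[0],
                                      PixelFormats.Bgra32, null, 0.0)
    let stride = frame.PixelWidth * 4
    let pixels = Array.zeroCreate<byte> (stride * frame.PixelHeight)
    frame.CopyPixels(pixels, stride, 0)
    pixels
```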
2. Downsampling 32-bit color to 8-bit
With the help of some basic bitwise operations we reduce each pixel from 32 bits down to 8. We discard the alpha channel and keep 3 bits for red, 3 for green and 2 for blue (out of the original 8 each), dropping the least significant bits of each color component. The result is that each pixel, now a single byte, can represent one of exactly 256 colors. We obviously lose some color detail because we cannot represent all the original gradients, but having a smaller color space keeps the histogram size manageable.
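In code this boils down to a couple of masks and shifts per pixel (a sketch; packPixel and toEightBit are my names, and the byte order matches the BGRA layout produced by the loader above):

```fsharp
// Pack one pixel into a single byte laid out as rrrgggbb:
// only the most significant bits of each channel survive.
let packPixel (b: byte) (g: byte) (r: byte) : byte =
    (r &&& 0xE0uy) ||| ((g &&& 0xE0uy) >>> 3) ||| (b >>> 6)

// Walk the BGRA array four bytes at a time, dropping the alpha channel.
let toEightBit (bgra: byte[]) =
    [| for i in 0 .. 4 .. bgra.Length - 4 ->
         packPixel bgra.[i] bgra.[i + 1] bgra.[i + 2] |]
```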
Note: in general 8-bit images use a palette, i.e. every pixel value is an index into a 256-color palette. That way the palette can be optimized to only include the most frequent colors in the image. In our case the benefit would not be worth the trouble, as we would need a common palette across all the images anyway (plus the above method is faster and simpler).
3. Creating the histogram
Nothing special here: we just count the number of pixels of each given color. The histogram is nothing more than a 256-element array of integers (plus the image file name). You can read it as: this image has 23 "light green" pixels, 10 "dark red" pixels, and so on.
We then normalize the histogram by dividing each value by the total number of pixels, so that each color amount is a float in the 0 .. 1 range; e.g. 0.3 means that 30% of a picture's pixels have that given color.
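Counting and normalizing fit in a few lines (the Histogram record is my own naming for this sketch):

```fsharp
// A histogram: the image file name plus 256 normalized color frequencies.
type Histogram = { File: string; Values: float[] }

let buildHistogram (file: string) (pixels: byte[]) =
    let counts = Array.zeroCreate<int> 256
    for p in pixels do
        counts.[int p] <- counts.[int p] + 1
    // Normalize: each bin becomes the fraction of pixels of that color.
    let total = float pixels.Length
    { File = file
      Values = counts |> Array.map (fun c -> float c / total) }
```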
Comparing color histograms
Now we have a collection of histograms (the database) and a query histogram. In order to find the best matching image, we need a way to measure how similar two histograms are. In other words we need a distance function that quantifies the similarity between two histograms (and thus between two images).
You have probably noticed that a normalized histogram is in fact a discrete probability distribution: every value is between 0 and 1, and the sum of all values is 1. This means we can use statistical "goodness of fit" tests to measure the distance between two histograms. The chi-squared test is one of those; we are going to use a slight variation of it, called the quadratic-form distance. It is pretty effective in our case because it reduces the importance of differences between large peaks.
The test is defined as follows (p and q are the two histograms we are comparing):
[latex]distQF(p, q) = \frac{1}{2} \sum_{i=0}^{n} \frac{(p_i - q_i)^2}{p_i + q_i}[/latex]
The implementation is straightforward.
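A minimal version (the only subtlety is skipping the bins that are empty in both histograms, where the formula would divide by zero):

```fsharp
// Quadratic-form distance between two normalized histograms.
let distQF (p: float[]) (q: float[]) =
    0.5 * Array.fold2 (fun acc pi qi ->
              let s = pi + qi
              if s = 0.0 then acc
              else acc + (pi - qi) * (pi - qi) / s) 0.0 p q
```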
The more different two histograms are, the larger the value returned by this test; it returns 0 for two identical histograms.
A more sophisticated option is the Jensen-Shannon divergence, a smoothed version of the Kullback-Leibler divergence. While more complicated, it has the interesting property that its square root is a metric, i.e. it defines a metric space (in layman's terms, a space where the distance between two points can be measured, and where the path A → B → C can never be shorter than the direct distance A → B). This property will be useful in the next post, when we optimize our search algorithm.
The Kullback-Leibler and Jensen-Shannon divergences are defined as:
[latex]distKL(p, q) = \sum_{i=0}^{n} p_i \ln \frac{p_i}{q_i}[/latex]
[latex]distJS(p, q) = \frac{1}{2} distKL\left(p, \frac{p + q}{2}\right) + \frac{1}{2} distKL\left(q, \frac{p + q}{2}\right)[/latex]
In F# this translates almost literally from the formulas.
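Here's a sketch (the zero check implements the usual convention that 0 · ln(0/q) = 0, which also keeps distJS finite):

```fsharp
// Kullback-Leibler divergence between two distributions.
let distKL (p: float[]) (q: float[]) =
    Array.fold2 (fun acc pi qi ->
        if pi = 0.0 then acc else acc + pi * log (pi / qi)) 0.0 p q

// Jensen-Shannon divergence: the average KL divergence of p and q
// from their midpoint distribution m = (p + q) / 2.
let distJS (p: float[]) (q: float[]) =
    let m = Array.map2 (fun pi qi -> (pi + qi) / 2.0) p q
    0.5 * distKL p m + 0.5 * distKL q m
```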
This paper includes an interesting comparison of various distance functions.
At this point our problem is almost solved. All we have to do is iterate through all the samples, measuring the distance between the query and each one, and select the histogram with the smallest distance.
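Something along these lines (findBestMatch is my naming; database is the list of precomputed Histogram records):

```fsharp
// Rank the whole database by distance to the query and keep the best match.
let findBestMatch (database: Histogram list) (query: Histogram) =
    database
    |> List.sortBy (fun h -> distJS query.Values h.Values)
    |> List.head
```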
Notice that I use head because I'm only interested in the best matching item. I could instead truncate the sorted list at any given length to obtain the n best matches in order of relevance.
Optimizing the search
Maybe you've noticed one detail: for every query we need to walk the full database, computing the distance function between our query histogram and each image. Not very smart. If the database contains billions of images the search is not going to be fast, and if we perform a large number of queries in a short time we are going to be in trouble.
If you expected a quick and easy answer to this issue I’m going to disappoint you. However, the good news is that this problem is very interesting, much more so than it may look at first sight. This will be the topic of the next post, where I’ll write about VP-Trees, BK-Trees, and Locality-Sensitive Hashing.
Grab the source
The complete F# source of this tutorial is available on GitHub (a whopping 132 lines of code).
Thanks to my brother Lorenzo for the review and feedback.