In my Advanced Topics in Computer Science class at school, we recently implemented k-Means image segmentation. The algorithm partitions a dataset into k non-overlapping subgroups, or clusters. Here, the dataset is the set of pixels of an image, since image segmentation is the process of breaking an image up into distinct sections. We segment based on color (the R, G, B values), so each cluster is a group of pixels with the most similar colors.
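Here is a small Python sketch that makes the "pixels as data points" idea concrete. It assumes the image is already loaded as a grid of (R, G, B) tuples. `to_points` and `from_labels` are hypothetical helper names. Re-coloring each pixel with its cluster's centroid color is one common way to display a color segmentation, assumed here; it is not necessarily how the images in this post were produced.

```python
def to_points(image):
    """Flatten an H x W grid of (R, G, B) tuples into one list of color points.

    Pixel positions are dropped: for color-based segmentation only the color matters.
    """
    return [pixel for row in image for pixel in row]


def from_labels(labels, centroids, width):
    """Rebuild an H x W image in which every pixel takes its cluster's centroid color."""
    flat = [centroids[j] for j in labels]
    return [flat[i:i + width] for i in range(0, len(flat), width)]


# A tiny 2 x 2 "image": two reddish pixels on top, two bluish pixels below.
image = [[(255, 0, 0), (250, 5, 5)],
         [(0, 0, 255), (5, 5, 250)]]
points = to_points(image)  # 4 points in R, G, B space

# Hypothetical clustering result: top row in cluster 0, bottom row in cluster 1.
segmented = from_labels([0, 0, 1, 1], [(252.5, 2.5, 2.5), (2.5, 2.5, 252.5)], width=2)
```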
Here’s a brief overview. See this link for more details.
- Choose the number of clusters, k.
- Initialize the k centroids (or cluster centers) by randomly selecting k points from the shuffled set of pixels.
- At each iteration of the algorithm:
  - Compute the squared distance, in R, G, B space, between each pixel and each centroid.
  - Determine, for each pixel, which centroid is closest.
  - Assign that pixel to the corresponding (closest) cluster.
  - Re-compute each cluster's center (its centroid) by averaging all of the data points in that cluster. With pixels, this means averaging the R, G, and B values of all pixels in the cluster, so the new centroid is (Ravg, Gavg, Bavg).
- Stop after a specified number of iterations, when the error falls below a chosen threshold, or when the centroids stop moving. You can set any end condition; just know that k-Means is an iterative algorithm, and it is up to the programmer to decide when to terminate it.
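The steps above can be sketched in plain Python. This is a minimal sketch, not the code from my GitHub repository. It assumes the pixels are a list of (R, G, B) tuples, and the names `kmeans`, `assign`, `squared_distance`, `max_iters`, `tol`, and `seed` are hypothetical choices for illustration. As its end condition, it stops after `max_iters` iterations or once no centroid moves by more than `tol` (measured as squared distance).

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two (R, G, B) colors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(pixels, centroids):
    """Label each pixel with the index of its closest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in pixels
    ]


def kmeans(pixels, k, max_iters=20, tol=1e-6, seed=None):
    """Cluster (R, G, B) pixels into k color groups; returns (centroids, labels)."""
    if not 1 <= k <= len(pixels):
        raise ValueError("k must be between 1 and the number of pixels")
    rng = random.Random(seed)

    # Initialization: shuffle a copy of the pixels and take the first k as centroids.
    shuffled = list(pixels)
    rng.shuffle(shuffled)
    centroids = [tuple(float(c) for c in p) for p in shuffled[:k]]

    for _ in range(max_iters):
        # Assignment step: each pixel joins the cluster of its closest centroid.
        labels = assign(pixels, centroids)

        # Update step: each centroid becomes the average (R, G, B) of its pixels.
        sums = [[0.0, 0.0, 0.0] for _ in range(k)]
        counts = [0] * k
        for p, j in zip(pixels, labels):
            counts[j] += 1
            for c in range(3):
                sums[j][c] += p[c]
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(s / counts[j] for s in sums[j]) if counts[j] else centroids[j]
            for j in range(k)
        ]

        # End condition: stop once no centroid moves more than tol.
        shift = max(squared_distance(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break

    # Recompute labels so they match the returned centroids.
    return centroids, assign(pixels, centroids)
```

For example, `kmeans(pixels, k=4, seed=0)` clusters a pixel list into four colors reproducibly. Shuffling and taking the first k pixels mirrors the initialization step described above. Passing a different `seed` (or `None`) gives a different random start, which can change the final segmentation.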
You can find the code here, on my GitHub. In the meantime, enjoy the segmentations and the analysis at the end!
Some key insights:
The k-Means algorithm tends to separate different levels of shading (the colors that represent them) rather than distinctly different colors. I originally thought this might be a way of preserving detail, but going back over the steps of the algorithm showed that it is really because shaded colors are far more common in these images than small spots of vibrant, eye-catching color. For instance, the shaded regions of the matryoshka dolls cover a much greater area than the small blue eyes, so a shaded pixel is much more likely to be picked as an initial centroid. If you manually set one starting centroid to a pixel in that small blue region, though, that color would be captured (albeit covering a very small portion of the segmented image).
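Here is a quick back-of-the-envelope check of the initialization argument. If the k starting centroids are drawn without replacement from N pixels, the chance that at least one lands in a region of m pixels is 1 - C(N - m, k) / C(N, k). The sketch below computes this. `prob_region_seeded` is a hypothetical name, and the pixel counts are made-up illustrative numbers, not measurements from the matryoshka image.

```python
from math import comb


def prob_region_seeded(num_pixels, region_pixels, k):
    """Chance that at least one of k starting pixels, drawn without replacement
    from num_pixels pixels, falls inside a region covering region_pixels pixels."""
    # comb(n, k) counts the ways to choose the k starting pixels; the ratio below
    # is the chance that all k of them miss the region.
    all_miss = comb(num_pixels - region_pixels, k) / comb(num_pixels, k)
    return 1.0 - all_miss


# Illustrative (hypothetical) numbers: a 100,000-pixel image with k = 4.
eyes = prob_region_seeded(100_000, 200, 4)        # small blue eyes: about 0.8%
shading = prob_region_seeded(100_000, 40_000, 4)  # large shaded area: about 87%
```

With numbers like these, a small region gets its own starting centroid only about 1 time in 125. Seeding one centroid in it by hand guarantees that it starts with one.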
As expected, the algorithm runs faster on images with fewer pixels: each iteration compares every pixel with every centroid, so the work per iteration grows with the number of pixels. Some image-specific observations: note the good results on the FIALKA image with k = 4 (i.e., four clusters), where the texture and 3D quality of the photo are captured really well. For the image of the person in the Moscow fog, with St. Basil's Cathedral in the background, the sky is split into lighter and darker parts. You can see this gradation in the original image as well, but it is not nearly as distinct as the segmentation suggests (in the original it is much more gradual).
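The banded sky is expected behavior. After the assignment step, every pixel takes on one of only k colors, so a smooth gradient turns into flat bands with hard edges. The minimal sketch below shows this in 1D, using grayscale brightness for simplicity. The gradient values and the two centroid brightnesses are hypothetical stand-ins for a lighter and a darker sky color, and `quantize` is an illustrative name.

```python
def quantize(values, centroids):
    """Replace each value with its closest centroid, as the k-Means assignment step does."""
    return [min(centroids, key=lambda c: (v - c) ** 2) for v in values]


# A smooth, hypothetical sky gradient: brightness rises by 1 per row, from 100 to 200.
sky = list(range(100, 201))
banded = quantize(sky, [125.0, 175.0])

largest_step_before = max(b - a for a, b in zip(sky, sky[1:]))        # 1
largest_step_after = max(b - a for a, b in zip(banded, banded[1:]))  # 50.0
```

Increasing k adds more bands and makes each jump smaller, but the result is still piecewise constant rather than truly gradual.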
Hope you enjoyed reading! As always, let me know if you have any questions or thoughts in the comments. See you soon, Рая!