Building a Python Script That Clusters Photos Based on the Faces in Them

Used Google Photos before? You’ve probably seen this in action.

Divakar Rajesh
The Startup

--

So, what are we building? A script that takes a folder of photos and sorts them into clusters, one per person, using the face_recognition module.

Let’s get familiar with the module first

The face_recognition module gives us functions to load a photo, get the encodings of the faces in it, and compare two encodings to tell whether they belong to the same person.

We’ll use two photos of Kylie (one with black hair and another with blonde hair) and one photo of Khloe.
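Here’s a minimal sketch of that comparison, assuming the three photos are saved as kylie_black.jpg, kylie_blonde.jpg and khloe.jpg (the file names are placeholders for whatever you downloaded):

```python
import face_recognition

def first_face_encoding(path):
    # Load the photo and return the 128-dimensional encoding
    # of the first face found in it.
    image = face_recognition.load_image_file(path)
    return face_recognition.face_encodings(image)[0]

kylie_black = first_face_encoding("kylie_black.jpg")
kylie_blonde = first_face_encoding("kylie_blonde.jpg")
khloe = first_face_encoding("khloe.jpg")

# compare_faces returns one boolean per known encoding.
print(face_recognition.compare_faces([kylie_black], kylie_blonde))  # [True]
print(face_recognition.compare_faces([kylie_black], khloe))         # [False]
```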

Running the program, the two Kylie photos are reported as the same person and Khloe as a different one. Even though Kylie has different hair colors, only her face data is used to decide if they are the same person. Nice!

Let’s cluster a bunch of images now, shall we?

A sample set of 25 images of 3 football players is in the GitHub repo mentioned above, so you can follow along!

1. Let’s try to get all the photos that we have in the “dataset” directory
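One simple way to do this (the extension list is an assumption; adjust it to match your dataset):

```python
import os

DATASET_DIR = "dataset"
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")

# Every image file sitting directly inside the "dataset" directory.
photo_paths = [
    os.path.join(DATASET_DIR, name)
    for name in sorted(os.listdir(DATASET_DIR))
    if name.lower().endswith(IMAGE_EXTENSIONS)
]
```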

2. Now we process each image, checking whether the face in it matches any cluster we already have. If it matches, we add the image to that cluster; if it doesn’t, we create a new cluster
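A sketch of that loop. The clusters dictionary is a helper I’m adding here so we remember which photos landed where; the script keeps an encodings dictionary of face data per cluster, which this mirrors:

```python
import face_recognition

encodings = {}  # cluster id -> list of face encodings in that cluster
clusters = {}   # cluster id -> list of photo paths (helper for step 3)
next_cluster_id = 0

for path in photo_paths:
    faces = face_recognition.face_encodings(
        face_recognition.load_image_file(path))
    if not faces:
        continue     # no face detected in this photo, skip it
    face = faces[0]  # we only consider the first face (see caveat below)

    # Look for an existing cluster whose faces match this one.
    matched_id = next(
        (cid for cid, known in encodings.items()
         if any(face_recognition.compare_faces(known, face))),
        None,
    )

    if matched_id is None:  # no match anywhere: start a new cluster
        matched_id = next_cluster_id
        next_cluster_id += 1
        encodings[matched_id] = []
        clusters[matched_id] = []

    encodings[matched_id].append(face)
    clusters[matched_id].append(path)
```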

3. Putting this all together and running the file should group the images into a “results” folder in the current working directory, with a subfolder for each cluster
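The last step just copies the files, for example:

```python
import os
import shutil

# Copy each photo into results/<cluster id>/ so every person
# ends up in their own folder.
for cluster_id, paths in clusters.items():
    folder = os.path.join("results", str(cluster_id))
    os.makedirs(folder, exist_ok=True)
    for path in paths:
        shutil.copy(path, folder)
```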

(The complete code can be found in the GitHub Repo)

Yay! We can now cluster images based on who’s in them. But wait a second, this CLI app won’t scale well if you are trying to build, say, a desktop GUI or Android application around it. Some things you might want to consider include:

  1. Representative image for a cluster: Promote one image to represent each cluster. Then, every time we check whether an image belongs to a cluster, we compare it with just that one representative image and not all the images the cluster has
  2. Parameter tuning: Of course, the library’s out-of-the-box defaults will not suit all use cases. We might want to tune parameters such as tolerance to get better results (see the sketch after this list)
  3. Multiple faces: If you followed along, you might also have noticed that we only take the first face we find in an image; real photos often contain more than one (also covered in the sketch below)
  4. Database and OOMs: Throw in a SQLite database or similar and persist the data on disk, rather than holding every encoding in memory at once like the “encodings” dictionary in our script does
  5. Same or different: One nice thing I also like about Google Photos is that when it realizes some clusters resemble each other closely, it suggests “Same or different person?” and asks for human input.
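For items 2 and 3, here’s a hypothetical helper showing how both ideas might look. The name faces_matching_cluster and the 0.5 value are my own; the library’s default tolerance is 0.6:

```python
import face_recognition

# Lower tolerance = stricter matching: fewer false merges, but the
# same person may split into more clusters. Tune per dataset.
TOLERANCE = 0.5

def faces_matching_cluster(path, cluster_encodings):
    """Return every face encoding in the photo at `path` that matches
    the given cluster, instead of only looking at the first face."""
    image = face_recognition.load_image_file(path)
    return [
        face
        for face in face_recognition.face_encodings(image)
        if any(face_recognition.compare_faces(
            cluster_encodings, face, tolerance=TOLERANCE))
    ]
```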

So, that’s it? Wait, we have something interesting to share!

Sensara’s Video Reasoning platform

We at Sensara have been building something interesting that helps identify what’s in a video. This is especially useful for an OTT platform, where recommendations and user interests can be improved manifold if we can understand what’s in the videos that users watch.

Banners, Trailers, Boilerplates, Tunes, Detail pages: we’ve created them all, just from the given video.

We mine this almost in real-time from linear TV as well.

Of course, that is built with a more sophisticated solution than the above script, duh! 🤭

And that is used to power recommendations and detail pages on popular D2H boxes, including Airtel’s. This helps us build enriched detail pages for people, with the movies they are in, links to relevant OTT apps, and future TV shows they appear on.

You can also see it in action in the “Sensy India TV Guide & Remote” and “Mi Remote controller” Android apps.

Hey 👋, I’m Divakar Rajesh, a Product Engineer at Sensara. You’ve probably interacted with our products if you have tried the Xiaomi Mi Remote. We’re also the default TV Guide/Launcher on some popular smart TVs, including the Mi TV, and we power recommendations and live TV discovery on all Airtel smart set-top boxes.

On a personal note 😛, you can find me on Twitter and other socials as @sdivakarrajesh. Shameless plug 🤦‍♂️. See ya!
