Link to slides for the San Francisco Django Meetup on this topic can be found here.
Just recently, I tried to solve the problem of uniquely identifying an individual with facial recognition. I really don’t remember how I arrived at the conclusion that I should try this or what my train of thought was, but after a few days of experimenting I was able to interface with pre-packaged OpenCV modules in Python and get pretty good accuracy when verifying an individual by face.
OpenCV is a very nice image processing library, primarily accessible through C++, that I was introduced to back in college through my internship at iRobot. During that time I had quite a lot of help from Nathan Koenig and Ivan Kirigin. OpenCV now has Python bindings that make it incredibly easy to use, and facial recognition is included as a built-in feature. The high-level usage is to create a training set of known faces, and machine learning algorithms already in place behind the scenes predict a match for an input image with reasonable certainty. OpenCV currently supports three main algorithms, each with its own trade-offs: Eigenfaces, Fisherfaces, and Local Binary Patterns Histograms. Quite honestly, I barely understand how each algorithm works, but such is the beauty of packages and modules. You might describe me as an idiot who can write code fairly fast and therefore achieve results after sufficient iterations of the guess-and-check method.
What I did learn about each algorithm, in very basic terms, is that Fisherfaces is ideal for matching an input face against a training set of many people. If I had to guess, Facebook probably uses some derivation of Fisherfaces for their facial recognition, and as I learned from my colleague Akshay, there’s probably some hardcore Fortran action going on with some GPU processors to deal with obscene amounts of matrix multiplication. Fisherfaces maximizes the variance between different people’s faces while minimizing the variance among images of the same person, so that an input image most closely resembles its actual match. Eigenfaces relies on Principal Component Analysis: the training set of faces defines a PCA subspace, an input face is projected onto that space, and the projection is reconstructed back into a face. The similarity between the input and reconstructed face is compared and classified appropriately. To be clear, I don’t really get it. Local Binary Patterns Histograms works by comparing the histograms of individual regions of a grayscale image. Each algorithm requires a training set, but of note, Fisherfaces and Eigenfaces require complete re-training whenever images are added to the training set. Local Binary Patterns Histograms, however, can be updated on the fly. Without knowing enough about the differences between all of the algorithms, what I can say for certain is that Eigenfaces or Fisherfaces may be preferable depending on the initial training set, but Local Binary Patterns Histograms is drastically faster to train (and assuredly less accurate for facial recognition in particular).
With all that said, the particular question that’s typically addressed with these algorithms is: “Given an input face, which individual is this?” However, the question I was trying to address is: “Given an input face, is this Scott Lobdell?” (or whatever other individual). I was trying to verify an identity rather than optimize a match. That meant I could skimp a bit on the training set and take advantage of some assumptions I could already make. The two primary assumptions were:
- Scott Lobdell (or whomever) is in the training set and is the person I’m trying to match against the input image
- Among the multiple images I’m passing in, only one or zero of them is Scott Lobdell
Hence, the solution I’m presenting is drastically less sophisticated than that of, say, Facebook. What I’ve produced here is the result of about two days’ worth of work using existing libraries and data freely available through Facebook. As such, we simply need to collect the images necessary to produce a training set and feed input images through the algorithms available with OpenCV.
Building the Training Set
If you google Facebook facial recognition, the results will mostly link to pages that discuss the privacy concerns associated with facial recognition being performed on your Facebook photos and the evils of Facebook. After doing just a bit of dabbling in this space, this is almost laughable to me. If you don’t want facial recognition performed on your photos, don’t post them on the internets. With very little effort, I was able to put together a small script that builds a training set for you from all of your friends’ photos. If you just download the file, install the OpenCV dependencies, and get a Facebook Access Token, you can kick off a script within a few minutes that will build a training set.
The above image is the set of pictures that the script picked up just for me in a 100% automated fashion, but I have this same set of data for hundreds of Facebook friends, and the set returned for me was actually fairly small. For some individuals, the number of photos exceeded several hundred.
You can run the script with the above steps as-is, but basically what’s happening is that each photo your friends are tagged in is downloaded and analyzed. The Facebook API returns the x and y coordinates of your tagged friend’s face. The script does facial detection on the image and stores a face only if that face matches the tagging information from the Facebook API. This ensures that we only use the one face we care about in the image, and it also ensures that the tagged coordinates are in fact a face we can use. If a face is found, it is then normalized for later use by resizing it to 100 by 100 pixels, converting it to grayscale, and normalizing the histogram of the image so that the entire grayscale spectrum from black to white is used. That is, there are pixel values from 0 all the way to 255. A folder is created for each friend and is then populated with images of that friend’s face. For my data at least, this worked quite well. A small percentage of the time, a face was captured where someone used a baby picture and tagged themselves, or a tagged face of a celebrity was used, or something of the sort, but this was still a minority case.
I mimicked the above process for profile photos as well. However, the common case with profile pictures is that a single untagged picture of the individual exists. It’s also common for the profile picture to be a couple photo or a family photo or something of the sort. Therefore, for profile pictures I used the same process but ignored tagging. If only one image was returned, then it was fairly safe to assume that the face was that of the alleged individual.
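That heuristic reduces to a couple of lines (a sketch; `detect_faces` stands in for whatever detector is used, such as the Haar cascade above):

```python
def face_from_profile_photo(image, detect_faces):
    """Profile pictures usually aren't tagged, so skip the tag check
    and keep the face only when exactly one face is detected -- a
    couple or family photo yields several faces and is discarded."""
    faces = detect_faces(image)
    return faces[0] if len(faces) == 1 else None
```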
Matching the Faces
OpenCV does provide a simple methodology to create a training set and then predict from an input face, but my goal was to verify an input face against the specific person we think it is. The OpenCV docs for facial recognition can be found here, but I was able to beat the out-of-the-box results by adding my known constraints from the scenario I described. I have a GitHub Repo that has all of the code I used. Simply run match_faces.py after building your training set to see the results.
I believe there’s a fancy statistical term to describe what I did, but I basically leveraged my known constraints by using the Local Binary Patterns Histograms algorithm, since it can build a training set very fast relative to the other two algorithms. Whereas a sufficient training set would probably need to be learned in advance for the other algorithms, LBPH is fast enough that it can be done in real time. Our training data was collected in a 100% automated fashion, so it might not be ideal, and LBPH is generally less accurate than the other two algorithms, but we can assuredly take advantage of the fact that we know the exact face we’re trying to match against. As such, my solution was to train the LBPH algorithm three separate times (since it’s very fast), each time with my known face plus eight other random faces (I just played with the numbers until 8 gave me good results). From there, I only considered an input face a match to the target face if all three trained recognizers matched it to the intended individual. In this way, if the algorithm were completely random (which it was not), there would be only a 1 in 729 chance of guessing correctly — one in nine, three times in a row — so a false positive was unlikely. If the individual was matched in all three training sets by an actual facial recognition algorithm, it was fairly safe to say that the prediction was correct. And because each training set had its own unique set of random individuals that didn’t overlap the other sets, it was impossible to consistently match the entirely wrong face across all three.
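The three-round scheme can be sketched like so (a sketch under my own naming, not the exact code in the repo; `make_recognizer` would be something like `cv2.face.LBPHFaceRecognizer_create` from opencv-contrib, and real OpenCV recognizers want NumPy arrays for the images and labels):

```python
import random

def verify_identity(target_id, target_faces, other_people, input_face,
                    make_recognizer, rounds=3, n_distractors=8):
    """Train `rounds` independent recognizers, each on the target's
    faces plus `n_distractors` randomly chosen other people, and accept
    a match only if every round predicts the target.  With 9 people per
    round, a purely random guesser passes all three rounds only 1 time
    in 9**3 = 729."""
    for _ in range(rounds):
        images = list(target_faces)
        labels = [target_id] * len(target_faces)
        # each round draws its own random distractors
        for person_id, faces in random.sample(other_people, n_distractors):
            images.extend(faces)
            labels.extend([person_id] * len(faces))
        recognizer = make_recognizer()
        recognizer.train(images, labels)
        predicted_id, _confidence = recognizer.predict(input_face)
        if predicted_id != target_id:
            return False  # one miss is enough to call it inconclusive
    return True
```

Because LBPH trains in well under a second on a few dozen 100x100 faces, re-training three times per query is cheap.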
With this methodology, I ran 300 iterations. In each iteration, I took one face that was not in the training set but did correctly correspond to an individual in the training set, and added ten other random faces to the mix. So for each iteration, there were eleven faces that may or may not be the intended target. As an example using myself, in each iteration I wanted to know which of eleven random faces was Scott Lobdell, where actual pictures of Scott Lobdell were in the training set. For the vast majority of iterations, I could decisively conclude either that a face was verified or that I did not know, meaning the number of false positives was very low (8 in 300 iterations, so about 3%). Of those conclusions, 32% were accurately matched. The remaining 68% were inconclusive, which was acceptable for my purposes. In the false-positive cases, the random person in the training set generally really, really looked like the target individual. So the false positives could be considered about as inaccurate as a human would be to begin with, and it was otherwise interesting to see which of my Facebook friends looked very, very similar.
I could have played with the numbers and thresholds to achieve slightly better results, or sacrificed some additional false positives in order to gain more matches. For my purposes, the thresholds I set were about ideal.
I had my reasons for procuring this data, but I’m sure there are a vast number of other uses for it. One project I’ve really wanted to take on is to make it so that as employees of Hearsay Social walk into the office, a 5-second clip of their respective theme music starts playing across the office’s Sonos system, much like a baseball player getting up to bat in a stadium. For this case, a training set specifically tuned for people walking into the office would be ideal, and I’m sure this would get everyone routinely amped up on a daily basis.