Link to slides for the San Francisco Django Meetup on this topic can be found here.

Just recently, I tried to solve the problem of uniquely identifying an individual based on facial recognition. I don't remember how I arrived at the conclusion that I should try this or what my train of thought was, but after a few days of experimenting I was able to interface with pre-packaged OpenCV modules in Python and get pretty good accuracy for verifying an individual using face recognition.

OpenCV is a very nice image processing library, primarily accessible through C++, that I was introduced to back in college through my internship at iRobot. During that time I had quite a lot of help from Nathan Koenig and Ivan Kirigin. OpenCV now has Python bindings that make it incredibly easy to use, and facial recognition is included as a built-in feature. The high-level usage is to create a training set of known faces; machine learning algorithms already in place behind the scenes then predict a match for an input image with reasonable certainty. OpenCV currently supports three main algorithms, each with its own trade-offs: Eigenfaces, Fisherfaces, and Local Binary Patterns Histograms. Quite honestly, I barely understand how each algorithm works, but such is the beauty behind packages and modules. You might describe me as an idiot who can write code fairly fast and therefore achieve results after sufficient iterations of the guess-and-check method.

What I did learn about each algorithm, in very basic terms, is that Fisherfaces is ideal for matching an input face to a training set. If I had to guess, Facebook probably uses some derivation of Fisherfaces for their facial recognition, and as I learned from my colleague Akshay, there's probably some hardcore Fortran action going on with GPU processors to deal with obscene amounts of matrix multiplication. Fisherfaces maximizes the variance between different individuals in the training set (while minimizing the variance among each individual's own images) so that an input image can most closely be matched to its actual identity. Eigenfaces relies on an algorithm in which the training set of faces creates a Principal Component Analysis space, an input face is projected onto that space, and the output is reconstructed back into a face. The similarity between the input and output face is then compared and classified appropriately. To be clear, I don't really get it. Local Binary Patterns Histograms work by comparing the histograms of individual regions of a grayscale image. Each algorithm requires a training set, but of note, the Fisherfaces and Eigenfaces algorithms each require full re-training whenever images are added to the training set. Local Binary Patterns Histograms, however, can be updated on the fly. Without knowing enough about the differences between all of the algorithms, what I can say for certain is that Eigenfaces or Fisherfaces may be preferable depending on the initial training set, but Local Binary Patterns Histograms is drastically faster (though assuredly less accurate for facial recognition in particular).

With all that said, the particular question that's typically addressed with these algorithms is: "Given an input face, which individual is this?" However, the question that I was trying to address is: "Given an input face, is this Scott Lobdell?" (or whatever other individual). I was trying to verify an identity rather than optimize a match. Given that, I could skimp a bit on the training set and take advantage of some assumptions I could already make. The two primary assumptions were:

-Scott Lobdell (or whomever) is in the training set and is the person I'm trying to match against the input image

-Among the multiple images that I'm passing in, at most one of them is Scott Lobdell

Hence, the solution I'm presenting is drastically less sophisticated than that of, say, Facebook. What I've produced here is the result of about two days' worth of work using existing libraries and data that is now freely available through Facebook. As such, we simply need to collect the images necessary to produce a training set and feed input images through the algorithms available with OpenCV.

# Building the Training Set

The above image is the set of pictures that the script picked up just for me in a 100% automated fashion, but I have this same set of data for hundreds of Facebook friends, and the set returned for me was actually fairly small.  For some individuals, the number of photos exceeded several hundred.

You can run the script with the above steps without anything additional. Basically, each photo your friends are tagged in is downloaded and analyzed. The Facebook API returns the x and y coordinates of your tagged friend's face. The script runs face detection on the image and stores a face only if it matches the tagging information from the Facebook API. This ensures that we use only the one face we care about in the image, and it also ensures that the tagged coordinates do in fact correspond to a face we can use. If a face is found, it is normalized for later use by resizing it to 100 by 100 pixels, converting it to grayscale, and then normalizing the histogram of the image so that the entire grayscale spectrum from black to white is used; that is, there are pixel values from 0 all the way to 255. A folder is created for each friend and is then populated with images of that friend's face. For my data at least, this worked quite well. A small percentage of the time, a bad face might be captured, say where someone tagged themselves in a baby picture or tagged a photo of a celebrity, but this was still a minority case.
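The tag check and the normalization step can be sketched in plain NumPy. The face detector is stubbed out here (the real script uses OpenCV's cascade detector), and the histogram step is written as a simple full-range contrast stretch, which matches the description above but may differ from the exact OpenCV call the script uses:

```python
import numpy as np

def tag_matches_face(tag_x, tag_y, face_rect):
    """Keep a detected face only if Facebook's tag coordinates fall inside it."""
    x, y, w, h = face_rect
    return x <= tag_x <= x + w and y <= tag_y <= y + h

def normalize_face(gray_face):
    """Resize to 100x100 (nearest-neighbor) and stretch pixel values to 0..255.

    gray_face is assumed to already be a 2-D grayscale array; in the real
    pipeline the color-to-grayscale conversion happens first.
    """
    h, w = gray_face.shape
    rows = np.arange(100) * h // 100
    cols = np.arange(100) * w // 100
    resized = gray_face[rows][:, cols].astype(np.float64)
    lo, hi = resized.min(), resized.max()
    stretched = (resized - lo) * 255.0 / max(hi - lo, 1)
    return stretched.astype(np.uint8)

# Example: a fake 180x140 face crop whose values only span 40..199
face = np.random.randint(40, 200, (180, 140), dtype=np.uint8)
norm = normalize_face(face)  # 100x100, values now span the full 0..255 range
```

Every stored face goes through the same normalization so the recognizers compare like with like.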

I mimicked the above process for profile photos as well. However, the common case with profile pictures is that a single picture exists of the individual and is usually not tagged. A couple's picture or a family photo is also common. Therefore, for profile pictures I used the same process but ignored tagging. If only one face was returned, it was fairly safe to assume that the face was that of the alleged individual.
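That profile-photo heuristic reduces to a few lines; the detector itself is a stand-in here, with the rectangles assumed to come from a cascade detector:

```python
def face_from_profile_photo(detected_faces):
    """Untagged profile photo: trust the face only when exactly one was detected.

    detected_faces is a list of (x, y, w, h) rectangles from a face detector.
    Couple shots, family photos, and group pictures return several rectangles
    and are discarded rather than risk storing the wrong person's face.
    """
    if len(detected_faces) == 1:
        return detected_faces[0]
    return None
```

For example, `face_from_profile_photo([(10, 10, 50, 50)])` returns the lone rectangle, while an empty list or a multi-face list returns `None`.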

# Matching the Faces

OpenCV does provide a simple methodology to create a training set and then predict from an input face, but my goal was to verify an input face against the specific person we believe it to be. The OpenCV docs for facial recognition can be found here, but I was able to achieve better than out-of-the-box results by adding my known constraints from the scenario I described. I have a GitHub repo that has all of the code I used. Simply run match_faces.py after building your training set to see the results.

I believe there's a fancy statistical term to describe what I did, but I basically leveraged my known constraints by using the Local Binary Patterns Histograms algorithm, since it can build a training set very fast relative to the other two algorithms. Whereas a sufficient training set would probably need to be learned in advance for the other algorithms, LBPH is fast enough that training can be done in real time. Our training data was collected in a 100% automated fashion, so it might not be ideal, and LBPH is generally less accurate than the other two algorithms, but we can take advantage of the fact that we know the exact face we're trying to match against. My solution was to train the LBPH algorithm three separate times (since it's very fast), each time with my known face plus eight other random faces (I just played with the numbers until 8 gave me good results). From there, I only considered an input face a match to the target face if all three trained models matched it to the intended individual. In this way, if the algorithm were completely random (which it was not), there would be only a 1 in 729 chance of an input face matching the target all three times, so a false positive was unlikely. Conversely, if the individual was matched in all three training sets using an actual facial recognition algorithm, it was fairly safe to say that the prediction was correct. Using this methodology it was also impossible for a single wrong face to win consistently, because each training set contained its own unique random individuals that did not appear in the other sets.
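The voting scheme itself is simple. In this sketch the three LBPH models are replaced by a stub so that the logic, and the 1-in-729 worst case, is easy to check; the names and the Monte Carlo check are illustrative, not from the actual repo:

```python
import random

NUM_DISTRACTORS = 8   # random faces trained alongside the target in each model
NUM_MODELS = 3        # independently trained LBPH models

def verify(predictions, target):
    """Accept a face only if every model predicted the target identity."""
    return all(p == target for p in predictions)

# If each model guessed uniformly at random among its 9 identities,
# a false positive would require all three to agree on the target:
p_random_match = (1.0 / (NUM_DISTRACTORS + 1)) ** NUM_MODELS  # = 1/729

# Monte Carlo sanity check of that bound with purely random "models":
random.seed(0)
trials = 200_000
hits = sum(
    verify([random.randrange(NUM_DISTRACTORS + 1) for _ in range(NUM_MODELS)],
           target=0)
    for _ in range(trials)
)
rate = hits / trials  # hovers around 1/729 ~= 0.00137
```

A real recognizer is far better than random, so the unanimous-vote requirement mostly filters out the lookalike cases described below.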

# The Results

With this methodology, I tried 300 different iterations. In each iteration, I took one face that was not in the training set but that correctly corresponded to an individual in the training set, and I added ten other random faces to the mix. So for each iteration, there were eleven individuals who may or may not have been the intended target. As an example using myself, in each iteration I wanted to know which of eleven different random faces was Scott Lobdell, where actual pictures of Scott Lobdell were in the training set. For the vast majority of the iterations, I was able to decisively conclude that I could either verify a face or that I did not know, meaning that the number of false positives was very low (in 300 iterations, I had 8 false positives, so about 3%). Across all iterations, 32% were accurately matched; the remaining 68% were inconclusive, which was acceptable for my purposes. In the false-positive cases, the random person in the training set generally really, really looked like the target individual. So the false positives could be considered about as inaccurate as a human would be to begin with, and it was otherwise interesting to see which of my Facebook friends looked very similar.

I could have played with the numbers and thresholds a bit to achieve slightly better results, or sacrificed some additional false positives in order to gain more matches. For my purposes, the thresholds I set were about ideal.

# Practicality

I had my reasons for procuring this data, but I'm sure there are a vast number of other uses for it. One project that I've really wanted to take on is to make it so that as employees of Hearsay Social walk into the office, a 5 second clip of their respective theme music starts playing across the office's Sonos system, much like a baseball player getting up to bat in a stadium. For this case, a training set specifically tuned for people walking into the office would be ideal, and I'm sure this would get everyone routinely amped up on a daily basis.

• Rahul Bose
• Scott Benedict Lobdell

Dang, thanks. Not sure if I should be angry or honored.

• modbass

for the life of me i cannot figure out facebook tokens 🙁

• Scott Benedict Lobdell

Where are you stuck?

• modbass

Well, I'm new to the Facebook API and coding, but when I use the easy access link there are check boxes. From reading the documentation it looks like I should check User_friends, User_photos, and read_Friendslist, but then when I put the HTTPS line into Chrome to test it I always get "Invalid OAuth access token".

• Scott Benedict Lobdell

• modbass

yes

• modbass

I got it to work and I got your script to run, but with OpenCV "import cv" is no longer a thing… and it makes the folder but returns no photos.

• Scott Benedict Lobdell

import cv should still work if you’ve installed OpenCV properly. That can be a pain though depending on your operating system. I have a linux tutorial here: http://scottlobdell.me/2014/10/install-opencv-heroku-ffmpeg-support/

You can google around for Mac OS X or Windows

• RBedolair

Late to the party here – I've just tried using your method with a couple of tweaks, and I'm only getting three friends listed by get_friend_service_ids(). Looking at the Graph API explorer changelogs, it seems that not all friends are returned. Is there something I'm missing? Thanks!

• Scott Benedict Lobdell

I wrote this when the v1 API was available so it might have changed. I’m not sure offhand, I’ll try and take a look soon

• Steve Peace

I'm late to the party as well. I'm new to Python and, to actually date myself, haven't done much programming since Pascal! I started programming again and decided to get into Python. I'm working on a project similar to your blog, so I thought I would give your scripts a try, but I can't seem to get them to run. I can generate the token OK, but I get the following errors:

```
Traceback (most recent call last):
    get_tagged_photos(service_id, name) # TODO: apply_asyncchronously
  File "create_pics_from_facebook.py", line 207, in get_tagged_photos
    get_face_in_photo(photo_url, service_id, picture_count, name, x, y) # TODO: apply asynchronously
  File "create_pics_from_facebook.py", line 187, in get_face_in_photo
    for face in face_detect_on_photo(photo_in_memory, (x, y)):
  File "create_pics_from_facebook.py", line 157, in face_detect_on_photo
TypeError: OpenCV returned NULL
```

Any pointers or hints? I'm thinking it's due to your scripts being written for v1, and I can't generate a token for anything older than v2 of the Graph API. If I'm a noob to Python, I'm an even bigger noob to Facebook APIs. I'm reading more about the Graph API, but wanted to make sure that I wasn't going down the wrong path.

Thanks

• Scott Benedict Lobdell

Hmm, if I were to take a stab just from that stack trace I’d guess that the file path I’m representing in the code sample doesn’t match your hard drive. Is the value for CASCADE valid on your machine?

• Steve Peace

If it is that simple, then you can yell at me for being a HUGE NOOB!!! I should have checked that first. Now you can definitely tell that I haven't programmed in a long time. Thanks for the quick reply. I'll take a look and let you know.

• Scott Benedict Lobdell

Ha. No Noob calling here 🙂

• Steve Peace

You were right. It helps to specify the path to the cascade file 🙂

Thanks

• blipton

Nice to see the automation! All of the examples I've found for creating a training set require manually saving the image along with a text file that specifies the path and the coordinates of the face. If a set has thousands of images, how is that feasible?

Why isn't it standard practice to create the image list with coordinates using face detection (detectMultiScale) first? The image could then be cropped automatically to keep the 'positive' set of images as small as possible.