Automated Intro Music in the Office: SoCo Sonos Library, Python, OpenCV, gPhoto2, and Boost

I gave a much more in-depth talk on this topic at the San Francisco Django Meetup. Slides can be found here.

It all started a few months back. As our company was making a more concerted effort to automate the deployment process of our code base into production, at least three of us decided that it was also important to automatically play warhorns across the office’s Sonos System upon every single deploy. Thankfully, Dale “D-Hustle” Hui stumbled upon the SoCo Sonos Library.

Not long after, I started exploring facial recognition in Python. At that point, the only thing that made sense was to set up a Facial Recognition system in the office. All of the code for the project can be found on the internets here.

General Overview

The idea behind all of the above components is that as people walk into the office, their faces are identified and each person’s spoken introduction and 10-second intro song are selected. The MP3s are queued on the office’s Sonos system and played across the office. After the audio has played, the Sonos system returns to its original state (current song, initial volume, etc.).
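The play-then-restore step above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes a SoCo release that ships `soco.snapshot.Snapshot`, the `INTROS` mapping, its URLs, and the `play_intro` helper are all hypothetical, and it plays each clip directly with `play_uri` and a crude fixed wait instead of managing the Sonos queue.

```python
import time

# Hypothetical mapping from a recognized person's label to their two clips
# (spoken introduction first, then the intro song). URLs are placeholders.
INTROS = {
    "dale": [
        "http://example.local/intros/dale_announce.mp3",
        "http://example.local/intros/dale_song.mp3",
    ],
}


def clips_for(label, intros=INTROS):
    """Return the clip URIs for a recognized label, or an empty list."""
    return list(intros.get(label, []))


def _soco_snapshot(speaker):
    # Imported lazily so this sketch can be loaded without SoCo installed.
    from soco.snapshot import Snapshot
    return Snapshot(speaker)


def play_intro(speaker, label, intro_volume=40, clip_seconds=10,
               make_snapshot=_soco_snapshot, sleep=time.sleep):
    """Play a person's clips, then put the speaker back the way it was."""
    clips = clips_for(label)
    if not clips:
        return False  # unknown face: leave the music alone

    snap = make_snapshot(speaker)
    snap.snapshot()  # remember current track, position, volume, etc.
    try:
        speaker.volume = intro_volume
        for uri in clips:
            speaker.play_uri(uri)
            sleep(clip_seconds)  # assumption: every clip is about this long
    finally:
        snap.restore()  # resume whatever was playing before
    return True
```

With a real speaker this would be called as something like `play_intro(soco.SoCo("192.168.1.50"), "dale")`, where the IP address is a placeholder. Wrapping the playback in `try`/`finally` means the office music comes back even if a clip fails to load.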



As of this writing (I intend to fine-tune and update this post), I’m still optimizing the training set and might apply some additional thresholds to minimize false positives. Initially I used an IP camera (code can be found here), but I did not get good results. Its low frame rate limited the number of faces I could capture and compare, and no one deliberately stopped in front of the camera long enough for a good sample to be taken.
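One way such thresholds might look, as a hedged sketch: OpenCV's face recognizers return a `(label, distance)` pair from `predict`, where a lower distance means a closer match. The helper below, `confident_label`, is hypothetical and its cutoff values are made-up starting points; it only accepts a person once several recent frames agree and each of those frames is under the distance cutoff.

```python
from collections import Counter


def confident_label(predictions, max_distance=60.0, min_votes=3):
    """Pick a label only when enough good frames agree on it.

    predictions: iterable of (label, distance) pairs, one per captured frame.
    max_distance and min_votes are illustrative values that would need tuning
    against the real training set.
    """
    # Drop frames whose best match is too far away to trust.
    votes = Counter(
        label for label, distance in predictions if distance <= max_distance
    )
    if not votes:
        return None
    label, count = votes.most_common(1)[0]
    # Require agreement across several frames before announcing anyone.
    return label if count >= min_votes else None


frames = [("rob", 42.0), ("rob", 55.1), ("dale", 48.0), ("rob", 39.5), ("rob", 90.0)]
print(confident_label(frames))  # rob
```

Requiring agreement across frames trades a little latency for fewer wrong intros, which is why a higher frame rate matters: more frames arrive while the person is still in the doorway.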

Instead, I switched to a Canon SLR that I own, set it up on my desk, and fitted it with a telephoto lens. I fixed the camera on the office entrance and manually set the focal point right at the door. This way, faces are sampled in a single plane. See the differences below:

[Image: IP Camera]

[Image: Canon SLR]

Keep in mind that the pictures I’m using here are not the best example, but for obvious privacy reasons I’m only displaying my own face. Moreover, the data for my face is a better-than-average sample, since I was taking the time to run repeated tests.


Now, by using a telephoto lens fixed on one location, the collection of data is much more passive, and by using a camera connected over USB I get a much better frame rate. This means more data and more chances to compare faces.
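For readers who want to try a USB-tethered capture loop, here is a rough sketch that shells out to the `gphoto2` command-line tool. It is an assumption that this resembles the project's approach at all; the project may drive the camera through the library directly or use a faster preview path. The `capture_frames` helper and the file naming are hypothetical, and each call takes a full still, so it will be slower than a live preview.

```python
import subprocess
from pathlib import Path


def capture_command(target):
    """Build the gphoto2 CLI call that takes one shot and saves it to target."""
    return [
        "gphoto2",
        "--capture-image-and-download",
        "--filename", str(target),
        "--force-overwrite",
    ]


def capture_frames(out_dir, count, run=subprocess.run):
    """Capture `count` stills over USB and return the paths that succeeded."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for i in range(count):
        target = out_dir / f"frame_{i:04d}.jpg"
        result = run(capture_command(target), capture_output=True, text=True)
        if result.returncode == 0:
            saved.append(target)  # skip frames the camera failed to deliver
    return saved
```

Each saved path can then be handed to the face detection and recognition step. The `run` parameter exists only so the loop can be exercised without a camera attached.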