Creating an Internet-Controlled, Raspberry Pi-Enabled Camera Gimbal with a Live Stream on a gPhoto2-Enabled Camera and Remote Functionality, Integrated with a Web Server That Distributes the Pictures Taken, Using PubNub

True Story Follows

So every time I’m at the beach, hanging out with my beautiful wife and enjoying the sun, I can’t help but think how great it would be if an autonomous blimp were flying around with an internet-controlled camera on it, so that a user could remotely control a resource-locked camera, take aerial pictures, and have those pictures sent back to the end user, all without violating anyone’s privacy.

So I went ahead and wrote a program that does that. Check out the sample videos below:


Controlling Servos (and hence a camera gimbal) Over the Internet

Take Pictures Remotely Over the Internet

This ended up being a reasonably simple project, albeit complex in its distribution.

The Components

  • Raspberry Pi

    • Python Camera Driver for gPhoto2 Cameras
    • Android app for a lighter-weight camera
    • Web Server for asynchronously handling input images
    • Celery (with Redis) for asynchronous image resizing, watermarking, and delivery
    • Adafruit Servo HAT for servo control
    • PubNub client for point-to-point communication with the end user and the live video feed
    • Boto for interfacing with Amazon S3 to deliver images
  • Web Server

    • Twilio integration so that users can text a phone number that replies with a link, mapping users to phones so images can be sent back via SMS
    • Redis for resource locking so that only one user controls a camera at a time
    • Django for simple web server stuff
    • Parse for Facebook integration so that the Raspberry Pi can post images to Facebook as they’re taken.
    • Heroku for web hosting
    • Stripe so that we could sell aerial photos if we wanted
    • HTML5 Joystick so that users can control the camera gimbal in a nifty way

You can call me stupid. But don’t say I’m not crazy enough. I AM crazy enough.

So this all gets tied together like so: a hypothetical blimp flies around with a number that you’re supposed to text. You text the number, and a reply is sent with a unique link that has an access token associated with it. The user clicks the link, and now that access token is mapped to the user’s phone number (remember this toward the end). The alleged blimp is obviously super popular, so multiple people are trying to access the camera. Each user has a spot in line, and the user’s client machine long polls the server to maintain that spot; any user who gives up and leaves is evicted from the line. The line, or “queue” if you will (and I will), manages the time slice for when each user will be able to start controlling the camera. Once it’s the user’s turn, they have an arbitrary amount of time, say 30 seconds, to control the camera.
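The queueing behavior above can be modeled in a few lines. This is a toy in-memory sketch (the real project backs resource locking with Redis on the web server), and all names, methods, and timeouts here are illustrative:

```python
import time

class ControlQueue:
    """Toy model of the control line: users keep their spot by
    long-polling, and anyone who stops polling for longer than
    `poll_timeout` seconds is evicted from the line."""

    def __init__(self, poll_timeout=10.0, clock=time.monotonic):
        self.poll_timeout = poll_timeout
        self.clock = clock          # injectable so the logic is testable
        self._spots = {}            # token -> [join_order, last_poll_time]
        self._counter = 0

    def poll(self, token):
        """Called by the client's long poll; returns the 0-based position
        in line (0 means this user currently controls the camera)."""
        now = self.clock()
        if token not in self._spots:
            self._counter += 1
            self._spots[token] = [self._counter, now]
        else:
            self._spots[token][1] = now
        self._evict(now)
        ordered = sorted(self._spots, key=lambda t: self._spots[t][0])
        return ordered.index(token)

    def _evict(self, now):
        # Drop anyone whose last poll is older than the timeout.
        stale = [t for t, (_, last) in self._spots.items()
                 if now - last > self.poll_timeout]
        for t in stale:
            del self._spots[t]
```

The injectable clock is just there so the eviction logic can be exercised without real waiting.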

Once the user has control, an HTML5 joystick allows the user to pan or tilt the camera with variable intensity. The joystick triggers x and y coordinate changes, and those changes are published via PubNub to a unique channel that corresponds to this particular camera gimbal (there will be more than one camera gimbal, obviously). A separate process on a Raspberry Pi listens on that PubNub channel. Input messages are interpreted and translated into PWM commands that are sent to the two servos on the camera gimbal that control pan and tilt.
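The interesting part of that listener is mapping joystick intensity to a PWM pulse. A minimal sketch, assuming the Adafruit HAT’s 12-bit tick counts and an illustrative 150–600 tick range (real limits depend on the servos and need calibrating):

```python
def joystick_to_ticks(axis, min_ticks=150, max_ticks=600):
    """Map one joystick axis in [-1.0, 1.0] to a tick count for a
    12-bit PWM channel on the Adafruit servo HAT. The 150-600 range
    is a common starting point, not a value from the original post."""
    axis = max(-1.0, min(1.0, axis))  # clamp out-of-range input
    return int(min_ticks + (axis + 1.0) / 2.0 * (max_ticks - min_ticks))

# On the Pi, the PubNub listener would then apply this per message,
# e.g. (channel numbers illustrative):
#   pwm.set_pwm(PAN_CHANNEL, 0, joystick_to_ticks(x))
#   pwm.set_pwm(TILT_CHANNEL, 0, joystick_to_ticks(y))
```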

All the while, another process on the Raspberry Pi is reading JPEG frames from a camera, resizing those frames to digestible sizes (like 150px width), and publishing them to PubNub. On the client side, JavaScript code renders the JPEG stream in the browser. I’m using gPhoto2 code in C++ to interface with the camera, and I’m using Boost to make the camera driver accessible in Python.
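The per-frame work can be sketched like this, assuming Pillow for the resize; the payload field name is my own illustration, not from the original code:

```python
import base64
import io

from PIL import Image

def frame_to_payload(jpeg_bytes, target_width=150):
    """Resize one camera frame to a digestible width (preserving
    aspect ratio) and base64-encode it so it fits comfortably in a
    PubNub message. The 'frame' field name is illustrative."""
    img = Image.open(io.BytesIO(jpeg_bytes))
    height = max(1, round(img.height * target_width / img.width))
    img = img.resize((target_width, height))
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=70)
    return {"frame": base64.b64encode(buf.getvalue()).decode("ascii")}
```

The browser-side JavaScript would then base64-decode each message and swap it into an `<img>` tag to get the live feed effect.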

Alternatively, since the camera is heavy and we want the whole thing to be very light, I also wrote an Android app that does the same thing as the Python process: streaming images and listening for commands to take pictures. Since the process only publishes to PubNub, it doesn’t necessarily need to know about anything else happening on the Pi.

Now, the user can also take a picture. A command is again sent via PubNub, and either the Python process controlling the camera or the Android application takes a photo, depending on settings. In order to decouple this whole flow from the rest and keep the process and the Android app interchangeable, both modules POST base64-encoded JPEG bytes to a web server on the Raspberry Pi.
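The shared contract between the two producers and the Pi’s web server might look like this round trip; the JSON field names and the camera ID are assumptions for illustration:

```python
import base64
import json

def encode_image_post(jpeg_bytes, camera_id="gimbal-1"):
    """Body that either producer (the Python gPhoto2 process or the
    Android app) POSTs to the Pi's local web server. Field names and
    camera_id are illustrative; the post doesn't specify them."""
    return json.dumps({
        "camera_id": camera_id,
        "image_b64": base64.b64encode(jpeg_bytes).decode("ascii"),
    })

def decode_image_post(body):
    """Server side: recover the camera ID and raw JPEG bytes."""
    payload = json.loads(body)
    return payload["camera_id"], base64.b64decode(payload["image_b64"])
```

Because both producers speak this one dumb interface, the server never cares whether the photo came from the gPhoto2 driver or the phone.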

The Raspberry Pi accepts the images, saves them to disk, then kicks off a Celery task. The Celery task resizes the image (we might be working with crazy 18-megapixel images or something) to something high quality but still not insanely large. The image is resized again into a “preview image” with a watermark on it. Both the high quality image and the preview image are uploaded to Amazon S3 and made public.
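The heart of that Celery task can be sketched as a pure function, assuming Pillow; the widths and watermark text are illustrative, and the real task would wrap this with the `@app.task` decorator and Boto uploads:

```python
from PIL import Image, ImageDraw

def build_derivatives(img, hq_width=2048, preview_width=640):
    """Produce the two images the task uploads to S3: a downsized
    high-quality image plus a smaller watermarked preview. The widths
    and watermark text here are illustrative defaults."""
    def scaled(image, width):
        if image.width <= width:
            return image.copy()
        height = int(image.height * width / image.width)
        return image.resize((width, height))

    hq = scaled(img, hq_width)
    preview = scaled(img, preview_width)
    # Stamp a simple text watermark on the preview only.
    ImageDraw.Draw(preview).text((10, 10), "PREVIEW", fill=(255, 255, 255))
    return hq, preview
```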

The Raspberry Pi then notifies the end user that the image is available. I actually implemented this so that the entire application could use any one of email, PubNub, or text message for delivery. In all cases, the user receives a link to a URL on the web server that references the preview image. The user can now go view the preview image, and if Stripe is enabled, we can sell the higher resolution images without the watermark. If the user purchases the photo, we have to map the preview image back to the high quality image. To avoid using a database, I made the preview image’s file name a UUID, then XOR’ed its bits with a salt UUID to derive the high quality image’s name. In this way, every preview image maps to a high quality image, but I digress. If the Stripe charge succeeds, a link to the high quality image is emailed to the user, and the user’s browser is redirected to that image.
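The XOR trick is small enough to show in full; the salt value here is obviously illustrative, since the real one is a server-side secret:

```python
import uuid

# Server-side secret; this particular value is purely illustrative.
SALT = uuid.UUID("0f1e2d3c-4b5a-6978-8796-a5b4c3d2e1f0")

def map_image_name(name: uuid.UUID) -> uuid.UUID:
    """XOR a UUID filename with the salt. Because XOR is its own
    inverse, the same function maps preview -> high quality and
    back again, with no database table needed."""
    return uuid.UUID(int=name.int ^ SALT.int)
```

Applying the function twice gets you back where you started, which is exactly why one function covers both directions of the mapping.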



The End