The Fastest Django App I’ve Ever Written

I’ve been a professional software engineer for about a year and three months now. During that time I’ve learned a ton, and I was recently able to really see and measure that when I took on a small side project similar in scope to the sorts of things I worked on while I was in the Army. Since the project was similar in nature and I was deploying a Django app from start to finish, I could compare how fast I was able to write and deploy the code, how clean the code was, how effective it was in general, and how the front-end visuals differed.

To compare myself before and after is like comparing Paul Dillett during and after steroid use, except in reverse.

[Image: Paul Dillett during steroid use.]

[Image: Paul Dillett after steroid use.]

BUT IN REVERSE!

[Images: Before / After]

I wanted to write about this brief project both to share the application itself and to show some code samples that demonstrate some of the things I’ve learned at Hearsay Social. Since we write proprietary code there, I can’t show samples from work unless I invent some arbitrary example with no real use case, which typically results in weird models that no one cares about or would ever use IRL (cool people term for “in real life”).

I did the work on my train rides home over the course of two weeks. The project grew out of something I’d intended to do with aerial photography over a year ago, an idea I had basically given up on because seeing the entire project through would have been more work than I thought I could take on. However, after working with Larry Fleming, from a company that builds and sells remote-control blimps, I found the opportunity to specialize and build an application without having to get directly involved with drones.

The Project Itself

Currently, the site’s domain name is in limbo, but I have the application temporarily hosted at PictureBlimp.com; I’ll come back and update this post as necessary. In a very generic sense, the idea is that any photographer can be associated with the Django application, and once associated, they can drop pictures into a Dropbox folder and their work is done. From there, my application uses the Dropbox API to grab new pictures from Dropbox, creates a watermarked copy and a thumbnail of each picture, then uploads the thumbnail, the watermarked file, and the original to Amazon S3. The watermarked and thumbnail files are public; the original is not.
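
To make that storage layout concrete, here is a minimal sketch of the three S3 objects one picture ends up as. It mirrors the key format used by the Picture class later in this post (event name, then picture id plus a suffix, then .jpg); the function name and the _wm and _thumb suffix values are hypothetical, not the app’s actual settings.

```python
from typing import List, NamedTuple


class S3Object(NamedTuple):
    key: str      # path of the object inside the bucket
    public: bool  # True if anyone with the URL may read it


def s3_objects_for_picture(event_name: str, picture_id: int,
                           watermark_suffix: str = "_wm",
                           thumbnail_suffix: str = "_thumb") -> List[S3Object]:
    """Return the three S3 objects stored for one picture.

    Keys follow <event name>/<picture id><suffix>.jpg; the suffix defaults
    are hypothetical placeholders.
    """
    base = "%s/%s" % (event_name, picture_id)
    return [
        S3Object(base + ".jpg", public=False),                    # original, sold at checkout
        S3Object(base + watermark_suffix + ".jpg", public=True),  # watermarked preview
        S3Object(base + thumbnail_suffix + ".jpg", public=True),  # gallery thumbnail
    ]
```

For example, s3_objects_for_picture("Balloon Fiesta", 42) yields Balloon Fiesta/42.jpg (private) plus Balloon Fiesta/42_wm.jpg and Balloon Fiesta/42_thumb.jpg (both public).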

From there, the image is saved and associated with the photographer who uploaded it via Dropbox, and the date taken is extracted from the JPEG’s EXIF data. The picture can be associated with an event based on the name of the Dropbox folder it was in. End users can then browse pictures in the web application and search either with a calendar view or by an event’s name.
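
A minimal sketch of those two extractions. Reading the raw EXIF value requires an image library (Pillow, for example), so this only covers the parsing: EXIF DateTimeOriginal strings look like 2014:08:06 20:49:02, and the event name is taken from the folder that contains the file. The function names and the example folder layout are hypothetical. Returning None for unreadable dates fits the model’s date_taken field, which allows nulls.

```python
import datetime
import posixpath
from typing import Optional

EXIF_DATETIME_FORMAT = "%Y:%m:%d %H:%M:%S"  # EXIF stores e.g. "2014:08:06 20:49:02"


def parse_exif_datetime(raw: Optional[str]) -> Optional[datetime.datetime]:
    """Turn an EXIF DateTimeOriginal string into a datetime, or None if unusable."""
    if not raw:
        return None
    try:
        return datetime.datetime.strptime(raw.strip(), EXIF_DATETIME_FORMAT)
    except ValueError:
        # Some cameras write placeholders such as "0000:00:00 00:00:00".
        return None


def event_name_from_path(dropbox_path: str) -> str:
    """Use the name of the folder that contains the file as the event name."""
    return posixpath.basename(posixpath.dirname(dropbox_path))
```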

Pricing is also controlled via Dropbox to make maintenance as easy as possible. Rather than only selling the original images, the application sells pictures at several resolutions, and a simple text file in Dropbox maps each resolution to a price. Changing a price is just a matter of editing that file.
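
The post doesn’t show the pricing file’s format, so here is a minimal sketch under an assumed one: one WIDTHxHEIGHT PRICE pair per line, with # for comments. Prices are parsed as Decimal to avoid floating-point rounding with money; both the format and the function name are hypothetical.

```python
from decimal import Decimal, InvalidOperation
from typing import Dict, Tuple


def parse_pricing_file(text: str) -> Dict[Tuple[int, int], Decimal]:
    """Parse lines like '1920x1080 15.00' into {(1920, 1080): Decimal('15.00')}.

    Blank lines and lines starting with '#' are ignored.
    """
    prices = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        try:
            resolution, price = line.split()
            width, height = (int(part) for part in resolution.lower().split("x"))
            prices[(width, height)] = Decimal(price)
        except (ValueError, InvalidOperation):
            # Report the offending line instead of silently skipping it.
            raise ValueError("Bad pricing line %d: %r" % (line_number, line))
    return prices
```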

The site is intentionally lightweight, so shopping cart logic is handled entirely through the session. At checkout, credit card information is handled entirely by Stripe, which offloads the boring logic of payments and relieves me of any responsibility for credit card security. After a successful charge, the application downloads the original image from Amazon, resizes it to the size the customer chose, re-uploads the file to Amazon under an encoded file name, and emails the customer a link to the file.
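
A minimal sketch of session-based cart helpers, assuming each cart line is a picture id plus a purchased resolution (the key names are hypothetical). Django’s request.session behaves like a dictionary, so the same functions can be tested against a plain dict. Reassigning the cart list, rather than appending to it in place, matters: Django only marks the session as modified when a key is assigned or deleted, not when a nested list is mutated.

```python
from typing import Dict, List, MutableMapping

CART_KEY = "cart"  # hypothetical session key


def _cart_line(picture_id: int, width: int, height: int) -> Dict[str, int]:
    # A dict (not a tuple) survives Django's JSON session serializer unchanged.
    return {"picture_id": picture_id, "width": width, "height": height}


def add_to_cart(session: MutableMapping, picture_id: int, width: int, height: int) -> None:
    """Add one picture at one resolution; duplicates are ignored."""
    cart = list(session.get(CART_KEY, []))
    line = _cart_line(picture_id, width, height)
    if line not in cart:
        cart.append(line)
    # Assigning the key (instead of mutating the stored list) lets Django see the change.
    session[CART_KEY] = cart


def remove_from_cart(session: MutableMapping, picture_id: int, width: int, height: int) -> None:
    """Remove one picture/resolution line if it is in the cart."""
    line = _cart_line(picture_id, width, height)
    session[CART_KEY] = [item for item in session.get(CART_KEY, []) if item != line]


def cart_lines(session: MutableMapping) -> List[Dict[str, int]]:
    """Return a copy of the cart's lines (empty if there is no cart yet)."""
    return list(session.get(CART_KEY, []))


def clear_cart(session: MutableMapping) -> None:
    """Empty the cart, e.g. after a successful charge."""
    session.pop(CART_KEY, None)
```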

Django has built-in functionality for sending email, but it still needs a mail server to send through, which adds a layer of maintenance when free API-based email services exist. I used Mailgun, which allows 10,000 free emails per month. After that it’s still dirt cheap, but I don’t intend to email that many people to begin with. Without such a service, we would need to set up and manage our own email server as a separate process, with the accompanying risk of errors, extra memory use, and domain configuration for email.

The Code

As mentioned earlier, there’s a huge difference between my code now and my code a year ago, to the point that it’s embarrassing to show my old code samples. The biggest difference is my current obsession with modularity and constantly decomposing problems. I now keep the Law of Demeter in mind at all times. The thought process I maintain is:

  • Can this block of code be abstracted into a function with a name describing what I’m doing?
  • Am I repeating something that I’ve already done that can be abstracted into a function?
  • Is this class taking on more responsibility than it should?
  • Does this class know too much about the objects that it’s manipulating?
  • Is there an excessive number of parameters being passed?
  • Is this block of code getting to be unnecessarily long?
  • Does this code have corresponding test coverage / is this function easy to test?

Other thoughts going through my head:

  • How is Le doing right now with the kids?
  • I wonder what’s going on at lunch today?
  • Would it be too wild and crazy if I drank more coffee?
  • Is there anything ironic or ridiculous about my current surroundings that I can point out to everyone else around me?

Another common pattern I’ve adopted is creating proxy objects (plain Python wrappers, not Django’s built-in proxy models) for Django models. This was something Adam “The Blonde Bomber” Depue introduced me to, and I initially questioned it. My thinking at the time was that creating the proxy objects defeated the purpose of using Django, and that building them signaled that the programmer writing the code did not trust his or her co-workers to use the associated Django models properly. Here is some sample code that demonstrates the pattern I’m talking about:

import datetime

from django.core.urlresolvers import reverse
from django.db import models

class _Picture(models.Model):

    class Meta:
        app_label = 'pictures'
        db_table = 'pictures_picture'

    event_id = models.IntegerField()
    photographer_name = models.CharField(max_length=100, null=True)
    date_taken = models.DateTimeField(null=True)
    saved_to_hard_drive = models.BooleanField(default=False)
    uploaded_to_amazon = models.BooleanField(default=False)
    amazon_bucket = models.CharField(max_length=255)
    watermark_suffix = models.CharField(max_length=100)
    thumbnail_suffix = models.CharField(max_length=100)
    event_name_at_save_time = models.CharField(max_length=255)
    # TODO need to index by date, ID, and event_id

class Picture(object):

    BASE_URL = "https://s3.amazonaws.com"

    def __init__(self, _picture):
        self._picture = _picture

    @classmethod
    def _wrap(cls, _picture):
        return cls(_picture)

    @classmethod
    def create_for_event(cls,
                         event_obj,
                         date_taken,
                         photographer_name,
                         amazon_bucket,
                         watermark_suffix,
                         thumbnail_suffix):

        _picture = _Picture.objects.create(event_id=event_obj.id,
                                           photographer_name=photographer_name,
                                           date_taken=date_taken,
                                           amazon_bucket=amazon_bucket,
                                           watermark_suffix=watermark_suffix,
                                           thumbnail_suffix=thumbnail_suffix,
                                           event_name_at_save_time=event_obj.name)
        return Picture._wrap(_picture)

    def mark_saved_on_hard_drive(self):
        self._picture.saved_to_hard_drive = True
        self._picture.save()

    def mark_uploaded_to_amazon(self):
        self._picture.uploaded_to_amazon = True
        self._picture.save()

    @classmethod
    def get_pictures_from_filenames(cls, file_paths):
        filenames = [file_path.split("/")[-1] for file_path in file_paths]
        ids = [int(filename.split(".")[0]) for filename in filenames]
        _pictures = _Picture.objects.filter(id__in=ids)
        return [Picture._wrap(_picture) for _picture in _pictures]

    @classmethod
    def get_pictures_in_month_day_year(cls, month, day, year):
        start = datetime.datetime(year=year, month=month, day=day, hour=0, minute=0, second=0, microsecond=0)
        end = start + datetime.timedelta(days=1)
        _pictures = (_Picture.objects.
                     filter(date_taken__gte=start).
                     filter(date_taken__lt=end).
                     filter(uploaded_to_amazon=True))
        return [Picture._wrap(_picture) for _picture in _pictures]

    @classmethod
    def get_pictures_in_month_and_year(cls, month, year):
        ''' 1 is January, 12 is December '''
        start = datetime.datetime(year=year, month=month, day=1, hour=0, minute=0, second=0, microsecond=0)
        end = (start + datetime.timedelta(days=31)).replace(day=1)
        _pictures = (_Picture.objects.
                     filter(date_taken__gte=start).
                     filter(date_taken__lt=end).
                     filter(uploaded_to_amazon=True))
        return [Picture._wrap(_picture) for _picture in _pictures]

    @classmethod
    def get_by_id(cls, picture_id):
        _picture = _Picture.objects.get(id=picture_id)
        return Picture._wrap(_picture)

    @classmethod
    def get_by_ids(cls, picture_ids):
        _pictures = _Picture.objects.filter(id__in=picture_ids)
        return [Picture._wrap(_picture) for _picture in _pictures]

    @classmethod
    def get_pictures_from_event(cls, event_obj):
        _pictures = _Picture.objects.filter(event_id=event_obj.id).filter(uploaded_to_amazon=True)
        return [Picture._wrap(_picture) for _picture in _pictures]

    @classmethod
    def get_pictures_by_most_recent(cls, max_count=None):
        _pictures = (_Picture.objects.
                     exclude(date_taken__isnull=True).
                     filter(uploaded_to_amazon=True).
                     order_by("-date_taken"))
        if max_count:
            _pictures = _pictures[:max_count]
        return [Picture._wrap(_picture) for _picture in _pictures]

    @classmethod
    def get_most_recent_datetime(cls):
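        # Ignore undated rows: some databases (e.g. PostgreSQL) sort NULLs first
        # in descending order, so latest() could otherwise return a NULL date.
        return _Picture.objects.filter(date_taken__isnull=False).latest('date_taken').date_taken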

    @property
    def amazon_key(self):
        return "%s/%s.jpg" % (self._picture.event_name_at_save_time,
                              self.id)

    @property
    def thumbnail_url(self):
        return "%s/%s/%s/%s%s.jpg" % (self.BASE_URL,
                                      self._picture.amazon_bucket,
                                      self._picture.event_name_at_save_time,
                                      self.id,
                                      self._picture.thumbnail_suffix)

    @property
    def watermark_url(self):
        return "%s/%s/%s/%s%s.jpg" % (self.BASE_URL,
                                      self._picture.amazon_bucket,
                                      self._picture.event_name_at_save_time,
                                      self.id,
                                      self._picture.watermark_suffix)

    @property
    def filename(self):
        return "%s/%s.jpg" % (self._picture.event_name_at_save_time, self._picture.id)

    @property
    def id(self):
        return self._picture.id

    @property
    def event_name(self):
        return self._picture.event_name_at_save_time

    @property
    def event_id(self):
        return self._picture.event_id

    @property
    def url(self):
        return reverse('picture', args=[self.id])

    @property
    def date_taken(self):
        return self._picture.date_taken

Note that in the above code, the Picture object used by the rest of the application is a plain Python object. Database queries live in classmethods, creating a single chokepoint where queries occur. Properties of the class are read-only, and actions that save or update the model are placed inside explicit methods.
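
Because the wrapper only touches self._picture through a handful of attributes and save(), it can be exercised without a database. The sketch below trims Picture down to two members (TrimmedPicture is a hypothetical name) and passes in a hypothetical FakePictureRow stub; it also shows that assigning to a property raises AttributeError, which is what makes the properties read-only.

```python
class TrimmedPicture(object):
    """A cut-down copy of the Picture wrapper above, for illustration only."""

    def __init__(self, _picture):
        self._picture = _picture

    @property
    def date_taken(self):
        # Read-only: there is no setter, so assignment raises AttributeError.
        return self._picture.date_taken

    def mark_uploaded_to_amazon(self):
        # The only way to change state is through an explicit, named method.
        self._picture.uploaded_to_amazon = True
        self._picture.save()


class FakePictureRow(object):
    """Hypothetical stand-in for a _Picture row that counts save() calls."""

    def __init__(self, **fields):
        self.__dict__.update(fields)
        self.save_count = 0

    def save(self):
        self.save_count += 1
```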

Of course, I now have a different outlook. Despite the concerns I raised earlier, some distrust of co-workers is healthy to the degree that it leads you to build modules that are difficult to misuse, for the very same reason that all production code should have corresponding test coverage. While Django is incredibly easy to use, that ease of use can encourage bad or sloppy coding habits. Some of the benefits of building models this way:

  • It makes it easier to decouple models (e.g. put Django models in entirely different databases if you wanted)
  • Major use cases are isolated to one file, making it easy to modify or isolate as necessary
  • It’s easy to see your primary use cases and generate SQL indexes as necessary
  • Proper usage and querying can be baked into the code
  • It becomes incredibly easy to understand and learn what the model encompasses

To my last point, consider a case in which you, as a developer, are stepping into someone else’s code and learning about a Django model for the first time. Below are screenshots of me tab-completing the attributes of a Django object and of its corresponding proxy object:

Django Model
[Screenshot: tab completion on the Django model]

Proxy Model
[Screenshot: tab completion on the proxy object]

With the Django object, there are tons of generic attributes and methods that I will never use, and I have no idea what they do. The proxy object, by contrast, has clear attributes and methods that I know are safe to use. Within the Django object, it’s difficult to discern which attributes and methods are Django-specific and which ones the authoring developer added.

Also note that there are no foreign keys in my Django model. This forces the developer using the model to put his or her thinking cap on and helps avoid lazy database queries. Foreign keys can carry a minor performance cost, and avoiding them also helps with the servicification (decoupling) aspect I mentioned above. It also moves the work of maintaining relationships between tables out of the database and into Python, taking a small load off the database. That is beneficial where possible, since the database is a shared resource, while the Python processes serving requests are not necessarily shared.

On that note, we can’t as easily join tables together at the SQL level, so instead we generally do the same logic with two queries and combine the results in Python. This again shifts the work of building what amounts to a temporary table to Python, which is fairly easy with a dictionary. In the process we can also use less memory than a join: a SQL join repeats every column of the joined row for each matching row, duplicating data we don’t really need.
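
Here is a minimal sketch of that two-query approach. In the app, the two lists would come from something like a filtered _Picture query followed by a query for just the referenced events (an _Event model with id__in={p.event_id for p in pictures} is hypothetical here; the post only shows _Picture). Plain NamedTuples stand in for rows so the example runs without a database. Each event row is fetched once and shared by every picture that references it, rather than repeated per row as in a SQL join.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class PictureRow(NamedTuple):
    id: int
    event_id: int


class EventRow(NamedTuple):
    id: int
    name: str


def join_pictures_to_events(pictures: Iterable[PictureRow],
                            events: Iterable[EventRow]) -> List[Tuple[PictureRow, EventRow]]:
    """Pair each picture with its event using a dict keyed by event id.

    This is the Python-side equivalent of an INNER JOIN on
    picture.event_id = event.id; pictures whose event is missing are skipped.
    """
    events_by_id: Dict[int, EventRow] = {event.id: event for event in events}
    return [(picture, events_by_id[picture.event_id])
            for picture in pictures
            if picture.event_id in events_by_id]
```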

Anyway, another end result is that the logic for the controller is super simple and straightforward, leaving your views.py uncluttered, and use cases are outlined from the top down all the way through:

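from django.core.exceptions import ObjectDoesNotExist
from django.http import Http404

# Picture is the proxy class shown above; Pricing and global_render_to_response
# are other parts of this project that aren't shown in the post.

def picture(request, picture_id):
    render_data = {}
    try:
        render_data['picture'] = Picture.get_by_id(picture_id)
    except ObjectDoesNotExist:
        # _Picture.objects.get raises _Picture.DoesNotExist, a subclass of ObjectDoesNotExist
        raise Http404
    render_data['pricings'] = Pricing.get_for_event_id(render_data['picture'].event_id)
    return global_render_to_response(request, "basic_navigation/picture.html", render_data)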

Boto

Of the libraries I ended up using, it’s pretty bananas how much power is readily available. Boto is a Python library for interfacing with Amazon Web Services, including Amazon S3, which offers crazy cheap storage and is the same hosting tool used by companies like Dropbox. At the time of writing, 1 gigabyte costs you $0.03 per month. Conceptually, you create a bucket in S3, and a key represents a file path within that bucket. Items can be uploaded and downloaded using the key, and files remain private until explicitly made public. As such, there are all sorts of applications that could be built, even for basic home usage.
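
A minimal sketch of those basics, written against boto3 (the successor to the boto library used here) rather than the original boto API. The S3 client is passed in as a parameter, so in production you’d call these with boto3.client("s3"); the helper names are hypothetical. The public-read ACL assumes the bucket still allows object ACLs and public access, which newer buckets disable by default. The presigned link is shown only as another way to share a private file; it is not what this app does (it uses an encoded file name).

```python
def upload_jpeg(s3_client, bucket, local_path, key, public=False):
    """Upload a JPEG to S3. Objects stay private unless public=True.

    s3_client is expected to be a boto3 S3 client.
    """
    extra_args = {"ContentType": "image/jpeg"}
    if public:
        extra_args["ACL"] = "public-read"
    s3_client.upload_file(local_path, bucket, key, ExtraArgs=extra_args)


def download_file(s3_client, bucket, key, local_path):
    """Download a (possibly private) object using the same bucket/key pair."""
    s3_client.download_file(bucket, key, local_path)


def temporary_link(s3_client, bucket, key, expires_in=24 * 60 * 60):
    """Return a time-limited URL for a private object (default: one day)."""
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )
```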

Stripe

For my purposes, I tried using the PayPal API and gave up shortly thereafter. The API didn’t make much sense, the examples weren’t that great (and they were all in PHP), and creating a charge was an overly complex process. It also creates additional burdens for the end user. I’d heard of Stripe before, so I gave it a try and was literally up and running within minutes. Stripe is purely commission-based, so it’s also ideal for a side project like mine, where I have no idea whether I’ll even make money and I don’t want to be paying recurring monthly charges (like PayPal requires).

Stripe makes it so that you don’t need to touch credit card information whatsoever. A client-side JavaScript file posts the card information directly to Stripe, which handles all of the security requirements of credit card usage and passes back a token representing the card; your server then uses that token to create the charge. Their API is extremely straightforward and simple.
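
On the server side, what’s left is turning that token into a charge. The sketch below only builds the arguments, assuming prices are kept as Decimal values; it targets Stripe’s older Charges API, where you’d pass them to stripe.Charge.create(**params) after setting stripe.api_key (Stripe’s newer PaymentIntents API works differently). The helper names are hypothetical. The one detail that matters regardless is that Stripe amounts are integers in the smallest currency unit, i.e. cents for USD.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_cents(price: Decimal) -> int:
    """Convert a dollar price to an integer number of cents, rounding half up."""
    return int((price * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def build_charge_params(price: Decimal, token: str, description: str) -> dict:
    """Arguments for a Charges API call.

    token is the one-time token returned by Stripe's client-side JavaScript.
    """
    return {
        "amount": to_cents(price),
        "currency": "usd",
        "source": token,
        "description": description,
    }
```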

Dropbox API

The Dropbox API is exactly what you’d expect: there are high-level operations to list a directory in a Dropbox account, read a file, write a file, move a file, etc.
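
A minimal sketch of the “grab new pictures” step, assuming the official dropbox Python SDK for API v2 (which postdates this post): files_list_folder returns a page of entries along with has_more and cursor, and files_list_folder_continue fetches the next page. The dbx argument would be a dropbox.Dropbox(access_token) instance; the folder layout and helper names are hypothetical.

```python
from typing import Iterable, List, Set

JPEG_EXTENSIONS = (".jpg", ".jpeg")


def list_jpeg_paths(dbx, folder: str) -> List[str]:
    """Return the display paths of every JPEG under folder, following pagination."""
    result = dbx.files_list_folder(folder, recursive=True)
    paths = []
    while True:
        for entry in result.entries:
            # Folder entries have names too; the extension check filters them out.
            if entry.name.lower().endswith(JPEG_EXTENSIONS):
                paths.append(entry.path_display)
        if not result.has_more:
            return paths
        result = dbx.files_list_folder_continue(result.cursor)


def new_paths(all_paths: Iterable[str], already_imported: Set[str]) -> List[str]:
    """Keep only the paths that have not been processed yet."""
    return [path for path in all_paths if path not in already_imported]
```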

Mailgun

Mailgun is one of many services that provide an API for sending and receiving email in bulk, very cheaply. They were the #1 result on Google, which is why I signed up for their service. A simple POST call sends an email, which was sufficient for my purposes.
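
Here is a sketch of that POST using only the standard library. It assumes Mailgun’s v3 messages endpoint (https://api.mailgun.net/v3/<your domain>/messages) with HTTP basic auth, where the username is literally api and the password is your API key; the helper name is hypothetical, and the domain, key, and addresses would be your own. The function only builds the request; urllib.request.urlopen(request) would actually send it.

```python
import base64
import urllib.parse
import urllib.request

MAILGUN_API_BASE = "https://api.mailgun.net/v3"


def build_mailgun_request(domain: str, api_key: str, sender: str, to: str,
                          subject: str, text: str) -> urllib.request.Request:
    """Build (but do not send) the POST that asks Mailgun to send one email."""
    body = urllib.parse.urlencode({
        "from": sender,
        "to": to,
        "subject": subject,
        "text": text,
    }).encode("utf-8")
    # Basic auth: username "api", password = the Mailgun API key.
    credentials = base64.b64encode(("api:%s" % api_key).encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        "%s/%s/messages" % (MAILGUN_API_BASE, domain),
        data=body,
        headers={
            "Authorization": "Basic %s" % credentials,
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
```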