Death to the Django ORM

So first off, to clarify, the title of this post should be “Isolation of the Django ORM,” but, you know, I just wanted to stir up some drama with a controversial title that would really stir some violent debate. I recently wrote a blog post that actually contained too much disparate content to really create a clear theme, and in the process it made me want to follow up with a subsequent post to clarify the main point I ended up driving home. You know…isolate a single topic so that it knows nothing about the rest of the stuff you’re writing about…like in software. Get it? Because that’s also what I’m writing about. It’s a pun.

So John “Arvin” Lynn read my post, didn’t even hit the like button on Facebook, and then took the effort to follow up with me with an insightful video that I might find interesting.  The video is below:

And in fact, I did find it interesting.  The very points I was trying to make in my previous blog post about creating proxy models for Django were hit home and explained far more intelligently by someone far nerdier than I (and mind you, I mean that as a compliment, and I don’t think Alex Gaynor would take this as an insult.  You’ll notice that his shirt said “2.8” on it.  That’s as nerdy as it gets).  If you look at the comments of the YouTube video, which often leads down to the cesspool of the internets, there was a comment with the complaint that Alex had voiced his qualms with Django quite articulately and with examples, but to follow that, he did not provide the solution to his complaints.  I like to think that my single example from my previous post illustrated an example solution of an abstract concept he was talking about.

 

So what was said in the video?

In my own interpretation of Alex’s points:

  • MVC is an oversimplification and does not account for the additional modularity to write clean, maintainable code
  • Django’s environment has a clear definition of what MVC means, and perhaps that definition falls wildly short of what it really should be
  • Django’s ORM is used all over the place, at least in Django code samples even from the book, that are totally not MVC’ish

Here’s the single code sample that Alex used to make his point:

def vote(request, poll_id):
    p = get_object_or_404(Poll, pk=poll_id)
    try:
        selected_choice = p.choice_set.get(pk=request.POST['choice'])
    except (KeyError, Choice.DoesNotExist):
        # Redisplay the poll voting form.
        return render_to_response('polls/detail.html', {
            'poll': p,
            'error_message': "You didn't select a choice.",
        }, context_instance=RequestContext(request))
    else:
        selected_choice.votes += 1
        selected_choice.save()
        # Always return an HttpResponseRedirect after successfully dealing
        # with POST data. This prevents data from being posted twice if a
        # user hits the Back button.
        return HttpResponseRedirect(reverse('polls.views.results', args=(p.id,)))

Here, there’s no real separation from the view and the model. The get_object_or_404 function is a helper function that intermingles the two. Database queries are being made directly from request data. Here are my interpretations of the points Alex made in order to avoid this sort of intermingling:

  • Requests should not need to be re-created in order to test your code
  • Team members should not need to understand or know about your database schema in order to use your models.
  • Django models shouldn’t need to know about other django models

So let’s see if I can offer a “better” way to re-create the above logic using Alex’s points and hopefully I can do his points justice:

class _Poll(models.Model):
    ''' django model stuff goes in here '''

class Poll(object):
    ''' pure python object that abstracts everything in _Poll '''
    def __init__(self, _poll):
        self._poll = _poll

    def get_choices(self, choice_id):
        ''' Lots of other logic goes here involving a m2m table;
            involving similarly built out proxy models'''

    @classmethod
    def get_by_id(self, poll_id):
        _poll = _Poll.objects.get(id=poll_id)
        return Poll(_poll)

def up_vote_poll_id_with_choice_id(poll_id, choice_id):
    try:
        poll = Poll.get_by_id(poll_id)
    except ObjectDoesNotExist:
        return False
    selected_choice = poll.get_choices(choice_id)
    selected_choice.upvote()
    return True

def vote(request, poll_id):
    if 'choice' not in request.POST:
        render_data = {
            'error_message': "You didn't select a choice.  Idiot."
        }
        return render_to_response('polls/detail.html', render_data, status=400)
    choice_id = int(request.POST['choice'])
    success = up_vote_poll_id_with_choice_id(poll_id, choice_id)
    if success:
        return HttpResponseRedirect(reverse('polls.views.results', args=(poll_id,)))

    render_data = {
        'error_message': "That poll ID doesn't exist.  Idiot"
    }
    return render_to_response('polls/detail.html', render_data, status=400)

Here, we try and separate request/response logic, business logic, and database interaction. Methods can be individually tested, validated, and re-used. Other developers don’t need to know what data is inside the database or how the table is constructed or which columns are indexed.  Modules know only as much as they need to know about neighboring modules.

The second approach is noticeably longer, but the cost of more verbose code is marginal. Writing code is far easier than figuring out a problem. The majority of code development involves maintenance, not the construction of new blocks of code. As such, the drawback of more lines becomes even more negligible. Making a change is simple, easy, and testable.

To the point that modules only know as much as they need to know about neighboring modules, this brings up The Law of Demeter. In super plain English, following the Law of Demeter allows maintenance and long term development of code much easier and cheaper. To actually abide by the Law, the basic idea is to eliminate excessive dots. For example, this is bad:

def signup(account):
    name = account.owner.name

But this is better:

class Account(object):
    @property
    def owner_name(self):
        return self.owner.name

def signup(account):
    name = account.owner_name

Even though these are logically equivalent, the point is that in the first case, the module we’re in has to know something about the Account object and the Owner object. In the second case, the module we’re in only has to know something about the Account object. Over time and with more complex code, it’s much simpler to modify things when modules only know about their immediately related modules. This is loose coupling. Coupling must exist in order for your code to even do anything, but we want to minimize it wherever possible.

Consider this awesome illustration I drew with my own two hands with my own blood, sweat, and tears where we represent objects with little squares, and relationships between objects or their knowledge of each other is represented with an edge.IMG_0372

IMG_0373:
Of the two, the first illustration is clearly an easier block of code to maintain given each objects’ limited knowledge of its neighbors. If we wanted to remove a module or replace a module with something else, it would be incredibly easy to do. In the second example, everything just looks like a giant jenga puzzle, and no one wants to touch it or modify it, but then stuff still needs to get added to it, so more and more code just gets mangled in there, and the code just continues to rot.

Such is the reasoning for using django proxy models. Just to finish this post in the laziest manner possible, here is some more example code from my latest side project in which I used proxy models in conjunction with Django models:

import datetime

from django.db import models

from ..pricing.models import Pricing
from ..pictures.models import Picture


class _Order__Picture(models.Model):

    class Meta:
        app_label = 'orders'
        db_table = 'orders_order__picture'

    order_id = models.IntegerField()
    picture_id = models.IntegerField()
    price = models.FloatField()
    dimensions = models.CharField(max_length=50)
    final_picture_url = models.CharField(max_length=255)


class _Order(models.Model):

    class Meta:
        app_label = 'orders'
        db_table = 'orders_order'

    date_created = models.DateTimeField()
    customer_email = models.CharField(max_length=100)
    completed_credit_card_payment = models.BooleanField(default=False)
    completed_photo_processing = models.BooleanField(default=False)
    error_message = models.CharField(max_length=255)


class Order(object):

    def __init__(self, _order):
        self._order = _order
        self._cached_pictures = None

    @classmethod
    def _wrap(cls, _order):
        return Order(_order)

    @classmethod
    def create(cls,
               customer_email,
               picture_id_to_pricing_id):
        pricing_id_to_obj = {p.id: p for p in Pricing.get_by_ids([int(i) for i in picture_id_to_pricing_id.values()])}
        _order = _Order.objects.create(date_created=datetime.datetime.utcnow(),
                                       customer_email=customer_email,
                                       error_message="")
        for picture_id, pricing_id in picture_id_to_pricing_id.items():
            pricing = pricing_id_to_obj[pricing_id]
            _Order__Picture.objects.create(order_id=_order.id,
                                           picture_id=picture_id,
                                           price=pricing.price,
                                           dimensions=pricing.dimensions,
                                           final_picture_url="")
        return cls._wrap(_order)

    @classmethod
    def get_by_id(cls, order_id):
        _order = _Order.objects.get(id=order_id)
        return cls._wrap(_order)

    def get_pictures(self):
        if self._cached_pictures is not None:
            return self._cached_pictures

        picture_ids = _Order__Picture.objects.filter(order_id=self.id).values_list('picture_id', flat=True)
        self._cached_pictures = Picture.get_by_ids(picture_ids)
        return self._cached_pictures

    def get_pictures_with_dimensions(self):
        m2ms = _Order__Picture.objects.filter(order_id=self.id)
        picture_ids = [m2m.picture_id for m2m in m2ms]
        self._cached_pictures = Picture.get_by_ids(picture_ids)
        picture_id_to_obj = {p.id: p for p in self._cached_pictures}
        return [(picture_id_to_obj[m2m.picture_id], m2m.dimensions) for m2m in m2ms]

    @property
    def id(self):
        return self._order.id

    @property
    def total_price(self):
        all_prices = _Order__Picture.objects.filter(order_id=self.id).values_list('price', flat=True)
        total_price = sum(all_prices)
        return "%.2f" % total_price

    @property
    def num_pictures(self):
        return len(self.get_pictures())

    def mark_credit_card_payment_complete(self):
        self._order.completed_credit_card_payment = True
        self._order.save()

    def mark_photo_processing_complete(self):
        self._order.completed_photo_processing = True
        self._order.save()

    def annotate_error_message(self, error_message):
        error_message = error_message[:255]
        self._order.error_message = error_message
        self._order.save()

    def update_picture_id_with_finish_url(self, picture_id, final_url):
        # alternatively I could just update the properties as appropriate
        # instead of busting the cache
        self._cached_pictures = None
        m2m = _Order__Picture.objects.get(order_id=self.id,
                                          picture_id=picture_id)
        m2m.final_picture_url = final_url
        m2m.save()

    def get_final_image_urls(self):
        m2ms = _Order__Picture.objects.filter(order_id=self.id)
        return [m2m.final_picture_url for m2m in m2ms]

import random

from django.db import models


class _Coupon(models.Model):

    class Meta:
        app_label = 'coupons'
        db_table = 'coupons_coupon'

    code = models.CharField(max_length=50)
    expiration_date = models.DateTimeField(null=True)
    dollar_value = models.FloatField()
    percent_off = models.FloatField()  # value from 0..1
    reason_created = models.CharField(max_length=255)
    redeemed = models.BooleanField(default=False)


class Coupon(object):

    def __init__(self, _coupon):
        self._coupon = _coupon

    @classmethod
    def _wrap(cls, _coupon):
        return Coupon(_coupon)

    @classmethod
    def _generate_random_string(cls):
        # this string supports 10 million permutations
        random_chars = "".join([chr(random.randint(0, 25) + 65) for count in xrange(5)])
        while _Coupon.objects.filter(code=random_chars).exists():
            random_chars = "".join([chr(random.randint(0, 25) + 65) for count in xrange(5)])
        return random_chars

    @classmethod
    def create_dollar_value_with_reason(cls, dollar_value, reason_str):
        coupon_str = cls._generate_random_string()
        _coupon = _Coupon.objects.create(code=coupon_str,
                                         dollar_value=dollar_value,
                                         reason_created=reason_str,
                                         percent_off=0.0)
        return cls._wrap(_coupon)

    @classmethod
    def create_percent_off_with_reason(cls, percent_off, reason_str):
        if percent_off < 0.0 or percent_off > 1.0:
            raise ValueError("Invalid value for percent_off")
        coupon_str = cls._generate_random_string()
        _coupon = _Coupon.objects.create(code=coupon_str,
                                         dollar_value=0.0,
                                         reason_created=reason_str,
                                         percent_off=percent_off)
        return cls._wrap(_coupon)

    def is_valid(self):
        if self._coupon.dollar_value > 0:
            return True
        if self._coupon.percent_off > 0 and not self._coupon.redeemed:
            return True
        return False

    def get_amount_off(self, order_dollar_total):
        if not self.is_valid():
            return 0.0
        amount_off = self._coupon.dollar_value
        amount_off += (order_dollar_total - amount_off) * self._coupon.percent_off
        return amount_off

    def mark_redeemed(self):
        self._coupon.redeemed = True
        self._coupon.save()

import datetime
import random

from django.core.exceptions import ObjectDoesNotExist
from django.core.urlresolvers import reverse
from django.db import models

from ..pictures.models import Picture


class _Event(models.Model):

    class Meta:
        app_label = "events"
        db_table = "events_event"

    name = models.CharField(max_length=255)
    date_hosted = models.DateTimeField()
    # TODO make sure that I index on name, should be unique...


class Event(object):

    def __init__(self, _event):
        self._event = _event
        self._cached_pictures = None
        self._cached_cover_picture = None

    @classmethod
    def _wrap(cls, _event):
        return Event(_event)

    @classmethod
    def get_by_id(cls, event_id):
        _event = _Event.objects.get(id=event_id)
        return cls._wrap(_event)

    @classmethod
    def get_or_create_from_event_name(cls, event_name):
        cleaned_event_name = " ".join(event_name.split())
        cleaned_event_name = cleaned_event_name.title()
        try:
            _event = _Event.objects.get(name=cleaned_event_name)
        except ObjectDoesNotExist:
            _event = _Event.objects.create(name=cleaned_event_name,
                                           date_hosted=datetime.datetime.utcnow())
        return cls._wrap(_event)

    @classmethod
    def get_events_by_most_recent(cls, max_count=None):
        # this assumes that a relatively small number of events will take place
        # total...no paging built in or anything
        _events = _Event.objects.all().order_by("-date_hosted")
        if max_count is not None:
            _events = _events[:max_count]
        return [cls._wrap(_event) for _event in _events]

    def get_pictures(self):
        if self._cached_pictures is not None:
            return self._cached_pictures
        self._cached_pictures = Picture.get_pictures_from_event(self)
        return self._cached_pictures

    @property
    def _cover_picture(self):
        if self._cached_cover_picture is not None:
            return self._cached_cover_picture
        try:
            self._cached_cover_picture = random.choice(self.get_pictures())
        except IndexError:
            self._cached_cover_picture = None
        return self._cached_cover_picture

    @property
    def cover_picture_thumbnail(self):
        if self._cover_picture:
            return self._cover_picture.thumbnail_url
        return None

    @property
    def cover_picture_watermark(self):
        if self._cover_picture:
            return self._cover_picture.watermark_url
        return None

    @property
    def id(self):
        return self._event.id

    @property
    def name(self):
        return self._event.name

    @property
    def formatted_date(self):
        datetime_obj = self._event.date_hosted
        return datetime_obj.strftime("%B %d, %Y")

    @property
    def url(self):
        return reverse('event', args=[self.id])

The End.