Writing Your Own Logging Engine for a Python Web Application

Too Long; Did Not Read

If you want to store a whole bunch of log data without spending any money, you can dump large amounts of strings into DynamoDB, likely without exceeding Amazon's free tier.

The Problem

I outlined the context of my problem in a previous post (which also has sample code for basic DynamoDB usage with Boto), but basically, in order to troubleshoot or fine-tune behavior that was happening in production for a side project, I needed to log what was happening in an understandable (plain English) way.

At work, we use Splunk, which is an awesome service, but I ended up not using it here. I explain in greater detail below, but at the end of the day it all boiled down to the fact that it would cost well over a thousand dollars per month.

Cost Analysis for Paid Logging Services

After some quick googling coupled with an elementary knowledge of existing services, I found a few solutions to my problem at hand. In order to understand the costs of all these services, I wanted to convey the costs in terms of something that everyone can understand: Chipotle Burritos.

To keep things simple, we're going to assume this is a steak burrito. Now, if we go to Chipotle, there are going to be about ten people working the production line on a single burrito. When you order double meat from the get-go, it wouldn't matter if Paul Revere were working that day; the double-meat message isn't going to make it to the register. So this is going to be a double-meat burrito, but it'll still ring up at the same price as a regular burrito. Locally, that burrito is $7.40.

At work, we use Splunk. I'm not sure if their free tier has gone away, but the entry-level service costs 243 Chipotle Steak Burritos per month, or about 7.8 Chipotle Burritos per day. If we factor in a reasonable amount of sleep, that's one burrito every two waking hours, indefinitely.

Continuing down the line of services, I found Loggly, which offers a basic plan for only 6.75 Chipotle Burritos per month. That's pretty good, but it's still a fair bit higher than my general monthly allotment of Chipotle Burritos.

As an alternative, I could write my own logger and save all the data to DynamoDB without incurring any charge. However, after a long day of laborious coding, I will be left famished, and I will go to Chipotle and order a single burrito.

So if we tabulate the results:

Selection                                        Cost in Terms of Chipotle Burritos per Month
Splunk                                           243 burritos
Loggly                                           7 burritos
Write my own logger and save data to DynamoDB    1 burrito (fixed)

The choice appears clear. I will write my own logging engine.

A Really Awesome Anecdote

Hark back to my college days: I was a cadet at the prestigious United States Military Academy at West Point, where for almost four years I stressed daily about my overall class rank in order to fulfill my ultimate goal of becoming a helicopter pilot. I changed my mind about becoming a pilot at the last minute and instead opted for the completely uncompetitive nerd branch of the Army, so in hindsight I really should have prioritized more important things in life: lifting weights with greater volume and intensity so that the one button on my uniform at chest level would eventually break and pop off, garnering more respect from my peers.

But I digress. Every now and then we'd have to complete the tedious task of what was called a Saturday AM Inspection, for which you'd need to make your room spotless so it looked like something out of a dramatic military film. I checked on the internets, and some dude even put together a timelapse video of cleaning his room for one.

But he's doing it wrong. The more efficient way to tackle the problem is to take all of your personal belongings that don't necessarily need to be displayed, which is the bulk of your stuff, and throw them into a few garbage bags. Then take the garbage bags down to the basement and lock them up. Now you're already 90% done.

The approach in the video of tediously managing every detail is akin to using SQL for the problem at hand, defining a schema and strict types as you go. Key-value storage, however, lets you just throw a bunch of crap into a consolidated place, which is the approach I opt for in real life, and it's how I went about collecting and saving logging information. And when you need to query your information, you still know where it is: it's in the basement, locked in the storage container, inside of those garbage bags.
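
In DynamoDB terms, a "garbage bag" is just an item keyed by a hash (plus a timestamp range key here), with whatever attributes you feel like tossing in. A rough sketch of what one logged item looks like; the attribute names match the GenericLogger code below, and the values are made up for illustration:

# One log entry is just a dict; there is no table schema beyond the hash/range key.
log_item = {
    "generic_object_id_hash": "workout_42",   # hash key: prefix + object id
    "timestamp": 1426991686000,               # range key: epoch milliseconds
    "message": "Starting Workout Generation\n  Recreating exercise filter",
}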

The Code

Generic Logger Class

With the intent of creating classes responsible for exactly one task each, I divvied up the responsibilities: one class posts the information to AWS, and another handles the logging specific to a certain use case. A sample generic logger is laid out below:

import datetime
import os

from boto.dynamodb2 import connect_to_region
from boto.dynamodb2.table import Table

# datetime_to_timestamp_ms is a small helper elsewhere in the project
# that converts a datetime into epoch milliseconds.


class GenericLogger(object):

    AWS_TABLE_NAME = os.environ['AWS_TABLE_NAME']
    REGION = "us-east-1"

    def __init__(self, generic_object, prefix):
        # Note: opening the connection here turned out to be the slow part;
        # see the profiling section below.
        connection = connect_to_region(
            self.REGION,
            aws_access_key_id=os.environ['AWS_ACCESS_KEY_ID'],
            aws_secret_access_key=os.environ['AWS_SECRET_ACCESS_KEY'],
        )
        self.table = Table(
            self.AWS_TABLE_NAME,
            connection=connection
        )
        self.prefix = prefix
        self.generic_object = generic_object
        self.messages = []

    def add_message(self, message):
        # Buffer messages in memory; nothing hits Dynamo until commit().
        self.messages.append(message)

    def commit(self):
        required_hash_data = {
            "generic_object_id_hash": "%s_%s" % (self.prefix, self.generic_object.id),
            "timestamp": datetime_to_timestamp_ms(datetime.datetime.utcnow())
        }
        message_data = {
            "message": "\n".join(self.messages)
        }
        # Python 2 idiom for merging the two dicts into one item.
        final_dynamo_data = dict(required_hash_data.items() + message_data.items())
        self.table.put_item(data=final_dynamo_data)
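
The only non-obvious dependency there is datetime_to_timestamp_ms, which lives elsewhere in the project. Assuming it does what the name suggests (convert a UTC datetime to epoch milliseconds), a minimal version could look like this:

import calendar


def datetime_to_timestamp_ms(dt):
    # Treat the datetime as UTC and convert it to milliseconds since the epoch.
    return calendar.timegm(dt.utctimetuple()) * 1000 + dt.microsecond // 1000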

Specific Use Case Class

Subsequently, I wrote a class that would basically isolate logging statements and logic to one place. All too often I see code that’s been polluted with additional logic that relates specifically to logging and not to the method’s primary task.

One added bonus that I got pretty stoked about: after I completed my first iteration of this code, I had tons of logging information, but it was difficult to tell the layers of abstraction apart. So I went ahead and grabbed the current stack trace before every logging statement and tabbed the line over with whitespace corresponding to the stack's depth. This made the logging statements appear in blocks that mimic the shape of the code, but with statements written in English.

The class below is an abridged version of the full implementation (you get the idea).

import datetime
import traceback
from collections import defaultdict

from workout_generator.logging.generic_logger import GenericLogger
from workout_generator.constants import WorkoutComponent
from workout_generator.constants import Phase
from workout_generator.constants import MuscleGroup


class WorkoutLogger(object):

    def __init__(self, user):
        self.generic_logger = GenericLogger(user, "workout")
        self.user = user
        self.initial_stack_height = None

    def _log(self, message):
        # Indent each message by the current stack depth relative to the first
        # logged message, so the output mirrors the code's structure.
        stack_height = len(traceback.format_stack())
        if self.initial_stack_height is None:
            self.initial_stack_height = stack_height
        offset = stack_height - self.initial_stack_height
        offset_spaces = "  " * offset
        final_message = "%s%s" % (offset_spaces, message)
        self.generic_logger.add_message(final_message)

    @classmethod
    def for_user(cls, user):
        return cls(user)

    def commit(self):
        self.generic_logger.commit()

    def log_start(self):
        self._log("Starting Workout Generation at %s" % datetime.datetime.utcnow())

    def log_recreate_filter(self):
        self._log("Recreating exercise filter")

    def log_dead_end_for_component(self, workout_component_id):
        workout_component = WorkoutComponent.get_by_id(workout_component_id).title
        self._log("No exercises available for current filter for %s" % workout_component)

    def log_no_more_for_component(self, workout_component_id, exception):
        workout_component = WorkoutComponent.get_by_id(workout_component_id).title
        self._log("Not committing any more exercises for %s (%s)" % (workout_component, exception))

The code that ends up calling the logging class doesn't need to get bogged down in the details of how data gets formatted; it can simply call a well-expressed method ("log_available_exercises" seems pretty self-explanatory, and how verbose that logging becomes isn't the caller's concern).
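
A minimal sketch of what the calling code might look like, using only the methods shown above (the surrounding function and its arguments are made up for illustration, as is the import path for WorkoutLogger):

from workout_generator.logging.workout_logger import WorkoutLogger


def generate_workout(user, workout_component_ids):
    logger = WorkoutLogger.for_user(user)
    logger.log_start()

    for workout_component_id in workout_component_ids:
        logger.log_recreate_filter()
        # ... actual exercise selection logic would go here ...
        logger.log_dead_end_for_component(workout_component_id)

    # A single network call at the end writes the whole block to Dynamo.
    logger.commit()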

(Small Tangent) Profile the Code

Interestingly enough, I was about to write about how slow the added logging had made the application. The tests ran four times slower (I mocked out the actual commit to Amazon), so I assumed this had to do with Python’s notoriously slow string formatting. I was intending to profile the code to show how slow the logger ended up being.

But when I profiled the code, my theory was proven incorrect. Creating the strings was no problem at all; the real cost was an SSL connection being established with AWS, thanks to my poor decision to open the connection inside the constructor (the connection should have been lazily created and then cached). The profiling output is below, with a sketch of the fix after it:

python -m cProfile -s time create_workout_test.py

         551447 function calls (545997 primitive calls) in 1.750 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      215    0.484    0.002    0.484    0.002 {built-in method read}
        1    0.185    0.185    0.185    0.185 {built-in method do_handshake}
      157    0.107    0.001    0.107    0.001 {method 'commit' of 'sqlite3.Connection' objects}
        1    0.102    0.102    0.102    0.102 {method 'connect' of '_socket.socket' objects}
    13585    0.049    0.000    0.049    0.000 {posix.stat}
     1010    0.034    0.000    0.126    0.000 traceback.py:281(extract_stack)
     1010    0.025    0.000    0.029    0.000 traceback.py:27(format_list)
      336    0.024    0.000    0.027    0.000 base.py:481(execute)
        1    0.023    0.023    0.029    0.029 {_socket.getaddrinfo}
    13418    0.019    0.000    0.067    0.000 linecache.py:43(checkcache)
      615    0.017    0.000    0.023    0.000 constants.py:25(_populate_html_description)
   109/74    0.017    0.000    0.252    0.003 {__import__}
  811/205    0.016    0.000    0.042    0.000 sre_parse.py:379(_parse)
        1    0.015    0.015    0.015    0.015 {_ssl.sslwrap}
    81754    0.011    0.000    0.011    0.000 {method 'append' of 'list' objects}
    13418    0.010    0.000    0.017    0.000 linecache.py:13(getline)
 1526/183    0.010    0.000    0.039    0.000 sre_compile.py:32(_compile)
      633    0.010    0.000    0.025    0.000 sre_compile.py:207(_optimize_charset)
    10397    0.009    0.000    0.011    0.000 sre_parse.py:182(__next)
       52    0.009    0.000    0.337    0.006 __init__.py:1(<module>)
        3    0.009    0.003    0.009    0.003 decoder.py:372(raw_decode)
     1317    0.008    0.000    0.012    0.000 models.py:433(get_rep_prescriptions_for_muscle)
       18    0.008    0.000    0.010    0.001 collections.py:288(namedtuple)
        3    0.007    0.002    0.010    0.003 sre_compile.py:301(_optimize_unicode)
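
The fix itself is small. Here's a minimal sketch of lazily creating and caching the connection inside GenericLogger (imports as in the original class; only the constructor changes and a table property is added, with naming of my own choosing):

class GenericLogger(object):

    AWS_TABLE_NAME = os.environ['AWS_TABLE_NAME']
    REGION = "us-east-1"

    def __init__(self, generic_object, prefix):
        # No network activity in the constructor anymore.
        self._table = None
        self.prefix = prefix
        self.generic_object = generic_object
        self.messages = []

    @property
    def table(self):
        # Establish the SSL connection the first time it's actually needed,
        # then cache the Table object for any subsequent commits.
        if self._table is None:
            connection = connect_to_region(
                self.REGION,
                aws_access_key_id=os.environ['AWS_ACCESS_KEY_ID'],
                aws_secret_access_key=os.environ['AWS_SECRET_ACCESS_KEY'],
            )
            self._table = Table(self.AWS_TABLE_NAME, connection=connection)
        return self._table

add_message and commit can stay exactly as they were, since commit already goes through self.table.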

Output

After running the logging in production, I wrote a separate admin interface that fetches and displays the Dynamo data. You can see a sample block of output below, where it's much easier to trace through and identify problems in a massive slew of business logic.

[Screenshot (2015-03-21): sample block of log output as displayed in the admin interface]
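
Fetching the rows back out for that interface is a single query against the hash key. A minimal sketch using boto's dynamodb2 API, assuming the same table and key layout as the GenericLogger above (the "workout_42" hash value is a made-up example of prefix + user id):

import os

from boto.dynamodb2 import connect_to_region
from boto.dynamodb2.table import Table

connection = connect_to_region(
    "us-east-1",
    aws_access_key_id=os.environ['AWS_ACCESS_KEY_ID'],
    aws_secret_access_key=os.environ['AWS_SECRET_ACCESS_KEY'],
)
table = Table(os.environ['AWS_TABLE_NAME'], connection=connection)

# All log blocks for one object, newest first (reverse sorts by the range key).
results = table.query_2(
    generic_object_id_hash__eq="workout_42",
    reverse=True,
)
for item in results:
    print item["timestamp"], item["message"]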

How it Could Be Better

Obviously, the paid services I didn't use have more features and better filtering, and they clearly would have outperformed my little thrown-together program. But we're maintaining a general theme here of side projects, small budgets, and Chipotle Burritos.

For a little more than 5 Chipotle Burritos a month, you could pay Heroku for a separate worker dyno, but I'm sticking with the free tier, so all of my executed logic lives in the request/response cycle.

The obvious improvement, then, is to offload all logging to a background task, since nothing else depends on it and the extra network calls slow down the request.
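
Absent a worker dyno, even a cheap in-process version helps: hand the buffered messages off to a daemon thread so the put_item happens outside the request path. A rough sketch of one way to do it (not the code I shipped):

import threading


def commit_async(logger):
    # Fire the Dynamo write on a daemon thread so the response isn't blocked
    # by the SSL handshake and the put_item round trip.
    t = threading.Thread(target=logger.commit)
    t.daemon = True
    t.start()

The trade-off is that a log block can be lost if the process dies before the thread finishes, so a real worker queue is still the better long-term answer.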