Caching

True Story Follows

One of the most fun things about working at Uber is that making your code fast and performant is more than just a matter of satisfaction. Your code is constantly being run 24 hours a day with enough load to warrant multiple machines, and therefore faster code means fewer required machines and, in my case, better scalability overall.

Caching is especially interesting because on the surface it seems easy: store the results of an expensive database query or network call somewhere so that multiple redundant calls aren’t made. However, ensuring your data isn’t stale requires that the cache is busted at the appropriate time, and now the problem of caching suddenly becomes very difficult.

Caching Results

Before going into code details, here are a few graphs that paint a picture of how valuable caching can be:

[Screenshot, 2016-01-25: graph of queueing in my service before and after the deploy]
The above image depicts the last time my service was queueing, meaning the processes that were acting as celery workers could not keep up with the input load. Ideally, there is little to no queueing and my service is responding to real-time events as fast as possible. Interestingly enough, this depicts the application in a state where Redis was being used heavily to retrieve data without even going to disk. The peak of the graph and the positive slope immediately before it represent the time period when a deploy was happening, and everything thereafter depicts the period when new changes backfilled cached data into memory in addition to what already existed in Redis.

The fruits of the change are immediately visible here, and this slight change, which was really only about 10 lines of code, paid off even further afterward. A huge load which was previously placed on Redis was now shifted entirely to individual parallel processes, and therefore adding more workers with a few clicks of a mouse brought about true horizontal scalability.

A simpler caching strategy was used for network calls. My use case is pretty easy to imagine: a user comes online and does some things, goes offline, and maybe will return in a week or two. For any piece of data I ask for, it’s fairly safe to use that data’s value as the source of truth for the duration of the user’s transactions, which may last for a few minutes. In other words, for any network call I make, I can use a simple caching strategy where the results are stored but simply expired 30 seconds later (a sketch of this follows the list below). In the real world 30 seconds is trivial, but in the computer world 30 seconds might as well be several centuries.

This does several profound things:

  • Dramatically improves performance, both in my application as well as in my neighbors’ applications, since any redundant network call is effectively eliminated.
  • Maintains a simple approach with a simple busting strategy that’s quite hard to get wrong.
  • Keeps the code clean since network resources can still be referenced as a single point of truth, and higher level callers don’t have to worry about the cost of using a resource. This means I can program in a functional style where everything stays stateless with effectively idempotent interactions and no side effects and therefore minimal bugs.
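To make the 30 second strategy concrete, here’s a minimal sketch of what I mean. The fetch_user_profile function is a hypothetical stand-in for whatever slow network resource you’re wrapping:

import time

class ExpiringCache(object):
    """Tiny in-process cache where every entry expires after ttl seconds."""

    def __init__(self, ttl=30):
        self.ttl = ttl
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.time() > expires_at:
            # Stale entry: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def put(self, key, value):
        self._store[key] = (time.time() + self.ttl, value)


def fetch_user_profile(user_id):
    # Hypothetical stand-in for the real (slow) network call.
    return {'user_id': user_id}


_profile_cache = ExpiringCache(ttl=30)

def get_user_profile(user_id):
    # Hits the network at most once per 30 seconds per user.
    cached = _profile_cache.get(user_id)
    if cached is not None:
        return cached
    profile = fetch_user_profile(user_id)
    _profile_cache.put(user_id, profile)
    return profile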

You can see my own cache hits and misses depicted here, where the largest line indicates cache hits and what would otherwise have been slow network calls.

[Screenshot, 2016-01-25: graph of cache hits and misses]

A Practical Cache

In an interview, you might be asked to implement a least recently used cache, or if you’re unlucky and got stuck with “Doc Ak” Shah, you might be asked to implement a least frequently used cache, and in either case the cache needs to be busted in O(1) time. In practical terms though, you can forget most of that and just do one of two things:

pip install redis
pip install repoze.lru

Where Redis is the ubiquitous data structure server and repoze.lru is a random Python package I found on the internets that implements a least recently used cache for me. Now you’re left with only the following concerns:

  • Serialization
  • When to bust the cache
  • How to efficiently bust the cache

Interestingly, all 3 of the above concepts end up relating to one another in the sense that clean code generally solves all three.
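Before digging into each of those, here is roughly what the two options above give you out of the box. This is only a sketch, assuming the standard redis-py and repoze.lru interfaces and a Redis server running on localhost:

import redis
from repoze.lru import LRUCache

# Redis: the value expires server-side 30 seconds after the write.
r = redis.Redis(host='localhost', port=6379)
r.set('user:123:profile', '{"name": "foo"}', ex=30)
print(r.get('user:123:profile'))

# repoze.lru: an in-process least recently used cache capped at 500 entries.
lru = LRUCache(500)
lru.put('user:123:profile', {'name': 'foo'})
print(lru.get('user:123:profile'))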

Serialization

If we’re being pragmatic, we’re using Redis, and this means we can’t just store objects in memory. We must serialize and deserialize objects so they can be stored. If we represent objects as dictionaries, we know how to serialize them because they easily convert to and from JSON. If we’re hardcore and using something like Thrift, we know how to serialize and deserialize those objects as well because those entities are converted to binary strings when passed across the network. If we’re using a 3rd party library like Schematics, serializing and deserializing those entities is even more trivial because that’s the general purpose of the package.

We’ll assume the obvious case for an example: we want to cache database queries so we’re not repeatedly seeking out to disk. If you agree with my immediately preceding paragraph, then we’ll want to convert database rows into something easily serializable. Therefore, we’ll want to ensure that our code is structured such that an abstraction layer always exists immediately above the database. This is beneficial for the purposes that I was getting at, but to go on a brief tangent, this helps to ensure that queries are explicitly read-only or explicitly writes, it makes our choice of persistence layer trivial and an afterthought, and generally adds to a clean code structure. Now rather than returning rows directly from the database, we return serializable entities that otherwise correspond to rows.
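To make that abstraction layer concrete, here’s a hedged sketch; UserStore, the users table, and the row-to-dict conversion are all illustrative rather than lifted from any real codebase:

class UserStore(object):
    """Thin read/write layer over the users table; callers never see raw rows."""

    def __init__(self, db_connection):
        self.db = db_connection

    def get_user(self, user_id):
        # Explicitly a read: returns a plain, JSON-serializable dict.
        row = self.db.execute(
            "SELECT id, name, email FROM users WHERE id = %s", (user_id,)
        ).fetchone()
        if row is None:
            return None
        return {'id': row[0], 'name': row[1], 'email': row[2]}

    def update_email(self, user_id, email):
        # Explicitly a write: this is also where cache busting will hook in later.
        self.db.execute(
            "UPDATE users SET email = %s WHERE id = %s", (email, user_id)
        )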

So in the complex caching library you might end up creating, you might want an abstract serializer with subclasses corresponding to the types of objects you intend to serialize. In my case, I created serializers for primitives (like integers, floats, and booleans), serializers for JSON data structures, serializers for thrift objects, and a “null” serializer that ended up corresponding to in-memory storage. Serialization is an independent concern from your actual caching logic.
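As a sketch of that separation (the class names here are illustrative, not from any particular library):

import json

class Serializer(object):
    """Turns a value into a string for storage and back again."""

    def dumps(self, value):
        raise NotImplementedError

    def loads(self, data):
        raise NotImplementedError


class JsonSerializer(Serializer):
    """For dicts, lists, and primitives that round-trip through JSON."""

    def dumps(self, value):
        return json.dumps(value)

    def loads(self, data):
        return json.loads(data)


class NullSerializer(Serializer):
    """For in-memory storage: the 'serialized' form is the object itself."""

    def dumps(self, value):
        return value

    def loads(self, data):
        return data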

When to Bust the Cache

If you’re following the strategy laid out above, then your network calls or database queries are not strewn about in random places. They’re isolated to individual classes and converted to bags of methods. In a clean code structure then, you just need to isolate where writes are happening among those methods, and only in those places do you need to bust the cache.
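A hedged sketch of that idea, reusing the hypothetical UserStore from earlier and any cache object with get(), put(), and bust() methods like the one in the next section:

class CachedUserStore(object):
    """Reads go through the cache; writes are the only place the cache is busted."""

    def __init__(self, user_store, cache):
        self.user_store = user_store
        self.cache = cache

    def get_user(self, user_id):
        user = self.cache.get(user_id)
        if user is None:
            user = self.user_store.get_user(user_id)
            self.cache.put(user_id, user)
        return user

    def update_email(self, user_id, email):
        self.user_store.update_email(user_id, email)
        # The write is the one and only place the cache gets busted.
        self.cache.bust()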

Boom. Done.

How to Efficiently Bust the Cache

A naive approach to cache busting would be to maintain a secondary index that stores the keys associated with your cache, and to bust the cache, delete all the values associated with each key. However, if we have a million items cached or whatever, this will be expensive and might defeat the purpose of caching in the first place. If we’re following my example to a tee, you’re either using Redis or a 3rd party library that has an expiring least recently used cache. Therefore, you can bust the cache in O(1) time simply by changing the keyspace; the keys that become obsolete will simply expire over time. And while this approach can be expensive from a memory standpoint, in all reality if you’re a web developer then for all practical purposes you have an infinite amount of memory (consider that a production machine might have 256 GB of RAM, when back in the ’70s a computer might have had a few kilobytes).

In a simple example for a single process, our cache might look something like this:

import uuid

class SimpleCache(object):
    """Single-process cache; busting just rotates the keyspace."""

    def __init__(self):
        self.current_state_id = uuid.uuid4()
        self._store = {}  # plain dict as the backing store for this example

    def _key_for_state(self, original_key):
        # Namespace every key by the current state id.
        return "%s:%s" % (self.current_state_id, original_key)

    def bust(self):
        # O(1) bust: keys under the old state id are simply never looked up again.
        self.current_state_id = uuid.uuid4()

    def get(self, key):
        # simple get logic
        return self._store.get(self._key_for_state(key))

    def put(self, key, value):
        # simple put logic
        self._store[self._key_for_state(key)] = value

In this sort of setup, we can just get() and put() at will and assume that anything fetched from the cache is valid if it’s available. Now if we want to go to a more real world example, we’ll consider that we probably have multiple processes doing the work and sharing the cache, so some complexity will be added.

The storage mechanism will change to Redis, which is fairly trivial, but more importantly, the state of the cache needs to be stored in Redis as well so that it can be referenced as part of every key, making each key inclusive of the cache state.
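Here’s a hedged sketch of that shared-state version, assuming redis-py and glossing over serialization:

import uuid
import redis

class SharedCache(object):
    """Cache shared by many worker processes; the keyspace id lives in Redis too."""

    STATE_KEY = 'my_service:cache_state_id'  # arbitrary illustrative key name

    def __init__(self):
        self.redis = redis.Redis()

    def _current_state_id(self):
        state_id = self.redis.get(self.STATE_KEY)
        if state_id is None:
            # First writer wins if several processes race to initialize.
            self.redis.set(self.STATE_KEY, str(uuid.uuid4()), nx=True)
            state_id = self.redis.get(self.STATE_KEY)
        return state_id.decode('utf-8')

    def _key_for_state(self, original_key):
        return '%s:%s' % (self._current_state_id(), original_key)

    def bust(self):
        # Rotating the shared state id invalidates every process's view at once.
        self.redis.set(self.STATE_KEY, str(uuid.uuid4()))

    def get(self, key):
        return self.redis.get(self._key_for_state(key))

    def put(self, key, value, ttl=3600):
        # Entries under the old keyspace simply age out via the TTL.
        self.redis.set(self._key_for_state(key), value, ex=ttl)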

Now, in order to bust the cache, only one process should be allowed to bust it at a single point in time; that way any process can know whether its version of the cache is stale. My preferred method for dealing with a mutex is to use Redis again to create a “distributed lock”, if you will (and I will). The Redis documentation outlines step by step how to create a distributed lock. Just follow the instructions, add some tests, and you have a tool with tons of practical applications.
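For the single-instance case, that recipe boils down to a SET with NX plus a check-and-delete on release. A rough sketch with redis-py follows; the key name and TTL are arbitrary choices, and this is not the full multi-node Redlock algorithm:

import uuid
import redis

# Release the lock only if we still own it (the pattern from the Redis docs).
RELEASE_SCRIPT = """
if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('del', KEYS[1])
else
    return 0
end
"""

def bust_cache_exclusively(redis_client, cache, lock_ttl_ms=10000):
    """Bust the cache only if this process wins the lock; a sketch, not battle-tested."""
    token = str(uuid.uuid4())
    acquired = redis_client.set('cache_bust_lock', token, nx=True, px=lock_ttl_ms)
    if not acquired:
        return False  # someone else is already busting
    try:
        cache.bust()
        return True
    finally:
        redis_client.eval(RELEASE_SCRIPT, 1, 'cache_bust_lock', token)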

Storage

It’s worth enumerating storage specifically because it’s a completely separate concern from anything above. I’ve mentioned Redis as a storage mechanism, and I’ve also mentioned Redis for two other distinct purposes (locking and sharing a keyspace), but that doesn’t necessarily mean Redis should be your assumed storage mechanism, for the simple reason that storage is an independent concern.

If you separate serialization, storage, and caching logic as independent concerns and write your code as such, you’ll have very clean interfaces to build on top of. In my case, I had a separate class just for writing to and reading from the cache, and it became trivially easy to backfill valid data from Redis back into memory. The result was that I have some insanely complex data structures stored in memory that can be referenced by a simple pointer without having to serialize anything or process all of the contents of that data structure. If you imagine that I have 50 workers, I effectively have 50 copies of the cache, one living within each process. And again, we have for all practical purposes an infinite supply of memory.

As an example, I have entities that, when serialized to JSON, can end up being thousands of lines. Reading such an object from Redis requires reading all of its contents, but reading it from memory simply means referencing the existing object.
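A hedged sketch of that backfilling idea, layering a process-local dict in front of the shared Redis-backed cache (the names are illustrative):

class TwoTierCache(object):
    """Process-local dict in front of a shared, Redis-backed cache."""

    def __init__(self, shared_cache):
        self.shared_cache = shared_cache   # e.g. the SharedCache sketched above
        self.local = {}                    # deserialized objects, one copy per worker

    def get(self, key):
        if key in self.local:
            # Hot path: just a pointer lookup, no network hop, no deserialization.
            # A real version would also check the shared state id before trusting this.
            return self.local[key]
        value = self.shared_cache.get(key)
        if value is not None:
            # Backfill the in-process copy for next time.
            self.local[key] = value
        return value

    def put(self, key, value):
        self.local[key] = value
        self.shared_cache.put(key, value)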

Magnitude Comparison

Scaled up so that a nanosecond becomes a second, the classic latency numbers look something like this:

### Minute:
    L1 cache reference 0.5 s One heart beat (0.5 s)
    Branch mispredict 5 s Yawn
    L2 cache reference 7 s Long yawn
    Mutex lock/unlock 25 s Making a coffee

### Hour:
    Main memory reference 100 s Brushing your teeth
    Compress 1K bytes with Zippy 50 min One episode of a TV show (including ad breaks)

### Day:
    Send 2K bytes over 1 Gbps network 5.5 hr From lunch to end of work day

### Week:
    SSD random read 1.7 days A normal weekend
    Read 1 MB sequentially from memory 2.9 days A long weekend
    Round trip within same datacenter 5.8 days A medium vacation
    Read 1 MB sequentially from SSD 11.6 days Waiting for almost 2 weeks for a delivery

### Year:
    Disk seek 16.5 weeks A semester in university
    Read 1 MB sequentially from disk 7.8 months Almost producing a new human being
    The above 2 together 1 year

### Decade:
    Send packet CA->Netherlands->CA 4.8 years Average time it takes to complete a bachelor's degree

After you have a solid caching strategy in place, it becomes trivially easy to memoize the results of any stateless function, and you can quickly enjoy the benefits of avoiding increasingly costly hardware layers. When you start getting crazy with all sorts of caching logic, it starts to feel ironic that a few hundred more lines of programming instructions end up making your whole program orders of magnitude faster. As in the above comparison, if hitting the hard drive versus executing individual instructions is like comparing years to seconds, it’s completely worth “putting the computer to work for a few hours”, to continue the analogy.
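As a final hedged sketch, memoizing a stateless function on top of any cache that exposes get() and put() is only a few lines (the key construction here is naive and assumes positional, repr-able arguments):

import functools

def memoize(cache):
    """Memoize a stateless function's results in any cache exposing get()/put()."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            key = '%s:%r' % (func.__name__, args)
            result = cache.get(key)
            if result is None:
                result = func(*args)
                cache.put(key, result)
            return result
        return wrapper
    return decorator

# Usage, with any of the caches sketched earlier:
# @memoize(ExpiringCache(ttl=30))
# def expensive_lookup(user_id):
#     ...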