This post is really about type safety in general, something near and dear to my heart since it gets me so amped. While I normally develop in Python, it feels extremely luxurious to program in a language like Swift where the type system is so strict. If we have an object and we know its type and we know that’s it’s not None or Null, we’re free to act upon it without examining any of its data. This benefit is expanded to polymorphic entities where objects are subclasses of a parent class. Since all objects inherit from a shared class, all operations supported by the parent class are supported against the object.
Now, in some cases, this concept makes sense to apply to models that are saved into some persistence layer (a database). But then things just get weird.
SQLAlchemy, which I’ve been using at work, has its own method of storing polymorphic entities by writing each child entity to a separate table. Without considering too much of the merits of an implementation like this, it seems overly complex and too much happening at the persistence layer.
Recently I implemented my own version of polymorphic representation in a database, and I really liked it, so I wanted to go ahead and share…
But First: Representing Types in General
Before jumping ahead, one library I’ve been using a lot lately is the Schematics library. In essence, you can define a reasonably complex object structure, throw some data into the structure, and from there you can immediately validate that the input data matches the object’s type system. This can be useful if you’re reading or writing data to an external API or database. For a quick example:
So now with an object structure defined, we can either create the object from input data, or we can construct the an object ourselves and serialize it so that it can be written to a database or returned in an API:
Abstraction Layer in Front of Database
If you follow my writings vigorously, which you probably don’t, then you know that it’s a good idea to add an abstractoin layer between your application and whatever persistence layer it uses. Your application shouldn’t know about what version of SQL you’re using or if you’re even using SQL at all, and if you wanted to change your persistence layer then you should only need to make isolated changes. The same concept applies to 3rd party libraries or 3rd party interfaces to separate API’s. So the strict types above can be used as return types in these abstraction layers.
The point is simply: Whatever module interfaces with services that are not directly tied to your application should return these objects or something like it. They are pure objects that don’t know about anything else and can be serialized into primitive types which are universally understood.
Polymorphism in the Database
Now, to expand upon the previous ideas with a basic modeling concept: if an object looks like a duck, quacks like a duck, and does everything else like a duck, it’s a duck. Therefore, two distinct types will never be exactly the same, and this means that types can be inferred. So, it would be possible to create a table whose columns are made up of the union of all of our class’s and its subclass’s attributes. If each subclass has non-nullable fields, then given a database row we can determine its type based on which fields are populated or an exception can be thrown if the input row does not conform to any particular type.
Now, combine everything I’ve said and accept it as gospel. We can express those concepts in the code below, and we can add a mapper class that converts a row retrieved from the database into a particular type. The type is inferred by returning the first entity that correctly validates from the input row (but no more than one entity should correctly validate). I should point out that this is kind of slow, but at least the slow operations are happening in memory and are comparitively much faster than a database query.
Some Moar Thoughts on Inheritance
Somewhat tangent to the idea of representing multiple subclasses in a database, a few other thoughts:
A parent class with “Base” as a prefix in the name is a stupid name. This was one of my takeaways from Jeremiah “Boardgame Bruce” Lee. If you have a base class that’s not abstract, there’s nothing that inherently makes it “base” because all classes can be inherited from (one of the beauties of object oriented programming). If a class is “abstract,” however, this actually means something and denotes that a child class will work as intended if it implements the abstract methods and properties. This is similar to an interface or a protocol but can additionally provide default behavior.
In this case, schematics uses meta classes, so without some rigorous works I can’t actually make the parent class abstract. I find it expressive to name it “Incomplete” and use an underscore to denote that it is not meant to be a public facing class.