Introduction to metaclasses

This post describes my approach to explaining metaclasses. It is not about how to use metaclasses and its value is purely educational. By the end of this post you should be able to define your own metaclasses in python and reason about them.

Introduction

Metaclasses are an instance of metaprogramming, which is available in some programming languages. To discuss metaclasses we first have to be familiar with classes. Consider these class definitions in python.

class Spam:
	pass
class Egg:
	pass
class Ham:
	pass

These class definitions make the names Spam, Egg and Ham available to the program. You have written them in the text of your program and the rest of your program can access them. Now consider this example.

>>> todays_menu = [ Spam(), Egg(), Ham() ]
>>> todays_menu
[<__main__.Spam object at 0x7fc1ca822500>, <__main__.Egg object at 0x7fc1ca7e1180>, <__main__.Ham object at 0x7fc1ca788e20>]

This program is familiar. We have used the class names to create objects and placed them in today’s menu. Consider this other program.

>>> menu = [ Spam, Egg, Ham ]
>>> menu
[<class '__main__.Spam'>, <class '__main__.Egg'>, <class '__main__.Ham'>]

The difference is that we have not used the class names to create objects. This time we have the value associated with each class name.

How does this program go from its text representation to its runtime representation? Is it by value substitution? Is it by reference?

The short answer is that the names Spam, Egg, Ham are available in the current namespace and their value is respectively <class '__main__.Spam'>, <class '__main__.Egg'>, and <class '__main__.Ham'>.

The runtime representation of the program was obtained by value substitution, just like in your everyday python. This is interesting because we have defined a binding between a name and a value, but the way we have done this is quite different from the everyday syntax name = value. What is going on?
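To make the point concrete, here is a small sketch showing that a name bound by a class statement behaves like any other name. The names alias and lookup are made up for this illustration.

```python
class Spam:
    pass

# The class statement bound the name Spam to a value, so that value can be
# assigned to another name or stored in a container like any other value.
alias = Spam
lookup = {"breakfast": Spam}

print(alias is Spam)                            # same class object, two names
print(isinstance(lookup["breakfast"](), Spam))  # the stored value still makes Spam objects
```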

Detour

Let’s take a brief detour to a very different world where you do not have classes and objects available in the language. Let’s go visit C.

Here is the initial example revisited. It’s not the same program and it does not work, but at least it looks similar. You may also notice that we are prevented from putting things of different types together in the same array. Which type should we put in place of ? in this program?

struct Spam {};
struct Egg {};
struct Ham {};

? todays_menu[] = {
	struct Spam{},
	struct Egg{},
	struct Ham{},
};

Let’s fix this. Since all C cares about here is the size of each element, we use a union to wrap them all.

/* Empty structs are a GNU extension; standard C needs at least one member. */
union Choice {
    struct Spam spam;
    struct Egg egg;
    struct Ham ham;
};

union Choice todays_menu[] = {
	{ .spam = {} },
	{ .egg = {} },
	{ .ham = {} },
};

If you knew beforehand all the possible choices for a menu item you could do this encoding in C and achieve feature parity with our python example. But if you don’t know them then you cannot. In C you cannot change the members of a union at runtime.

When you cannot achieve something using a feature of the language it is time to make your own implementation of it with structs and functions. What you would need is a make_class function together with a make_object function.

typedef struct { ... } *class_t;
typedef struct { ... } *object_t;

class_t spam = make_class("Spam", ...);
class_t egg  = make_class("Egg", ...);
class_t ham  = make_class("Ham", ...);

// equivalent to [Spam(), Egg(), Ham()]
object_t todays_menu[] = {
	make_object(spam, ...),
	make_object(egg, ...),
	make_object(ham, ...),
};

// equivalent to [Spam, Egg, Ham]
class_t menu[] = {
	spam,
	egg, 
	ham,
};

This achieves feature parity for all we care. Now we can discuss the interesting part.

make_class and make_object are ordinary functions, so they are called at runtime. This enables you to write object oriented code in C. As we would like to show off to those pythonistas we can even make some macros to emulate the syntax.

#define CLASS(binder, class_name, ...) class_t binder = make_class(ENVIRONMENT, class_name, __VA_ARGS__)
#define NEW(class_name, ...) make_object(ENVIRONMENT, class_name, __VA_ARGS__)

CLASS(spam, "Spam", ...);
CLASS(egg, "Egg", ...);
CLASS(ham, "Ham", ...);

object_t todays_menu[] = {
	NEW(spam, ...),
	NEW(egg, ...),
	NEW(ham, ...),
};

class_t menu[] = {
	spam,
	egg,
	ham,
};

We now bid C farewell. It served us well and we now have the idea that make_class and make_object functions are all we need. But how do we do anything like this in Python? What is the make_class equivalent, and is make_object the __init__ method I have to write all the time?

make_class for python

For all our purposes make_class is replaced by type. You may have used type before to inspect an object and find out what class it was created from but today we use it to create classes. If we replace the class definitions using type we can achieve feature parity.

>>> Spam = type("Spam", (), {})
>>> Egg = type("Egg", (), {})
>>> Ham = type("Ham", (), {})

>>> menu = [Spam, Egg, Ham]
>>> menu
[<class '__main__.Spam'>, <class '__main__.Egg'>, <class '__main__.Ham'>]
>>> todays_menu = [Spam(), Egg(), Ham()]
>>> todays_menu
[<__main__.Spam object at 0x7fe887f66500>, <__main__.Egg object at 0x7fe887f25180>, <__main__.Ham object at 0x7fe887eeceb0>]
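As a small sketch of the two roles of type mentioned above: called with one argument it reports the class an object was created from, and called with three arguments it creates a class. The name breakfast is made up for this illustration.

```python
# Three arguments: create a class from a name, a tuple of bases and a dictionary.
Spam = type("Spam", (), {})
breakfast = Spam()

# One argument: ask which class an object was created from.
print(type(breakfast) is Spam)

# Classes are objects too, and this one was created by type.
print(type(Spam) is type)
print(Spam.__name__)
```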

We have now answered the first question and found that there is an alternative way to define classes in python. This alternative uses ideas from everyday python, such as variable assignment and function calls, to define classes, but we still have much to explore. Here is a short list of unanswered questions.

How do I define attributes?
How do I define methods?
How do I define a subclass?

These questions are immediate given our familiarity with the python class syntax. Since we have been taught the class syntax, we are used to putting attributes and methods in the class body and the name of the superclass in parentheses after the class name.

class Dog:
    good_boy = True
    def bark(self):
        print("Woff!")

class Shiba(Dog):
	pass

How do we translate these examples into the function and variable assignment style?

>>> def bark(self):
        print("Woff!")
>>> Dog = type("Dog", (), {"good_boy": True, "bark": bark})
>>> Shiba = type("Shiba", (Dog,), {})

As you can see there are two extra arguments to type that we ignored until now. The second argument is a tuple of the classes to inherit from and the third argument is a dictionary holding the name bindings that would otherwise live in the class body.
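The following sketch puts the two styles side by side to check that they produce classes that behave the same way. The names Dog2 and Shiba2 are made up so that both versions can coexist, and bark returns its string instead of printing it so the results are easy to compare.

```python
# Class syntax.
class Dog:
    good_boy = True

    def bark(self):
        return "Woff!"

class Shiba(Dog):
    pass

# Function call and variable assignment style.
def bark(self):
    return "Woff!"

Dog2 = type("Dog2", (), {"good_boy": True, "bark": bark})
Shiba2 = type("Shiba2", (Dog2,), {})

# Both pairs show the same inheritance, attributes and methods.
for dog, shiba in ((Dog, Shiba), (Dog2, Shiba2)):
    print(shiba.__bases__ == (dog,), shiba.good_boy, shiba().bark())
```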

This is all we need to define classes at runtime. ¹

The next step after type

Not everybody wants to define classes only at runtime using type. People have used the class syntax for a long time now and we can’t pull it out from under them without them noticing. There may be people that are downright opposed to defining classes at runtime and we would like to accommodate them all.

To do so we have to make it possible to go from python’s class syntax to our function call convention. We will do this first with an external class syntax that python does not recognize and then we will acknowledge the internal class syntax.

We start by defining classes in JSON. This will limit what we can do but it will be enough for our example. Because we want to do something more than just calling type, we will also add an attribute holding the JSON source to the new class.

>>> import json
>>> class_def = '''
{
	"name": "Dog",
	"bases": [],
	"kwds": {
		"good_boy": true
	}
}
'''
>>> def make_class_from_json(source):
        name = source["name"]
        bases = source["bases"]
        kwds = source.get("kwds", {})
        kwds.update({"__json_source__": json.dumps(source)})
        return type(name, tuple(bases), kwds)
>>> Dog = make_class_from_json(json.loads(class_def))
>>> Dog.good_boy
True
>>> Dog.__json_source__
'{"name": "Dog", "bases": [], "kwds": {"good_boy": true}}'

This example’s interesting part is not the fact that we are defining a class in JSON notation but the fact that now we have an alternative to type and the class syntax. As long as we have a procedure from an input source to type we can define a class. We have abstracted this procedure with a function so that we can reuse it as we please and this is the interesting bit.
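The same idea can be pushed one step further. The sketch below is only one possible variation: it assumes that the strings in "bases" name classes created earlier and kept in a dictionary, here called known_classes, which is a convention invented for this illustration.

```python
import json

def make_class_from_json(text, known_classes):
    """Build a class from JSON text, resolving base names through known_classes."""
    source = json.loads(text)
    # JSON can only carry strings, so we look each base name up in the registry.
    bases = tuple(known_classes[name] for name in source.get("bases", []))
    attributes = dict(source.get("kwds", {}))
    attributes["__json_source__"] = text
    new_class = type(source["name"], bases, attributes)
    # Remember the new class so later definitions can inherit from it.
    known_classes[source["name"]] = new_class
    return new_class

registry = {}
Dog = make_class_from_json('{"name": "Dog", "bases": [], "kwds": {"good_boy": true}}', registry)
Shiba = make_class_from_json('{"name": "Shiba", "bases": ["Dog"]}', registry)

print(issubclass(Shiba, Dog))
print(Shiba.good_boy)
```

The procedure is still just a function that ends with a call to type; everything before that call is ours to decide.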

Python’s internal syntax

In our previous example we have started with a description of the target class in an external syntax ², we have parsed it and fed the result to type. Apart from our choice of external syntax this is exactly what python does when it encounters a class definition. It parses the syntax, finds name, bases and class body and feeds the result to type after evaluation.
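A rough way to picture this is the sketch below, which runs a class body held in a string, collects the resulting bindings in a dictionary, and hands them to type. It is a simplification under my own assumptions: the real machinery also handles details such as __prepare__ and the __module__ and __qualname__ entries, which are left out here.

```python
# The class body as text, standing in for what the parser would find.
body = '''
good_boy = True
def bark(self):
    return "Woff!"
'''

# Evaluate the body; every binding it makes lands in namespace.
namespace = {}
exec(body, {}, namespace)

# Feed name, bases and the evaluated body to type.
Dog = type("Dog", (), namespace)

print(sorted(namespace))
print(Dog.good_boy, Dog().bark())
```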

If we had a procedure to take in python syntax we could write our class definitions in python, parse them and feed the result to type. Just like python does it!

While we could rig up something with the ast module we are after a bigger fish. What if we could specify a class using the class syntax and then ask python to please use our procedure instead of type? The example practically writes itself.

def make_class(name, bases, kwds):
    # here we can do anything, e.g. update the kwds dictionary
    return type(name, bases, kwds)

class Dog(???=make_class):
    good_boy = True

There are two interesting bits in this example:

We define a procedure to create python classes. ³
We ask python to use our make_class procedure, basically hijacking python’s everyday life and swapping it out for something that we wrote.

I’m not sure which bit is more mind-blowing. The first is very pleasing to the mind but the second really screams for attention, as we wrangle python into doing our bidding even when we use python’s own syntax.

To make a very long story short, it’s not common to use functions for this use case. The usual way to implement it is to go through classes and methods.

The zoo of metaclasses

The simplest example we can make reuses type as a metaclass. Since type is already python’s builtin default metaclass this changes nothing, but it shows how to use the metaclass keyword in a class definition.

class Dog(metaclass=type):
	good_boy = True

Once we have this we can finally give our metaclass a name.

class Meta(type):
	pass

class Dog(metaclass=Meta):
	good_boy = True

Next we start writing the method that the python interpreter calls when it uses our metaclass Meta.

class Meta(type):
    def __new__(cls, name, bases, class_dict):
        print(cls, name, bases, class_dict)
        return super().__new__(cls, name, bases, class_dict)

class Dog(metaclass=Meta):
	good_boy = True
	def bark(self):
		print("Woff!")

When we feed this example to the python interpreter we see output similar to this.

>>> class Dog(metaclass=Meta):
        ...
<class '__main__.Meta'> Dog () {'__module__': '__main__', '__qualname__': 'Dog', 'good_boy': True, 'bark': <function Dog.bark at 0x7f69daeca0e0>}
>>>

Now is the moment to tie together everything we discussed. There is not much difference between the __new__ method, type and our own make_class. They all receive a name, a tuple of base classes, and a dictionary of class attributes. Moreover it does not look like Meta has any interesting properties. The __new__ method is basically static and we use the name Meta as a namespace. For all we care Meta.__new__ is the function make_class using the internal python syntax.
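To back up that last claim, here is a minimal sketch that passes a plain function as the metaclass. The made_by attribute is invented for this illustration.

```python
def make_class(name, bases, class_dict):
    # Copy the namespace, add one attribute of our own, and let type do the rest.
    class_dict = dict(class_dict)
    class_dict["made_by"] = "make_class"
    return type(name, bases, class_dict)

class Dog(metaclass=make_class):
    good_boy = True

print(Dog.good_boy, Dog.made_by)
print(type(Dog) is type)
```

Because make_class returns an ordinary class whose type is type, a subclass of Dog would be created by type and not by make_class. A metaclass written as a class deriving from type does carry over to subclasses, which is one reason that form is the usual choice.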

Conclusions

We have finally reached the end of this post. If you have understood everything you should be able to understand how to go from a class definition to a class object using the following.

type and variable assignment.
The class syntax native to python.
Using a class implementing __new__ and the metaclass keyword.

You should also have understood that metaclasses in python are basically functions from the internal syntax to an object by way of (eventually) type.__new__.

This means that you have at your disposal the full power of python while creating a class. This makes for a lot of power but it is unclear what to do with it now. My interpretation of this confusion follows.

The confusion comes from an intuitive division of your program’s lifecycle. In your mind the program goes through a static phase and a dynamic phase, i.e. it is first read and then executed. Metaclasses instead pierce through this division and reveal that it was a continuum all this time.
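One way to see this continuum, even without a metaclass, is to watch a class statement run in the middle of ordinary code. In the sketch below the function build and the list log are made up for this illustration.

```python
def build(kind, log):
    log.append("function called")

    class Item:
        # The class body is ordinary code that runs when this statement executes.
        log.append("class body running")
        label = kind.upper()

    log.append("class object created")
    return Item

log = []
Spam = build("spam", log)
Egg = build("egg", log)

print(log)
print(Spam.label, Egg.label, Spam is Egg)
```

Each call executes the class statement again and produces a new class object, so defining a class is something the program does while it runs rather than something settled beforehand.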

¹ Now we just have to convince everybody to ditch the class syntax and write function calls to type everywhere.
² External and internal syntax are with respect to python. As JSON is not a syntax for python programs it is external.
³ Using python! So meta!