Put some CLOS in your ECS

Recently this piece on ECS has been on my mind. To finally get it out of my system, here are my notes on what could be improved.

The author uses Common Lisp to define an extended language for writing programs that favour the ECS style. While it is a very nice read and achieves what it set out to do, I think the approach can be improved upon. The main problems I see are: 1. this is not LISP anymore, and 2. it reimplements redefinition of data at runtime.

The first problem is solved by accepting that this was an exploratory solution: the extended language is a prototype, not the final API for writing programs with the ECS. The second, instead, is an implementation problem.

My take would be to use the Common Lisp Object System (CLOS), as it solves both problems. Once we teach CLOS how we want our memory laid out, we get data locality, and we have not extended the language for anything other than control flow. Moreover, CLOS already supports redefinition at runtime, so we are not duplicating features and the second problem goes away.

A tour of ECS and CLOS ECS

Here’s an example of the current API.

(ecs:define-component position
  "Determines the location of the object, in pixels."
  (x 0.0 :type single-float :documentation "X coordinate")
  (y 0.0 :type single-float :documentation "Y coordinate"))

(ecs:define-component image
  "Stores ALLEGRO_BITMAP structure pointer, size and scaling information."
  (bitmap (cffi:null-pointer) :type cffi:foreign-pointer)
  (width 0.0 :type single-float)
  (height 0.0 :type single-float)
  (scale 1.0 :type single-float))
  
(ecs:define-system draw-images
  (:components-ro (position image)
   :initially (al:hold-bitmap-drawing t)
   :finally (al:hold-bitmap-drawing nil))
  (let ((scaled-width (* image-scale image-width))
        (scaled-height (* image-scale image-height)))
    (al:draw-scaled-bitmap image-bitmap 0 0
                           image-width image-height
                           (- position-x (* 0.5 scaled-width))
                           (- position-y (* 0.5 scaled-height))
                           scaled-width scaled-height 0)))

This is the CLOS flavoured version of the same API. It is good to see the two side by side to spot the differences.

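;;; Our components are defined as classes and inherit from the
;;; ecs-component class. This means we can reuse the slot definition
;;; syntax from CLOS or extend it easily without resorting to macros.
;;; Standard CLOS does not inherit the metaclass from a superclass, so
;;; each component names ecs-component-meta explicitly. POSITION is also
;;; a symbol of the COMMON-LISP package, so our package must shadow it.

(defclass position (ecs-component)
  ((x
    :initarg :x :initform 0.0 :accessor x :type single-float
    :documentation "X coordinate, in pixels.")
   (y
    :initarg :y :initform 0.0 :accessor y :type single-float
    :documentation "Y coordinate, in pixels."))
  (:metaclass ecs-component-meta)
  (:documentation "Determines the location of the object, in pixels."))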

(defclass image (ecs-component)
  ((bitmap
    :initarg :bitmap :initform (cffi:null-pointer)
    :accessor bitmap :type cffi:foreign-pointer)
   (width
    :initarg :width :initform 0.0 :accessor width :type single-float)
   (height
    :initarg :height :initform 0.0 :accessor height :type single-float)
   (scale
    :initarg :scale :initform 1.0 :accessor scale :type single-float))
  (:metaclass ecs-component-meta)
  (:documentation "Stores ALLEGRO_BITMAP structure pointer, size and scaling information."))

;;; The "system" is just a function call in our implementation.
;;; As we loop over two components, we implement the loop body
;;; as a function for clarity.
(defun draw-image (position image)
  (let* ((scaling (scale image))
         (scaled-width (* scaling (width image)))
         (scaled-height (* scaling (height image))))
    (al:draw-scaled-bitmap (bitmap image) 0 0
                           (width image) (height image)
                           (- (x position) (* 0.5 scaled-width))
                           (- (y position) (* 0.5 scaled-height))
                           scaled-width scaled-height 0)))

;;; Here is the final piece, a single function to call every time we want
;;; our component to render to screen.
(defun draw ()
  (with-component* (position image) draw-image
                   :read-only t
                   :initially (thunk (al:hold-bitmap-drawing t))
                   :finally (thunk (al:hold-bitmap-drawing nil))))

If this reads like normal LISP code to you¹, then we have achieved our first objective: our programs do not need to bend much to benefit from the ECS architecture.

But how does this work?

We define two classes, position and image, with the standard syntax for classes.

(defclass position (ecs-component) ...)
(defclass image (ecs-component) ...)

Both also inherit from ecs-component. This takes care of our requirement for spatial locality, because we will make instance creation take memory from a pool shared by all instances of the class.

The following snippet shows how we allocate two position objects. Because they are instances of an ecs-component, their allocation goes through our library-defined allocation scheme, which means we can allocate them as successive cells in an array. We have achieved data locality.

;;; these two instances of position will be contiguous in memory
(make-instance 'position :x 0.0 :y 0.0)
(make-instance 'position :x 1.0 :y 1.0)
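To make "successive cells in an array" concrete, here is a minimal sketch of the storage such an allocation scheme could sit on, written without the metaclass. It assumes a struct-of-arrays layout: the hypothetical position-pool keeps one single-float vector per slot, and allocating a position appends to each vector and returns the index as a handle. All names here are illustrative, not the library's API.

```lisp
;;; Hypothetical struct-of-arrays storage for the POSITION component.
;;; Every slot gets its own vector specialized on SINGLE-FLOAT, so
;;; consecutive positions occupy consecutive cells of each vector.
(defun make-float-column ()
  "Return an empty, growable vector of single-floats."
  (make-array 16 :element-type 'single-float
                 :adjustable t :fill-pointer 0))

(defstruct position-pool
  (xs (make-float-column))
  (ys (make-float-column)))

(defun allocate-position (pool x y)
  "Append a position to POOL and return its index, which acts as a handle."
  (vector-push-extend (coerce x 'single-float) (position-pool-xs pool))
  (vector-push-extend (coerce y 'single-float) (position-pool-ys pool))
  (1- (fill-pointer (position-pool-xs pool))))

(defun position-x (pool index) (aref (position-pool-xs pool) index))
(defun position-y (pool index) (aref (position-pool-ys pool) index))

;;; Usage:
;;; (let ((pool (make-position-pool)))
;;;   (allocate-position pool 0.0 0.0)
;;;   (allocate-position pool 1.0 1.0)
;;;   (position-x pool 1))
```

The handle is a plain index rather than a pointer, which is what lets a loop walk all X coordinates in order.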

The draw function takes a reference to the underlying storage for our positions and images, starts a loop over the two together, and calls draw-image on each pair. Before the loop it executes the thunk (al:hold-bitmap-drawing t), and when the loop exits it executes the thunk (al:hold-bitmap-drawing nil).

(defun draw ()
  (with-component* (position image) draw-image
                   :read-only t
                   :initially (thunk (al:hold-bitmap-drawing t))
                   :finally (thunk (al:hold-bitmap-drawing nil))))

It also takes the two references as read-only, so any attempt at modifying them or their data signals an error.
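thunk is not a standard Common Lisp operator; a plausible reading is a macro that wraps its body in a zero-argument lambda. The sketch below assumes exactly that and pairs it with a hypothetical function, call-with-components, that reproduces the order described for draw: initially, then the body once per index across all component vectors, then finally. It is a stand-in for with-component*, not its actual implementation.

```lisp
;;; Hypothetical helper: THUNK wraps BODY in a function of no arguments.
(defmacro thunk (&body body)
  `(lambda () ,@body))

;;; Hypothetical stand-in for WITH-COMPONENT*: run INITIALLY, then call
;;; FN with the elements found at the same index in every vector of
;;; VECTORS (walked in lockstep), then run FINALLY.
;;; VECTORS must be a non-empty list.
(defun call-with-components (fn vectors &key initially finally)
  (when initially (funcall initially))
  (let ((n (apply #'min (mapcar #'length vectors))))
    (dotimes (i n)
      (apply fn (mapcar (lambda (v) (aref v i)) vectors))))
  (when finally (funcall finally)))

;;; Usage, mirroring DRAW (POSITIONS and IMAGES are assumed storage):
;;; (call-with-components #'draw-image (list positions images)
;;;   :initially (thunk (al:hold-bitmap-drawing t))
;;;   :finally (thunk (al:hold-bitmap-drawing nil)))
```

A macro version would expand into the same shape but could open-code the loop and inline the body instead of funcalling it.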

But why does this work and why is CLOS required?

The class ecs-component does not do the heavy lifting itself: class creation for the components is deferred to the metaclass ecs-component-meta, which receives the slot definitions of position and image at class definition time.

(defclass ecs-component (...)
  ...
  (:metaclass ecs-component-meta))

(defclass ecs-component-meta (...)
  ; hic sunt dracones
  ...)

The job of the metaclass is to prepare the underlying storage for the classes defined. This is exactly what we need to have all instances of the same component share one big contiguous array.

We have to use a metaclass because deciding how to lay out our class's structure in memory, and then defining all the procedures such as getters and setters from that layout, requires knowing which slots the class defines.
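Here is a rough, portable sketch of that reasoning with the metaclass machinery left out. The slot list is written by hand as (name type) pairs; in a real metaclass it would come from introspecting the class, for example with closer-mop's class-slots, slot-definition-name and slot-definition-type. From that list we derive one typed column per slot, then build reader and writer closures on top of the layout. make-layout, append-row and make-slot-accessors are hypothetical names.

```lisp
;;; Hypothetical: SLOT-SPECS is a list of (NAME TYPE) pairs, typed by hand.
(defun make-layout (slot-specs)
  "Return a hash table mapping each slot NAME to a growable column of TYPE."
  (let ((layout (make-hash-table :test #'eq)))
    (dolist (spec slot-specs layout)
      (destructuring-bind (name type) spec
        (setf (gethash name layout)
              (make-array 16 :element-type type
                             :adjustable t :fill-pointer 0))))))

(defun append-row (layout initargs)
  "Append one instance. INITARGS is a plist from slot name to value and
must mention every slot. Returns the new instance's index."
  (let ((index nil))
    (maphash (lambda (name column)
               (setf index (vector-push-extend (getf initargs name) column)))
             layout)
    index))

(defun make-slot-accessors (layout name)
  "Return a reader (INDEX) and a writer (VALUE INDEX) for slot NAME."
  (let ((column (gethash name layout)))
    ;; Adjustable arrays keep their identity when they grow, so the
    ;; closures stay valid after later APPEND-ROW calls.
    (values (lambda (index) (aref column index))
            (lambda (value index) (setf (aref column index) value)))))
```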

Moreover, once we have allocated the array for the class instances, we also want to use it in our loops. Since we need to reference these arrays, we can also define some more names, e.g. ecs-positions and ecs-images, to refer to them later. This is how our macro with-component* is able to get the two underlying arrays and other metadata, e.g. the count of each component. There are various ways to implement this, and it is not important right now.
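One of those various ways, sketched under the assumption of a global registry keyed by component name: each entry holds the component's columns and an instance count, so a looping macro can look everything up by name. The registry, the component-storage structure and its accessors are hypothetical.

```lisp
;;; Hypothetical registry: one entry per component, holding its storage
;;; and metadata so a looping macro can find them by name.
(defstruct component-storage
  (columns '())                ; plist: slot name -> column vector
  (live-count 0 :type fixnum)) ; number of allocated instances

(defvar *component-registry* (make-hash-table :test #'eq))

(defun register-component (name slot-names)
  "Create empty storage for component NAME with the given SLOT-NAMES."
  (setf (gethash name *component-registry*)
        (make-component-storage
         :columns (loop for slot in slot-names
                        collect slot
                        collect (make-array 16 :adjustable t :fill-pointer 0)))))

(defun find-component-storage (name)
  "Return the storage registered for NAME, or signal an error."
  (or (gethash name *component-registry*)
      (error "No component named ~S is registered." name)))

(defun component-column (name slot)
  "Return the column holding SLOT for every instance of component NAME."
  (getf (component-storage-columns (find-component-storage name)) slot))
```

Defining one global variable per component, as with ecs-positions, would work just as well; the registry only trades a little lookup cost for a single place to inspect.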

Finally we are at the end of our example. Data is laid out in memory as we want, and we are ready to loop over it as fast as we can.

(defun draw ()
  (with-component* (position image) draw-image
                   :read-only t
                   :initially (thunk (al:hold-bitmap-drawing t))
                   :finally (thunk (al:hold-bitmap-drawing nil))))

For each component passed to with-component*, we get the reference to the underlying array and its metadata. Then we loop over the elements of the arrays and call draw-image for its side effects.

I hope this has convinced you that we can get the same benefits from the ECS pattern without either of the downsides I pointed out.

We don’t have to change the way we write our programs to obtain data locality, because we can extend LISP in two orthogonal directions: we can extend the syntax, and we can change the data representation without altering the syntax.

We extend the syntax of the language when it makes sense to add more specialized control flow. Our with-component* is implemented as a macro, and it is not a glorified for loop but specialized control flow.

Since it takes a function to execute in a loop, it also has to account for non-local control flow that can happen in the function body. Moreover, it lets us set up the read-only feature, which can be enabled for just a subset of the components!
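A hypothetical sketch of both concerns, assuming a function-based driver rather than the actual with-component* macro: unwind-protect ensures the finally function runs even when the loop body transfers control out non-locally, and a dynamically bound list marks only the requested subset of components as read-only, which writers check before mutating. call-over-components, check-writable and the velocity storage are made up for illustration.

```lisp
;;; Hypothetical: components that are read-only in the current dynamic extent.
(defvar *read-only-components* '())

(defun check-writable (component)
  "Signal an error if COMPONENT is currently marked read-only."
  (when (member component *read-only-components*)
    (error "Component ~S is read-only in this loop." component)))

;;; Hypothetical driver: READ-ONLY lists the subset of components to
;;; protect. UNWIND-PROTECT guarantees FINALLY runs even when FN leaves
;;; the loop non-locally, e.g. with RETURN-FROM or THROW.
(defun call-over-components (fn n &key read-only initially finally)
  (let ((*read-only-components* (append read-only *read-only-components*)))
    (when initially (funcall initially))
    (unwind-protect
         (dotimes (i n)
           (funcall fn i))
      (when finally (funcall finally)))))

;;; Hypothetical component storage with a writer that respects the flag.
(defvar *velocity-xs*
  (make-array 3 :element-type 'single-float :initial-element 0.0))

(defun (setf velocity-x) (value index)
  (check-writable 'velocity)
  (setf (aref *velocity-xs* index) value))
```

Because the read-only list is dynamically bound, the protection ends automatically when the loop exits, however it exits.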

The next extension goes in the orthogonal direction instead: data representation. This is why we should use CLOS and all its facilities, as it gives fine-grained control over every decision about data representation.

We could have implemented SQLite-backed storage for our instances and it would have worked the same.

Moreover, CLOS already provides facilities for class redefinition! You don’t have to figure out how to implement component redefinition yourself if you follow the CLOS specification.
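A small illustration with an ordinary standard-class, not the ECS metaclass: after a class is redefined with an extra slot, an instance created under the old definition picks up the new slot, initialized from its :initform, without any code of ours. A custom metaclass with array-backed storage would likely need to migrate its own storage too, and update-instance-for-redefined-class is the standard generic function for hooking in that kind of migration. point2 and *p* are illustrative names.

```lisp
;;; A plain standard-class, used to show the standard redefinition protocol.
(defclass point2 ()
  ((x :initarg :x :accessor point2-x)))

(defparameter *p* (make-instance 'point2 :x 1.0))

;;; Redefine the class with an extra slot. *P* is updated no later than
;;; the next time one of its slots is read or written; the added slot is
;;; initialized from its :INITFORM via UPDATE-INSTANCE-FOR-REDEFINED-CLASS.
(defclass point2 ()
  ((x :initarg :x :accessor point2-x)
   (z :initarg :z :initform 0.0 :accessor point2-z)))

;;; Usage:
;;; (point2-x *p*) keeps its old value, (point2-z *p*) is now bound.
```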

Arenas

As a bonus to this tour here is something that is missing in the original presentation: arenas.

Arenas are a specialization of this allocation scheme where the user supplies the underlying storage instead of delegating it to the ECS library. For very hot sections of code, where you create lots of objects only to throw them away immediately, it makes sense to reuse the space or to throw it all out in one block.
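A minimal sketch of an arena under those assumptions: the user creates the storage with a fixed capacity, allocation just bumps an index, and resetting the arena throws every object away in one step so the cells get reused. make-arena, arena-allocate and arena-reset are hypothetical; in the CLOS version they would sit behind make-instance.

```lisp
;;; Hypothetical arena: user-owned, fixed-capacity storage for short-lived
;;; positions. Allocation bumps an index; ARENA-RESET discards every object
;;; at once by rewinding that index, so the same cells are reused.
(defstruct (arena (:constructor %make-arena))
  xs                        ; single-float column for X
  ys                        ; single-float column for Y
  (used 0 :type fixnum))    ; number of cells handed out so far

(defun make-arena (capacity)
  "Return an empty arena able to hold CAPACITY positions."
  (flet ((column ()
           (make-array capacity :element-type 'single-float
                                :initial-element 0.0)))
    (%make-arena :xs (column) :ys (column))))

(defun arena-allocate (arena x y)
  "Store X and Y in the next free cell of ARENA and return its index."
  (let ((index (arena-used arena)))
    (when (>= index (length (arena-xs arena)))
      (error "Arena is full."))
    (setf (aref (arena-xs arena) index) (coerce x 'single-float)
          (aref (arena-ys arena) index) (coerce y 'single-float)
          (arena-used arena) (1+ index))
    index))

(defun arena-reset (arena)
  "Throw away every object in ARENA in one step."
  (setf (arena-used arena) 0))
```

A typical frame would allocate freely, use the objects, and call arena-reset once at the end, which costs the same no matter how many objects were created.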

This is easy to implement because our approach defines make-instance methods for the classes that inherit from ecs-component, which means they can accept additional initialization arguments.

(make-instance 'position :x 0.0 :y 0.0 :arena frame-positions)
(make-instance 'position :x 1.0 :y 1.0 :arena frame-positions)
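The portable part of accepting :arena can be shown with an ordinary class. In Common Lisp, a keyword parameter in the lambda list of an initialize-instance method (likewise shared-initialize or allocate-instance) declares that keyword as a valid initialization argument for the classes the method applies to. In this hypothetical sketch the arena merely records the objects; actually carving their storage out of the arena would be the job of the metaclass's allocation method.

```lisp
;;; Hypothetical: an ordinary class whose MAKE-INSTANCE accepts :ARENA.
(defclass particle ()
  ((x :initarg :x :accessor particle-x)
   (y :initarg :y :accessor particle-y)))

;;; Hypothetical stand-in for an arena: here it only collects objects.
(defstruct frame-arena
  (objects (make-array 8 :adjustable t :fill-pointer 0)))

;;; Naming ARENA as a keyword parameter of an INITIALIZE-INSTANCE method
;;; makes :ARENA a valid initialization argument for PARTICLE.
(defmethod initialize-instance :after ((p particle) &key arena)
  (when arena
    (vector-push-extend p (frame-arena-objects arena))))
```

Unknown keywords are still rejected, so a misspelt :arena is caught at make-instance time rather than silently ignored.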

This is the holy grail of data-locality: give the user the choice to override your default when they know best.

Drawbacks

This CLOS-flavoured approach has drawbacks. Implementing an ECS-style object system in CLOS using metaclasses is doable, but you will have to read the documentation and learn about all the possible interactions. There is a multitude of things to learn about the CLOS implementation that will make your head spin! It may be too much for some exploratory programming, which is definitely why you should not start there.

¹ For the less parenthetically inclined, this can also be unreadable. After all, it is LISP.