Akka actors case study: a multiplayer games backend

Today I'll be sharing the high-level architecture and design features of a project I wrote back in 2018 using Akka actors. It's a reasonably complex project, and it does a decent job of illustrating how to build generic, reusable actors that perform useful functions.

This article is the first in a series that will look in more depth at individual components of this design. Here, we'll look at the high-level project design, how to break a complex problem down into simple, reusable components, and how to structure the solution in Akka.

A multiplayer games backend
High-level structure
Sharding
HTTP interface
Takeaways

A multiplayer games backend

Firstly, the project goal. A few years ago, one of my major passion projects was creating multiplayer games that could be played on my website. I had a few of these, of varying complexity: a slightly wacky "enhanced" version of Snakes and Ladders, a Yahtzee-inspired "rainbow dice" game, and a complicated card game, Tichu. The latest iteration of these games, from around 2016-2018, involved some JavaScript frontend work hosted on my website and a backend written in Scala with Akka actors. We'll be looking at the backend.

The games had a fair bit of variety, but they were all online, multiplayer, turn-based board or card games, so they all followed a fairly similar model:

Create a lobby to let players join
Start the game with players currently participating in the lobby
Decide turn order
Let players make moves, in turn order, and update the game state according to the results of each move
Publish player moves to all players (sometimes with private details hidden)
Determine when the game is over and who has won
End the game, and reward the winner(s) somehow

The details of the latter steps in particular vary greatly from game to game, but clearly there's a lot of common ground here even between very different games.

The basic design, then, starts with a set of generic actors that perform these common roles. A lobby actor defines when the lobby is ready for a game to start, who has permission to start the game and/or invite people, and other "lobby" functionality. Similarly, a TurnController, inserted in front of our game logic, enforces turn order and "guards" the game from out-of-order actions, duplicate actions, and races. We also need a feed of the actions that are happening, both to inform users and to allow us to adopt an event-driven approach.

Alongside those pieces common to all games, each game defines its own actor relationships and state model, and the logic of the game itself: which actions can be taken, when a turn ends, what the scoring and game board look like, and when the game is over.
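To make the split between shared and per-game pieces concrete, here is a minimal plain-Scala sketch of the per-game contract just described. The `GameRules` trait, its method names, and the toy `RaceGame` (move forward by your roll, roll again on a six, first to square 20 wins) are hypothetical illustrations rather than the project's actual code, and the sketch leaves actors out entirely so it runs on the standard library alone.

```scala
// Hypothetical per-game contract: everything the shared framework can't know.
trait GameRules[State, Action] {
  def initial(players: Vector[String]): State
  def validate(state: State, player: String, action: Action): Either[String, Unit]
  def step(state: State, player: String, action: Action): State
  def turnEnds(state: State, action: Action): Boolean // would tell the TurnController to advance
  def winners(state: State): Option[Set[String]]       // None while the game is in progress
}

// A toy game used only to exercise the trait.
final case class Roll(value: Int)
final case class RaceState(positions: Map[String, Int], target: Int)

object RaceGame extends GameRules[RaceState, Roll] {
  def initial(players: Vector[String]): RaceState =
    RaceState(players.map(p => p -> 0).toMap, target = 20)

  def validate(state: RaceState, player: String, action: Roll): Either[String, Unit] =
    if (!state.positions.contains(player)) Left(s"$player is not in this game")
    else if (action.value < 1 || action.value > 6) Left(s"invalid roll: ${action.value}")
    else Right(())

  def step(state: RaceState, player: String, action: Roll): RaceState =
    state.copy(positions = state.positions.updated(player, state.positions(player) + action.value))

  // A six earns another roll, so the turn does not end.
  def turnEnds(state: RaceState, action: Roll): Boolean = action.value != 6

  def winners(state: RaceState): Option[Set[String]] = {
    val reached = state.positions.collect { case (p, pos) if pos >= state.target => p }.toSet
    if (reached.isEmpty) None else Some(reached)
  }
}
```

Everything outside this trait (lobby, feed, turn order) stays generic, which is the point of the design: a new game only has to supply these few functions.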

Ultimately this all needs to reach the user in the frontend, so we expose our internal actors through an HTTP interface, providing endpoints for checking game state and taking actions, plus a websocket feed for subscribing to real-time events.

High-level structure

One excellent feature of Akka is how easy it makes drawing what we're building: our actors talk to each other in much the same way as services might. At a high level, we can look at how our top-level actors communicate to set up and tear down games, and at a lower level we can zoom into child actors to see how they work as well.

Let's first take a look at a simplified structure of a game:

In this diagram, each component is an actor or group of actors, and is fully encapsulated. They perform specific, reusable functions and communicate with other actors. Let's take a look at some of the components we need to make this work.

The game daemon

This is the top-level manager actor for one type of game. It creates individual games, holds their references, and provides an interface to manage them; it's what the user talks to when creating an initial game lobby or connecting to a game by ID. It also supervises all of its games, attempting to recover or replace any lost actors.

When a game is created, it'll spin up a child GameFramework actor, which then handles everything to do with the game.

Ultimately, we serve an interface for multiple game types, each with its own slug and daemon actor, so that we can direct queries to the snakes-and-ladders daemon or the four-in-a-row daemon as appropriate.
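A minimal sketch of the daemon's bookkeeping, modelled as immutable plain Scala rather than as an actor: one daemon per game-type slug, which creates games, hands back their IDs, and looks them up. `GameRef`, `GameDaemon`, `Daemons`, and the sequential ID scheme are hypothetical; in a real Akka system the map would hold `ActorRef`s, and the daemon could `watch` each child to be notified when it stops.

```scala
// Hypothetical stand-in for a reference to a GameFramework actor.
final case class GameRef(id: String, gameType: String)

// One daemon per game type, identified by its slug.
final case class GameDaemon(slug: String, games: Map[String, GameRef] = Map.empty, nextId: Long = 1L) {
  // Create a game and hand back its ID (sequential IDs are an assumption).
  def create(): (GameDaemon, String) = {
    val id = nextId.toString
    (copy(games = games.updated(id, GameRef(id, slug)), nextId = nextId + 1), id)
  }

  def lookup(id: String): Option[GameRef] = games.get(id)

  // Called when a game is torn down, e.g. after its framework stops.
  def remove(id: String): GameDaemon = copy(games = games - id)
}

// Route a request to the daemon for its game type.
final case class Daemons(bySlug: Map[String, GameDaemon]) {
  def route(slug: String): Option[GameDaemon] = bySlug.get(slug)
}
```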

The game framework

This is the parent actor encapsulating everything needed for one instance of a game, throughout its lifecycle. It spins up children for all of the components you see beneath it in the diagram. It's a generic component, since every type of game needs a feed, a lobby, and an actor to handle the game logic; only that last piece is swapped out per game. This actor performs all the high-level housekeeping for any type of game.

The framework understands the lobby and game lifecycle, stores game participants, controls how the feed is hooked up to other components during the lifecycle, and acts as the parent to every other component. This means it'll supervise everything beneath it and handle tasks like ensuring games are torn down if left idle.

The framework is created when the user asks the game daemon to start a new game, ultimately by making an HTTP POST request like /game-type/create and receiving back an ID for interacting with the instance. It sets up the initial components and hooks the lobby up to the feed. Once the lobby completes and the game start is requested, it ensures the game itself is initialised and hands control of user interaction over to the game.
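The framework's lifecycle can be sketched as a small state machine. This is a hypothetical plain-Scala model (the phase and message names are assumptions), in which returning `None` stands for the framework stopping itself and tearing the game down. In classic Akka, one way to implement per-phase behaviour is `context.become`, and `context.setReceiveTimeout` is one option for detecting an idle game.

```scala
// Hypothetical phases of one game instance.
sealed trait Phase
case object InLobby extends Phase
final case class Playing(players: Set[String]) extends Phase
case object Finished extends Phase

// Hypothetical messages the framework reacts to.
sealed trait FrameworkMsg
final case class LobbyCompleted(players: Set[String]) extends FrameworkMsg
case object GameOver extends FrameworkMsg
case object IdleTimeout extends FrameworkMsg

object Framework {
  // Returns the next phase, or None when the framework should stop and tear the game down.
  def transition(phase: Phase, msg: FrameworkMsg): Option[Phase] = (phase, msg) match {
    case (_, IdleTimeout)              => None                // idle games are torn down in any phase
    case (InLobby, LobbyCompleted(ps)) => Some(Playing(ps))   // initialise the game with the lobby's players
    case (Playing(_), GameOver)        => Some(Finished)
    case (current, _)                  => Some(current)       // ignore messages that don't fit the phase
  }
}
```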

The lobby

The lobby is the initial "room" before the game starts, where we collect participants and get ready to launch the game. It provides several pieces of functionality:

Allow users to join and leave
Limit the number of users who can be in the lobby
Provide invite / kick / ban functionality to authorised users
Respect minimum / maximum participant limits
Tell the feed about join / leave events, or that the lobby is complete and the game is starting
Send a summary of the final lobby state to the GameFramework parent when it completes, and kill itself

The lobby's lifecycle starts when its parent GameFramework starts, and ends when the game is about to begin; the lobby yields the set of users who will be playing, and the game is initialised with those users.
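The lobby rules above can be modelled as an immutable value that an actor would hold as its state. In this sketch, `LobbyConfig`, `Lobby`, the single `owner` as the only authorised user, and the invite-only flag are all assumptions made for illustration. Each operation returns either an error message or the updated lobby, and `start` yields the final player set that would be sent to the GameFramework.

```scala
// Hypothetical configuration; the real limits would come from the game type.
final case class LobbyConfig(minPlayers: Int, maxPlayers: Int, inviteOnly: Boolean = false)

final case class Lobby(
    config: LobbyConfig,
    owner: String,        // assumed: the only user authorised to invite, kick, ban, and start
    members: Set[String], // the owner is expected to be included from the start
    invited: Set[String] = Set.empty,
    banned: Set[String] = Set.empty
) {
  private def authorise(user: String): Either[String, Unit] =
    if (user == owner) Right(()) else Left(s"$user is not authorised")

  def join(user: String): Either[String, Lobby] =
    if (banned(user)) Left(s"$user is banned")
    else if (members(user)) Left(s"$user has already joined")
    else if (config.inviteOnly && !invited(user)) Left(s"$user has not been invited")
    else if (members.size >= config.maxPlayers) Left("the lobby is full")
    else Right(copy(members = members + user))

  def leave(user: String): Lobby = copy(members = members - user)

  def invite(by: String, user: String): Either[String, Lobby] =
    authorise(by).map(_ => copy(invited = invited + user))

  def kick(by: String, user: String): Either[String, Lobby] =
    authorise(by).map(_ => copy(members = members - user))

  def ban(by: String, user: String): Either[String, Lobby] =
    authorise(by).map(_ => copy(members = members - user, banned = banned + user))

  // On success, yields the final players to hand to the GameFramework.
  def start(by: String): Either[String, Set[String]] =
    authorise(by).flatMap { _ =>
      if (members.size < config.minPlayers) Left(s"at least ${config.minPlayers} players are needed")
      else Right(members)
    }
}
```

Keeping the rules in a pure value like this makes them easy to unit test; the actor wrapped around it would only need to apply each operation, publish the resulting join/leave events to the feed, and stop itself after a successful `start`.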

The feed

The feed primarily exists as a broadcast mechanism, allowing subscribers to register when a user connects via websocket. It also supports an event-driven approach to our game logic: in the event of failure, we can recover the game state by replaying the feed into the game. There are a few considerations here:

a user may join late, or need to refresh and reconnect, so we need to be able to flexibly receive the feed from the beginning, from the end, or from the particular ID the user last saw
events may need to be masked before being sent out to users; e.g. when a user draws a card from the deck, the feed should not tell everyone which card was drawn
every event should at least carry an ID, an indication of which data is public and which is private (and so must be masked), and, to aid message processing in the frontend, a reference to the event that "caused" it. Several events, such as GameStarted, are also common across all games.

So we need a few pieces of logic to make the feed really useful, as well as a solid Event model that games can extend, or draw on from a pool of common definitions.
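A minimal sketch of the event model and replay logic, assuming event IDs increase monotonically within a game. The field names, the `visibleTo` set used for masking, and the `FromBeginning` / `FromEnd` / `After` replay positions are hypothetical. "From the end" returns nothing to replay, on the assumption that the live subscription delivers everything after that point.

```scala
// Hypothetical event model: an ID, an optional cause, public data,
// and private data visible only to some users.
final case class Event(
    id: Long,
    causedBy: Option[Long],
    kind: String,
    publicData: Map[String, String],
    privateData: Map[String, String] = Map.empty,
    visibleTo: Set[String] = Set.empty // users allowed to see privateData
) {
  def maskedFor(user: String): Event =
    if (visibleTo(user)) this else copy(privateData = Map.empty)
}

// Where a (re)connecting subscriber wants to start reading.
sealed trait From
case object FromBeginning extends From
case object FromEnd extends From
final case class After(lastSeenId: Long) extends From

final case class Feed(events: Vector[Event] = Vector.empty) {
  def publish(e: Event): Feed = copy(events = events :+ e)

  // Events to send a subscriber on connection, masked for that user.
  def replay(user: String, from: From): Vector[Event] = {
    val selected = from match {
      case FromBeginning => events
      case FromEnd       => Vector.empty
      case After(id)     => events.filter(_.id > id) // relies on increasing IDs
    }
    selected.map(_.maskedFor(user))
  }
}
```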

The game logic

The game component is itself an actor, and encodes all the logic of the game. For a simple game like Snakes and Ladders, this may be a single actor that holds all the game state and responds to a couple of possible actions to advance the game. For a complex card game like Tichu, there may be multiple phases to a round, plus player hands and scores to track, so it will spin up and shut down various groups of child actors to deal with the internals. In most cases the game places itself behind a TurnController so it doesn't have to worry about whose turn it is, which is why this behaviour is provided as standard.
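For a simple game, recovering after a failure can be as small as folding the feed's events back into a fresh state. This hypothetical sketch uses an invented board (the `jumps` map of snakes and ladders) and assumes a player who would overshoot the final square stays put; the event names and rules are illustrations, not the project's actual rule set.

```scala
// Hypothetical events a simple Snakes-and-Ladders-style game might emit.
sealed trait GameEvent
final case class Rolled(player: String, value: Int) extends GameEvent
final case class PlayerWon(player: String) extends GameEvent

final case class BoardState(positions: Map[String, Int], winner: Option[String] = None)

object SnakesAndLadders {
  val FinalSquare = 100
  // Hypothetical board: snakes move a player down, ladders move them up.
  val jumps: Map[Int, Int] = Map(4 -> 14, 17 -> 7, 40 -> 60, 87 -> 24)

  def applyEvent(state: BoardState, event: GameEvent): BoardState = event match {
    case Rolled(player, value) =>
      val from = state.positions.getOrElse(player, 0)
      val target = from + value
      // Assumed rule: overshooting the final square means staying put.
      val landed = if (target > FinalSquare) from else jumps.getOrElse(target, target)
      state.copy(positions = state.positions.updated(player, landed))
    case PlayerWon(player) =>
      state.copy(winner = Some(player))
  }

  // Recovery: replay the feed's events, in order, into a fresh state.
  def recover(initial: BoardState, events: Seq[GameEvent]): BoardState =
    events.foldLeft(initial)((state, event) => applyEvent(state, event))
}
```

Because `applyEvent` is the only place state changes, the same function serves both live play and recovery, which is what makes replaying the feed a safe way to rebuild a lost game actor.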

The turn controller

This component encapsulates the common logic of guarding the game from out-of-order actions and deciding whose turn it is, and is used by any turn-based game. It has the following behaviours:

it will intercept all actions and pass them through if and only if the action was taken by the user who currently has a turn
it deals with race conditions by locking itself once an action is passed on, and only unlocking once the game logic confirms what happens next: advance the turn, jump to a new turn, or unlock and continue the same turn. While locked, no actions are permitted at all, so the game must acknowledge each action before play continues.
it is given a set of participants and will decide the initial turn order
it can be told when the turn should be advanced, and if the game requires it, jump to a specific user
players can be marked as out of the game to remove them from the current turn order

This effectively makes the TurnController a proxy which both decides and enforces turn order, and which the game can configure to customise its behaviour when needed. Because the actor sits between the interface and the game, the game doesn't have to worry about turn order or races.
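The turn controller's rules can be sketched as an immutable state value. In this hypothetical plain-Scala model, `submit` passes an action through and locks, and `acknowledge` takes the game's decision (`AdvanceTurn`, `JumpTo`, or `ContinueTurn`) and unlocks. All names are assumptions, and shuffling the participants is just one possible way to decide the initial turn order.

```scala
import scala.util.Random

// The game's decision after handling an action (hypothetical names).
sealed trait TurnDecision
case object AdvanceTurn extends TurnDecision
final case class JumpTo(player: String) extends TurnDecision
case object ContinueTurn extends TurnDecision

final case class TurnController(
    order: Vector[String],
    current: Int = 0,
    locked: Boolean = false,
    out: Set[String] = Set.empty
) {
  def currentPlayer: String = order(current)

  // Pass an action through only if nothing is in flight and it's this player's turn.
  def submit(player: String): Either[String, TurnController] =
    if (locked) Left("waiting for the game to acknowledge the previous action")
    else if (out(player)) Left(s"$player is out of the game")
    else if (player != currentPlayer) Left(s"it is not $player's turn")
    else Right(copy(locked = true))

  // The game confirms what happens next; only then do we unlock.
  def acknowledge(decision: TurnDecision): TurnController = {
    val next = decision match {
      case ContinueTurn => current
      case AdvanceTurn  => nextActive(current)
      case JumpTo(player) =>
        val i = order.indexOf(player)
        if (i >= 0) i else current // ignore unknown players
    }
    copy(current = next, locked = false)
  }

  def markOut(player: String): TurnController = copy(out = out + player)

  // Step round the table, skipping players who are out; stay put if nobody else is left.
  private def nextActive(from: Int): Int =
    (1 to order.size)
      .map(step => (from + step) % order.size)
      .find(i => !out(order(i)))
      .getOrElse(from)
}

object TurnOrder {
  // One possible way to decide the initial order: a random shuffle.
  def start(participants: Set[String], rng: Random): TurnController =
    TurnController(rng.shuffle(participants.toVector))
}
```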

Sharding

Since these multiplayer games have finite player limits, the service can be scaled out with only very limited exposure to the drawbacks of distributed computing. We can pick a sharding strategy which distributes whole GameFramework actor groups across our nodes, so everything internal to a game always takes place on a single node, and in-game messages never have to cross the network during the course of the game.

This means that only the top-level GameDaemon needs to send messages between nodes of the application, so message delivery failures happen early: only game creation or an attempt to connect to a feed can fail, and both can safely be retried. Akka Cluster provides the functionality we need to send messages to actors on different nodes, but it unavoidably has weaker delivery guarantees than messages passed between two actors in the same JVM.
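A sketch of the routing side of this strategy, assuming every message for a game is wrapped in an envelope carrying its game type and ID. Deriving the shard from a hash of the entity ID is similar in spirit to the hash-based message extractors Akka Cluster Sharding provides, but `GameEnvelope`, `GameSharding`, and the shard count of 100 are hypothetical. The important property is that every message for one game maps to the same entity, and therefore to the same node.

```scala
// Hypothetical envelope: every message for a game carries the game's type and ID.
final case class GameEnvelope(gameType: String, gameId: String, payload: Any)

object GameSharding {
  val NumberOfShards = 100 // assumed value

  // One entity (a whole GameFramework actor group) per game.
  def entityId(env: GameEnvelope): String = s"${env.gameType}-${env.gameId}"

  // floorMod keeps the result non-negative even for negative hash codes.
  def shardId(env: GameEnvelope): String =
    Math.floorMod(entityId(env).hashCode, NumberOfShards).toString
}
```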

HTTP interface

The heavily structured actor hierarchy maps nicely onto a RESTful API with structured URLs. Without going into too much detail, a progressive path like game/snakes-and-ladders/12345/action/roll/ descends the logical actor hierarchy to deliver an action to a specific game. The route structure therefore transparently reflects the structure of the actor system, which aids understanding and debugging. Since players also need real-time updates, they subscribe to a websocket feed like game/snakes-and-ladders/12345/feed to receive continuous updates on what's happening in the game.

This interface is served via Akka HTTP. After authenticating the request and parsing and validating any JSON action it carries, it communicates with the underlying actor system to deliver the message.
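The correspondence between URLs and the actor hierarchy can be illustrated by parsing paths into typed requests. This is a plain-Scala sketch rather than Akka HTTP routing code (in Akka HTTP the same shape would typically be expressed with path directives such as `pathPrefix` and `Segment`). The request types and HTTP methods are assumptions, and the `game/<type>/create` path is an assumed variant of the /game-type/create example given earlier.

```scala
// Hypothetical typed requests, one per level of the actor hierarchy they target.
sealed trait GameRequest
final case class CreateGame(gameType: String) extends GameRequest
final case class TakeAction(gameType: String, gameId: String, action: String) extends GameRequest
final case class SubscribeFeed(gameType: String, gameId: String) extends GameRequest

object Routes {
  // Each path segment selects the next actor down: daemon (type), framework (ID), then component.
  def parse(method: String, path: String): Option[GameRequest] = {
    val segments = path.split("/").filter(_.nonEmpty).toList
    (method, segments) match {
      case ("POST", List("game", gameType, "create")) =>
        Some(CreateGame(gameType))
      case ("POST", List("game", gameType, gameId, "action", action)) =>
        Some(TakeAction(gameType, gameId, action))
      case ("GET", List("game", gameType, gameId, "feed")) => // websocket upgrades arrive as GET
        Some(SubscribeFeed(gameType, gameId))
      case _ =>
        None
    }
  }
}
```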

Takeaways

A lot of what I've discussed here is a highly specific use case for Akka actors, but hopefully some of these designs can serve as inspiration for other challenging projects. Here are some takeaways for designing your own actor systems:

Building an actor system makes it easy to draw and reason about the inner workings of the application, and in turn helps you break down a large, complex problem into a series of smaller building blocks
You can stay DRY in an actor system by encapsulating shared functionality into an actor or hierarchy of actors which provide a key service to others
High-level architectural design is crucial to breaking down a difficult problem
Remember that sending messages across network boundaries with Akka Cluster comes with performance and reliability costs, so effectively sharding an application to minimise those boundaries is worthwhile

In future articles, I'll dive into several individual components in more depth to see exactly how we build and test actors with the behaviours described here.