Data models are an important design decision in any application, and serialising your data is a key capability most will need in order to communicate with other services. JSON is among the most popular data formats used in APIs regardless of language, so naturally having a good grasp of how to (de)serialise your models to and from JSON is important. Today I'll focus on how to model your JSON data with case classes, and how to handle JSON serialisation. Note that the same principles apply regardless of format, though, and I have used all the same techniques to serialise to/from XML and others as needed.
Unfortunately, JSON libraries in Scala are a dime a dozen, and any 3 Scala developers will have 4 favourite libraries to use. Fortunately, the guiding principles of almost all of these libraries are very similar:
- Convert a String to and from a JSON AST (Abstract Syntax Tree) provided by the library, and from that AST to/from your case class models
- Support common types such as Int, Double, String, List[T], and Map[String, T] out of the box
There are several libraries available which all follow this basic design, including Spray JSON, Play JSON, and Circe. I'm going to be focusing on Spray JSON, but the principles I cover will apply regardless of your choice.
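To make the two-step design concrete, here is a minimal sketch of the String to AST to case class pipeline in Spray JSON. The Point model and the AstRoundTrip object are hypothetical names invented for this illustration; parseJson, convertTo, toJson and compactPrint come from spray-json itself.

```scala
import spray.json._
import DefaultJsonProtocol._

// Hypothetical model, used only for this illustration
case class Point(x: Int, y: Int)

object AstRoundTrip {
  // Format for the model, built from the default formats for Int
  implicit val pointFmt: RootJsonFormat[Point] = jsonFormat2(Point)

  // Step 1: String -> JSON AST
  val ast: JsValue = """{"x": 1, "y": 2}""".parseJson

  // Step 2: JSON AST -> case class
  val point: Point = ast.convertTo[Point]

  // And back again: case class -> JSON AST -> String
  val json: String = point.toJson.compactPrint
}
```

Other libraries name these steps differently, but the same split between parsing text and mapping the AST onto models should be recognisable in most of them.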
For the purposes of this post, I'll be dealing with modelling some arbitrary JSON which has been provided by an API we don't control. This means we're focusing on building models which directly represent that data, in some form that's relevant to our imaginary application. We'll leave the issue of transforming or combining this data with other internal models for another day and just stick to representing our JSON payload.
This means that we can't change the JSON – and we'll consider cases where the JSON structure is less than ideal, and how to handle that gracefully without creating internal models which are equally ugly.
Let's first consider a fairly simple payload about library books:
{
  "isbn": "9780155658110",
  "title": "Nineteen Eighty-Four",
  "author": "George Orwell",
  "lastCheckedOut": "2020-05-02T09:23:11Z",
  "pageCount": 450,
  "reviewRating": 4.6
}
And a sensible model representing this data:
case class Book(
  isbn: String,
  title: String,
  author: String,
  lastCheckedOut: Instant,
  pageCount: Int,
  reviewRating: Double
)
Nothing too complex here: we've largely just picked Scala primitives matching the JSON primitives, with the exception of lastCheckedOut, where we want to parse an Instant. Since dates can come in many formats, we'll need to provide a format for Instant before the Book format can parse this:
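import java.time.{Instant, ZoneOffset}
import java.time.format.DateTimeFormatter
import spray.json._

trait BookJsonProtocol extends DefaultJsonProtocol {
  implicit val instantFmt: JsonFormat[Instant] = new JsonFormat[Instant] {
    private val formatter = DateTimeFormatter.ISO_OFFSET_DATE_TIME
    // An Instant carries no offset, so attach UTC before formatting;
    // formatting a bare Instant with this formatter throws at runtime
    override def write(instant: Instant): JsValue =
      JsString(formatter.format(instant.atOffset(ZoneOffset.UTC)))
    override def read(v: JsValue): Instant = Instant.from(formatter.parse(v.convertTo[String]))
  }
  // Defined after instantFmt, which it needs for the lastCheckedOut field
  implicit val bookFmt: RootJsonFormat[Book] = jsonFormat6(Book)
}

object BookJsonProtocol extends BookJsonProtocol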
Since our JSON fields all have fairly sensible names and our types are largely simple, our case class format is simply jsonFormat6, having first provided a JsonFormat[Instant]. The Instant format is slightly more hands-on, since it has to specify the ISO 8601 date-time format, but it is still quite straightforward.
Note: We've followed Spray JSON's guidelines of defining a JsonProtocol for our formats and packaging it separately, keeping our type classes separate from the models themselves. In a slightly more complex scenario, we'd go a step further and move our Instant format into a TimeJsonProtocol which we extend, and perhaps make it more generic for different time formats.
Some libraries suggest instead including your formats on the models' companion objects, but this places your serialisation logic into your models, which type classes typically exist to avoid. Note that this also would not be possible for types beyond your control, such as Instant, and you'd have to place them elsewhere regardless.
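As a rough sketch of that "step further", the Instant format could live in its own TimeJsonProtocol which the book protocol then extends. The instantFormatter hook is a hypothetical extension point invented here, not something Spray JSON provides, and the Book model is repeated only to keep the sketch self-contained.

```scala
import java.time.{Instant, ZoneOffset}
import java.time.format.DateTimeFormatter
import spray.json._

// Time-related formats in their own protocol, so any other protocol can mix them in
trait TimeJsonProtocol extends DefaultJsonProtocol {
  // Hypothetical hook: override this to support a different timestamp format
  protected def instantFormatter: DateTimeFormatter = DateTimeFormatter.ISO_OFFSET_DATE_TIME

  implicit val instantFmt: JsonFormat[Instant] = new JsonFormat[Instant] {
    // An Instant has no offset of its own, so attach UTC before formatting
    override def write(instant: Instant): JsValue =
      JsString(instantFormatter.format(instant.atOffset(ZoneOffset.UTC)))

    override def read(v: JsValue): Instant = v match {
      case JsString(s) => Instant.from(instantFormatter.parse(s))
      case other       => deserializationError(s"Expected a timestamp string but got $other")
    }
  }
}

case class Book(
  isbn: String,
  title: String,
  author: String,
  lastCheckedOut: Instant,
  pageCount: Int,
  reviewRating: Double
)

// The book protocol now only describes books and inherits the time formats
trait BookJsonProtocol extends TimeJsonProtocol {
  implicit val bookFmt: RootJsonFormat[Book] = jsonFormat6(Book)
}

object BookJsonProtocol extends BookJsonProtocol
```

Making the formatter a def rather than a val sidesteps trait initialisation-order surprises if a sub-protocol overrides it.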
Depending on what's being modelled, you may also choose to use stronger types for some fields. For example, an ID field representing a particular type of ID might be best represented by its own type, especially if there are specific formatting concerns. That could be as simple as a wrapper class like case class Isbn(value: String), which primarily acts to distinguish the type in your code when ISBNs are used as keys frequently. But it may be more complex depending on encoding: an ISBN is actually made up of multiple different subgroups, including a checksum digit, and you might want to capture that information.
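A minimal sketch of the simple wrapper approach might look like the following. The IsbnJsonProtocol name and the 13-digit check are assumptions made for this illustration; it does not validate the checksum or split the ISBN into its subgroups.

```scala
import spray.json._

// Wrapper type: distinguishes ISBNs from other strings in our code
case class Isbn(value: String)

trait IsbnJsonProtocol extends DefaultJsonProtocol {
  // On the wire an ISBN stays a plain JSON string; only our model sees the wrapper
  implicit val isbnFmt: JsonFormat[Isbn] = new JsonFormat[Isbn] {
    override def write(isbn: Isbn): JsValue = JsString(isbn.value)

    override def read(v: JsValue): Isbn = v match {
      // Assumption for this sketch: accept only 13-digit ISBNs, no checksum validation
      case JsString(s) if s.length == 13 && s.forall(_.isDigit) => Isbn(s)
      case other => deserializationError(s"Invalid ISBN: $other")
    }
  }
}

object IsbnJsonProtocol extends IsbnJsonProtocol
```

With this format in scope, the isbn field of a model can be typed as Isbn while the JSON payload is left unchanged.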
In other cases, you may have a set of valid possible values appearing in a field, such as in a "status" field or a "type" field, and wish to model that as an enumeration. You might even expect one of multiple different types of objects to appear and require an algebraic data type; that's particularly likely in lists of varied elements.
Let's take a look at an example which includes all of these features and represents a virtual shelf of items a user has put on their "to read/watch" list:
{
  "shelfId": "10821-00001",
  "items": [
    {
      "type": "book",
      "title": "Feast of Souls",
      "author": "C. S. Friedman",
      "status": "todo"
    },
    {
      "type": "movie",
      "title": "Lucky Number Slevin",
      "director": "Paul McGuigan",
      "status": "done"
    }
  ]
}
and the models:
case class ShelfId(userId: String, shelfId: String)

sealed trait MediaItem {
  val title: String
  val status: MediaItem.Status
}

object MediaItem {
  sealed trait Status {
    val value: String
  }
  object Status {
    case object Todo extends Status {
      override val value: String = "todo"
    }
    case object Done extends Status {
      override val value: String = "done"
    }
  }
  case class Book(title: String, author: String, status: Status) extends MediaItem
  case class Movie(title: String, director: String, status: Status) extends MediaItem
}

case class Shelf(shelfId: ShelfId, items: List[MediaItem])
When working with cases like these, the guiding principle is to keep the model as the fully-parsed, clean representation, and to deal with transforming to and from the simpler JSON representation in the serialisation code under your XXXJsonProtocol trait.
Note that we've used an algebraic data type, using sealed traits and case objects, for our enumeration value, the same way we did for our different types of media item. Since this is simply a set of enumerable values, we could also have used Scala's Enumeration type or another strategy; there are a few possible ways of dealing with enumerations in Scala.
Note also that we drop the type field from items, since that information is now provided by the type itself.
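For comparison, here is a sketch of the Enumeration alternative mentioned above. It is not the approach used in the rest of this post, and the StatusJsonProtocol name is made up for the illustration.

```scala
import spray.json._

// Scala's built-in Enumeration: each value is named after its JSON string
object Status extends Enumeration {
  val Todo = Value("todo")
  val Done = Value("done")
}

trait StatusJsonProtocol extends DefaultJsonProtocol {
  implicit val statusFmt: JsonFormat[Status.Value] = new JsonFormat[Status.Value] {
    override def write(status: Status.Value): JsValue = JsString(status.toString)

    override def read(v: JsValue): Status.Value = v match {
      case JsString(s) =>
        // Look the string up among the declared values rather than hard-coding each case
        Status.values
          .find(_.toString == s)
          .getOrElse(deserializationError(s"Unknown status value '$s'"))
      case other => deserializationError(s"Expected a status string but got $other")
    }
  }
}

object StatusJsonProtocol extends StatusJsonProtocol
```

The trade-off is that new values are picked up by the format automatically, but you lose the compiler's exhaustiveness checking that sealed traits give you in pattern matches.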
Let's take a look at what the serialisation logic might look like for the above example:
import spray.json._
import com.example.blog.models._
import com.example.blog.models.MediaItem._
import com.example.blog.models.MediaItem.Status._

trait ShelfJsonProtocol extends DefaultJsonProtocol {
  implicit val statusFmt: JsonFormat[Status] = new JsonFormat[Status] {
    override def write(status: Status): JsValue = JsString(status.value)
    override def read(v: JsValue): Status = v.convertTo[String] match {
      case Todo.value => Todo
      case Done.value => Done
      case s => deserializationError(s"Unknown status value '$s'")
    }
  }

  implicit val bookFmt: RootJsonFormat[Book] = jsonFormat3(Book)
  implicit val movieFmt: RootJsonFormat[Movie] = jsonFormat3(Movie)

  implicit val mediaItemFmt: JsonFormat[MediaItem] = new JsonFormat[MediaItem] {
    // The models drop the "type" field, so re-attach it on the way out;
    // without it, read below could not tell which format to use
    private def withType(itemType: String, json: JsValue): JsValue =
      JsObject(json.asJsObject.fields + ("type" -> JsString(itemType)))

    override def write(item: MediaItem): JsValue = item match {
      case b: Book => withType("book", b.toJson)
      case m: Movie => withType("movie", m.toJson)
    }
    override def read(v: JsValue): MediaItem = {
      // Note that we allow .fields("type") to throw a NoSuchElementException if absent here since
      // spray JSON reports errors as exceptions, but you might handle this more specifically to
      // make your error messaging clearer
      v.asJsObject.fields("type").convertTo[String] match {
        case "book" => v.convertTo[Book]
        case "movie" => v.convertTo[Movie]
        case t => deserializationError(s"Unknown item type '$t'")
      }
    }
  }

  implicit val shelfIdFmt: JsonFormat[ShelfId] = new JsonFormat[ShelfId] {
    // A ShelfId is encoded as a single "userId-shelfId" string
    override def write(shelfId: ShelfId): JsValue = {
      JsString(s"${shelfId.userId}-${shelfId.shelfId}")
    }
    override def read(v: JsValue): ShelfId = v.convertTo[String].split("-").toList match {
      case userId :: shelfId :: Nil => ShelfId(userId, shelfId)
      case _ => deserializationError(s"Invalid shelfId format '$v'")
    }
  }

  implicit val shelfFmt: RootJsonFormat[Shelf] = jsonFormat2(Shelf)
}

object ShelfJsonProtocol extends ShelfJsonProtocol
A few things to unpack here:
- We define formats for Book and Movie, which are simple enough in themselves, and then write a format for the overall MediaItem which uses the type field to figure out which format it should be trying to use, and then delegates to it. Since we have a nice clear type field, this is quite easy in this case.
- Shelf itself is just a straightforward jsonFormat2.
I've covered a couple of examples where the JSON data is quite simple to work with, but that's not always the case, and often the readability of your data will be beyond your control. I've run into several examples in the past of quite arcane JSON (or worse, 20-year-old XML) payloads which are difficult to tame.
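Returning briefly to the MediaItem format from the shelf example: one way to make a missing type field fail with a clearer message, and to make sure the type field is written back out, is sketched below. The models are cut down to two fields each to keep the sketch self-contained, and MediaItemJsonProtocol is a name made up for this illustration.

```scala
import spray.json._

// Cut-down versions of the media item models, for illustration only
sealed trait MediaItem { def title: String }
case class Book(title: String, author: String) extends MediaItem
case class Movie(title: String, director: String) extends MediaItem

trait MediaItemJsonProtocol extends DefaultJsonProtocol {
  implicit val bookFmt: RootJsonFormat[Book] = jsonFormat2(Book)
  implicit val movieFmt: RootJsonFormat[Movie] = jsonFormat2(Movie)

  implicit val mediaItemFmt: RootJsonFormat[MediaItem] = new RootJsonFormat[MediaItem] {
    // Add the discriminator field on top of whatever the delegate format produced
    private def tagged(itemType: String, json: JsValue): JsValue =
      JsObject(json.asJsObject.fields + ("type" -> JsString(itemType)))

    override def write(item: MediaItem): JsValue = item match {
      case b: Book  => tagged("book", b.toJson)
      case m: Movie => tagged("movie", m.toJson)
    }

    // Look the discriminator up with .get so that its absence becomes a
    // DeserializationException with a message of our choosing
    override def read(v: JsValue): MediaItem = v.asJsObject.fields.get("type") match {
      case Some(JsString("book"))  => v.convertTo[Book]
      case Some(JsString("movie")) => v.convertTo[Movie]
      case Some(other)             => deserializationError(s"Unknown item type $other")
      case None                    => deserializationError("Media item is missing its 'type' field")
    }
  }
}

object MediaItemJsonProtocol extends MediaItemJsonProtocol
```

The delegate formats ignore the extra type field when reading, so the same JsValue can be handed straight to them.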
The key thing to remember is: your internal models don't have to look anything like your JSON.
Here's an example compounding several mistakes I've had to deal with in the past:
{
  "MSGDat11": "Sun, Aug 7 2022",
  "MsgTim12": "17:11 PM",
  "BkDtails13": [
    {
      "Tit118": "Nineteen Eighty-Four"
    },
    {
      "AutNam121": "George Orwell"
    },
    {
      "ISBN117": "9780155658110"
    }
  ],
  "IsFam812": "Yes",
  "ForSal813": "No",
  "NumPag921": "450"
}
This looks like an extreme example but contains several issues I've had to work around in the past:
- Yes / No instead of boolean
I will mention briefly here that Spray JSON gives you some tools to help with the simpler concern of field names, e.g. jsonFormat(MyModel.apply, "MSGDat11", "MsgTim12", ...), but reshaping the JSON data into a sensible structure to parse into your model can be hard work for some of these issues. I'll leave the details of that as an exercise for the reader, to be revisited in future.
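As a small sketch of the field-name tooling, the overloaded jsonFormat takes the constructor first and then one JSON field name per constructor parameter. The MessageStamp model and its protocol are hypothetical, and the two values are kept as plain strings here rather than parsed into dates.

```scala
import spray.json._

// Hypothetical model covering just the two message fields, with readable names
case class MessageStamp(date: String, time: String)

trait MessageStampJsonProtocol extends DefaultJsonProtocol {
  // Like jsonFormat2, except the JSON field names are given explicitly
  // instead of being taken from the case class
  implicit val messageStampFmt: RootJsonFormat[MessageStamp] =
    jsonFormat(MessageStamp, "MSGDat11", "MsgTim12")
}

object MessageStampJsonProtocol extends MessageStampJsonProtocol
```

This only renames fields; it does nothing about values of the wrong type or data nested in awkward shapes.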
Regardless, a mistake I've seen a few times is fearing having to write logic into the serialisation layer, such as we've seen in some examples above, and preferring to write models directly equivalent to the JSON. Largely the reasoning here is not wanting to get to grips with how to write custom serialisers, and the argument I've heard for this approach is that minimal serialisation logic makes it easy to see what's being modelled. There are many problems with this approach, though, and with an example as ugly as the one above, several become obvious:
- Scala's Function22 limit: that means that with more than 22 fields in a single case class, or as arguments to a single function call, your library's utilities start breaking down. It'll also thwart your own efforts to refactor to stay DRY: you can't pass around your own 25-argument functions to help map your field names, either. Worse still, this issue may only become evident after hours of committing to the approach or when you need to add new fields to your model months later.
These mistakes are usually made because the fear of writing serialisers outweighs the fear of dealing with bad JSON data, but before deciding you want to avoid any serialisation logic at all, keep in mind exactly how bad data can get.
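To give a feel for what a hand-written serialiser for the ugly payload above could look like, here is one possible sketch rather than a definitive solution. It maps the book details, the Yes / No flags and the string-encoded page count onto a clean model; the two message date/time fields are deliberately left out, and the UglyBookJsonProtocol name and helper functions are made up for this illustration.

```scala
import spray.json._
import scala.util.Try

// Clean internal model: nothing here reveals the shape of the payload
case class Book(
  isbn: String,
  title: String,
  author: String,
  isFamous: Boolean,
  forSale: Boolean,
  pageCount: Int
)

trait UglyBookJsonProtocol extends DefaultJsonProtocol {
  implicit val bookFmt: RootJsonFormat[Book] = new RootJsonFormat[Book] {
    private def yesNo(b: Boolean): JsValue = JsString(if (b) "Yes" else "No")

    private def parseYesNo(v: JsValue): Boolean = v match {
      case JsString("Yes") => true
      case JsString("No")  => false
      case other           => deserializationError(s"Expected Yes or No but got $other")
    }

    // Writes only the fields this sketch models (no message date/time)
    override def write(book: Book): JsValue = JsObject(
      "BkDtails13" -> JsArray(
        JsObject("Tit118" -> JsString(book.title)),
        JsObject("AutNam121" -> JsString(book.author)),
        JsObject("ISBN117" -> JsString(book.isbn))
      ),
      "IsFam812"  -> yesNo(book.isFamous),
      "ForSal813" -> yesNo(book.forSale),
      "NumPag921" -> JsString(book.pageCount.toString)
    )

    override def read(v: JsValue): Book = {
      val fields = v.asJsObject.fields
      def required(name: String): JsValue =
        fields.getOrElse(name, deserializationError(s"Missing field '$name'"))

      // Merge the array of single-field objects into one lookup table
      val details: Map[String, JsValue] = required("BkDtails13") match {
        case JsArray(elements) => elements.flatMap(_.asJsObject.fields).toMap
        case other             => deserializationError(s"Expected an array of details but got $other")
      }
      def detail(name: String): String = details.get(name) match {
        case Some(JsString(s)) => s
        case _                 => deserializationError(s"Missing or non-string detail '$name'")
      }

      Book(
        isbn = detail("ISBN117"),
        title = detail("Tit118"),
        author = detail("AutNam121"),
        isFamous = parseYesNo(required("IsFam812")),
        forSale = parseYesNo(required("ForSal813")),
        // The page count arrives as a string, so parse it defensively
        pageCount = required("NumPag921") match {
          case JsString(s) =>
            Try(s.toInt).getOrElse(deserializationError(s"Invalid page count '$s'"))
          case other => deserializationError(s"Expected a string page count but got $other")
        }
      )
    }
  }
}

object UglyBookJsonProtocol extends UglyBookJsonProtocol
```

All of the payload's oddities stay inside the format, and the rest of the application only ever sees the clean Book.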
In brief: