Friday 9 July 2021

Gatling and the test data generator pattern

Background

Recently I’ve done a lot of work on performance and load testing a variety of back-end systems which required huge JSON objects containing lots of different fields. In load testing, its useful to really push a system in terms of memory, cache and its disk space, which requires every bit of data we send to be unique.
Fortunately, the developers had already created a POJO project that documented the JSON schema in code, however, this only provided the skeleton of what the JSON should look like, but didn’t physically generate any data.
Initially I wrote a lot of Scala code in my Gatling project to use the POJO to define test data for my requests, however when we have 200+ different fields this quickly became a nightmare to maintain and confusing for newcomers to understand.
This is where I came up with the idea to create a 3rd project to define test data, splitting it out from my Gatling project so that it was only concerned with code that controlled load tests.

 

Data model

Here I have created an example of what a data-model project could look like:
https://github.com/matthewbretten/example-json-data-model

It is basically a code representation of a JSON schema, where we define what a particular object looks like, so in this case I’ve gone with an example of a shopping basket where it has particular nested objects such as “customer” and fields like “name” - also defining what type of data these are (such as String or Integer).
In my context, the developers had already created this and used it within their Java applications, so there wasn’t any work for me to do originally and it was extra useful to share the same effective “contract” in a sense - if the model changed it was only changed in one place and it was simply a case of increasing version numbers.

Test data generator

Here I have created an example of what a test-data-generator project could look like:
https://github.com/matthewbretten/example-json-test-data-generator

This pulls in the above data-model as a dependency and then goes a step further and provides functions that return full JSON strings. It has definitions of what various fields should look like and controls how random this content is.
So for example, in the JSON schema we define objects such as a Name:
“Name”:”String”
Whereas in this project we are now defining and controlling what content we get in the String - we could make it totally random strings “Acse1234fggDG” or we could attempt to make it realistic “Billy Boat”. In this case I’ve provided some examples where we randomly select from lists of reasonably realistic data - this is limiting the “randomness” of the data but keeping it more useful. Depending on what you’re trying to test, you may want to change it so its more random.
The key point here is that we have a neat Java project to easily define and control this behaviour.

Gatling

Here I have created an example Gatling project that uses both of the above projects as dependencies:
https://github.com/matthewbretten/gatling-example-datamodel-pattern

This defines our load tests, controls how many requests per second and where we send the data. However, the data is now fed from the test-data-generator project, the advantage being that all of that code is now kept separate and keeping the amount of Scala code to a minimum. The latter was relevant for me because many of the developers and testers were not familiar with Scala - so keeping the load test code simple and understandable was valuable.

Benefits

  • Each project only has 1 job - particularly the Gatling project is kept strictly to load test design and avoids bloat from having to define every field in a large JSON payload.
  • The test-data-generator can be re-used for other purposes, it can be used in unit tests or pulled in by other tools. 
  • Easily extendable to make data more or less random or more or less realistic.
  • Java is more widely understood and used than Scala, particularly when we include the Gatling DSL.
  • A neat way to also document what “realistic” test data looks like.

Downsides

  • More complicated in terms of more than 1 code project to maintain, may be harder to follow. This is not worthwhile in simpler examples where the JSON schema is very small or where we simply can hard-code many of the fields.
  • Load test design is now spread across multiple projects which makes it more work to understand. I found it was important to then make sure the relationship between the Scenario and Request names in Gatling was tied to the different kinds of test data so make it easier to quickly understand in the Gatling HTML reports.
  • There is also a separation of concern regarding the performance of Gatling itself - we now have to be careful how we design the test data code because itself can be slow to physically generate and return the JSON strings. On this note, I was originally using the java-faker library but found its use of string replacement to be very slow, which slowed down how fast Gatling could generate requests. With separate projects, such a concern may not be as apparent.