Best of the Test: Scala

Friday, 9 July 2021

Gatling and the test data generator pattern

Background

Recently I’ve done a lot of work on performance and load testing a variety of back-end systems that required huge JSON payloads containing lots of different fields. In load testing, it’s useful to really push a system in terms of memory, cache and disk space, which requires every bit of data we send to be unique.
Fortunately, the developers had already created a POJO project that documented the JSON schema in code. However, this only provided the skeleton of what the JSON should look like; it didn’t actually generate any data.
Initially I wrote a lot of Scala code in my Gatling project that used the POJOs to define test data for my requests. However, with 200+ different fields this quickly became a nightmare to maintain and confusing for newcomers to understand.
This is where I came up with the idea of creating a third project to define the test data, splitting it out of my Gatling project so that the Gatling project was only concerned with the code that controlled the load tests.

Data model

Here I have created an example of what a data-model project could look like:
https://github.com/matthewbretten/example-json-data-model

It is essentially a code representation of a JSON schema: it defines what each object looks like. In this case I’ve gone with the example of a shopping basket, which has nested objects such as “customer” and fields such as “name”, and the model also defines the type of each field (such as String or Integer).
In my context, the developers had already created this model and used it within their Java applications, so there was no initial work for me to do. It was also especially useful because the model acted as a shared “contract”: if it changed, it only changed in one place, and picking up the change was simply a case of bumping a version number.
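As a rough illustration, a data-model class for the basket example might look something like the Java sketch below. The class and field names (Basket, Customer, Item, age, pricePence) are hypothetical and not taken from the linked repository; the point is that these classes only describe the shape of the JSON and contain no logic for generating values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical data-model classes: they describe the shape of the JSON
// (field names and types) but contain no logic for generating values.
class Customer {
    private final String name;
    private final int age;

    Customer(String name, int age) {
        this.name = name;
        this.age = age;
    }

    String getName() { return name; }

    int getAge() { return age; }
}

class Item {
    private final String description;
    private final int pricePence;

    Item(String description, int pricePence) {
        this.description = description;
        this.pricePence = pricePence;
    }

    String getDescription() { return description; }

    int getPricePence() { return pricePence; }
}

// The top-level object: a basket with a nested customer and a list of items.
class Basket {
    private final Customer customer;
    private final List<Item> items;

    Basket(Customer customer, List<Item> items) {
        this.customer = customer;
        // Defensive copy so the basket cannot be modified after construction.
        this.items = Collections.unmodifiableList(new ArrayList<>(items));
    }

    Customer getCustomer() { return customer; }

    List<Item> getItems() { return items; }
}
```

Because both the applications and the test tooling depend on classes like these, a schema change only has to be made in one place.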

Test data generator

Here I have created an example of what a test-data-generator project could look like:
https://github.com/matthewbretten/example-json-test-data-generator

This pulls in the above data-model as a dependency and then goes a step further and provides functions that return full JSON strings. It has definitions of what various fields should look like and controls how random this content is.
So for example, in the JSON schema we define objects such as a Name:
“Name”: “String”
In this project, by contrast, we define and control the content we get in that String: we could make it a totally random string, “Acse1234fggDG”, or we could attempt to make it realistic, “Billy Boat”. In this case I’ve provided some examples where we randomly select from lists of reasonably realistic data. This limits the “randomness” of the data but keeps it more useful. Depending on what you’re trying to test, you may want to change it so it’s more random.
The key point here is that we have a neat Java project to easily define and control this behaviour.
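A minimal sketch of the idea in Java, assuming hypothetical names (CustomerJsonGenerator and its name lists) that are not from the linked repository: values are picked from small lists of realistic data, and the Random instance is passed in so that a fixed seed can make runs reproducible. It builds the JSON string by hand to stay self-contained; a real generator would more likely populate the data-model objects and serialise them with a JSON library.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical generator: controls what goes *inside* each field.
class CustomerJsonGenerator {
    // Small lists of realistic values: less unique than random strings, but more useful.
    private static final List<String> FIRST_NAMES = Arrays.asList("Billy", "Sarah", "Priya", "Tom");
    private static final List<String> LAST_NAMES = Arrays.asList("Boat", "Smith", "Patel", "Jones");

    private final Random random;

    CustomerJsonGenerator(Random random) {
        this.random = random;
    }

    // Picks one element uniformly at random from the list.
    private String pick(List<String> options) {
        return options.get(random.nextInt(options.size()));
    }

    String randomName() {
        return pick(FIRST_NAMES) + " " + pick(LAST_NAMES);
    }

    // Returns a full JSON string such as {"customer":{"name":"Billy Boat"}}.
    String customerJson() {
        return "{\"customer\":{\"name\":\"" + escape(randomName()) + "\"}}";
    }

    // Minimal escaping of backslashes and quotes only (control characters are not handled).
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

Making the data more random (or more realistic) would only mean changing `randomName`, without touching any load test code.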

Gatling

Here I have created an example Gatling project that uses both of the above projects as dependencies:
https://github.com/matthewbretten/gatling-example-datamodel-pattern

This defines our load tests: how many requests per second to send and where to send them. However, the data is now fed from the test-data-generator project. The advantage is that all of that code is kept separate and the amount of Scala code is kept to a minimum. The latter was relevant for me because many of the developers and testers were not familiar with Scala, so keeping the load test code simple and understandable was valuable.
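To make the hand-off concrete: a Gatling feeder in the Scala DSL is an `Iterator[Map[String, Any]]`, so the generator project only needs to expose something of that shape. The Java sketch below is a hypothetical, endless feeder-style iterator (BasketFeeder and the attribute names basketId and basketJson are assumptions, not from the linked repository). On the Scala side it would still need converting to a Scala iterator of immutable maps before being passed to `.feed`.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.UUID;
import java.util.function.Supplier;

// Hypothetical feeder-style source: each record maps a session-attribute
// name to a value, which is the shape Gatling expects from a feeder.
class BasketFeeder implements Iterator<Map<String, Object>> {
    private final Supplier<String> jsonSupplier;

    BasketFeeder(Supplier<String> jsonSupplier) {
        this.jsonSupplier = jsonSupplier;
    }

    @Override
    public boolean hasNext() {
        return true; // never runs dry, so test duration is not limited by data
    }

    @Override
    public Map<String, Object> next() {
        Map<String, Object> record = new HashMap<>();
        record.put("basketId", UUID.randomUUID().toString()); // unique per record
        record.put("basketJson", jsonSupplier.get());          // full JSON body
        return Collections.unmodifiableMap(record);
    }
}
```

The Gatling project then only refers to the attribute names, keeping the Scala side down to scenario and injection design.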

Benefits

Each project only has one job; in particular, the Gatling project is kept strictly to load test design and avoids the bloat of defining every field in a large JSON payload.
The test-data-generator can be reused for other purposes, such as unit tests, or pulled in by other tools.
Easily extendable to make data more or less random or more or less realistic.
Java is more widely understood and used than Scala, particularly when we include the Gatling DSL.
A neat way to also document what “realistic” test data looks like.

Downsides

More complicated, in that there is more than one code project to maintain, which may be harder to follow. This is not worthwhile in simpler cases where the JSON schema is very small or where many of the fields can simply be hard-coded.
Load test design is now spread across multiple projects, which makes it more work to understand. I found it was important to tie the Scenario and Request names in Gatling to the different kinds of test data, to make the Gatling HTML reports quicker to understand.
The separation also raises a concern about the performance of Gatling itself: we now have to be careful how we design the test data code, because it can itself be slow to generate and return the JSON strings. On this note, I was originally using the java-faker library but found its use of string replacement to be very slow, which limited how fast Gatling could generate requests. With separate projects, such a problem may not be as apparent.

Tuesday, 17 December 2019

Using Gatling to dynamically generate lots of complex JSON

Update (9th July 2021) - I have created a better way of doing this using a separate Java project approach - https://bestofthetest.blogspot.com/2021/07/gatling-and-test-data-generator-pattern.html?m=1

This is a quick post sharing some recent work I’ve done investigating how to use Gatling to generate large amounts of load, specifically in the form of complex JSON documents. As the JSON documents are complex, with nesting, relationships and logic, I didn’t want to use the usual method of string replacement with Gatling session variables. Instead, I wanted to construct the JSON in code so I could use helpful code patterns and practices to make it easier to build and maintain.
At the same time, I wanted to make use of Gatling’s scenario functionality, because it’s a useful way of modelling and shaping data in a realistic manner, and it gives you a lot of load-generation code for free. I also already knew it was possible to have Gatling call and use Scala code, as I had done it before.

The code structure and building JSON

You can find the code here:
https://github.com/matthewbretten/gatling-json-generator

The entry point for the code is the “TestSimulation.scala” file, which defines and executes the Gatling simulation. I have included a simplistic e-commerce example with two main user stories: casual shoppers who buy 1 or 2 items, and big spenders who buy lots of items. In the comments, you can see an example of how I use this to control the load, defining a scenario where lots of casual shoppers regularly send data, whereas big spenders are rarer and only occasionally send data.
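The two user stories could be backed by one generator method each, with each scenario feeding from its own method and Gatling’s injection profiles controlling how often each runs. This Java sketch uses hypothetical names and product values, and assumes an illustrative range of 10 to 50 items for big spenders; only the 1 to 2 items for casual shoppers comes from the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical generator with one method per user story.
class ShopperBasketGenerator {
    private static final String[] PRODUCTS = {"Kettle", "Toaster", "Mug", "Teapot"};
    private final Random random;

    ShopperBasketGenerator(Random random) {
        this.random = random;
    }

    // Random int from minInclusive to maxInclusive (both ends included).
    private int between(int minInclusive, int maxInclusive) {
        return minInclusive + random.nextInt(maxInclusive - minInclusive + 1);
    }

    private List<String> basketOf(int itemCount) {
        List<String> items = new ArrayList<>(itemCount);
        for (int i = 0; i < itemCount; i++) {
            items.add(PRODUCTS[random.nextInt(PRODUCTS.length)]);
        }
        return items;
    }

    // Casual shopper: 1 or 2 items.
    List<String> casualShopperBasket() {
        return basketOf(between(1, 2));
    }

    // Big spender: an assumed 10 to 50 items.
    List<String> bigSpenderBasket() {
        return basketOf(between(10, 50));
    }
}
```

Keeping the basket shape per user story in the generator means the simulation only decides how many users of each kind arrive and when.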

The key part for this post is the feeders (used via “.feed”), which pull data defined by a Scala object imported into this test. This is how I bring JSON objects defined by Scala code into the Gatling session.

If you follow this code, you will see that I’ve written Scala case classes that define the shape of the JSON (under the “objects” folder in my code) and Scala objects that define how to generate their respective classes. This gives me a clean separation between maintaining the JSON structure and maintaining the data set I populate the JSON with.

Relational data in JSON

Sometimes you need to create JSON with internal relationships, such as a field that counts other items or a total price that sums them. If you look at the ItemGenerator code, you can see how I dynamically generate a random list of items with random prices while still having a related field, “totalPrice”, correctly equal the sum of the individual item prices.
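The same idea as a minimal Java sketch (PricedBasket and its fields are hypothetical, not the repository’s ItemGenerator code): the total is derived from the generated item prices rather than generated independently, so the two can never disagree. Prices are held as integer pence, an assumption made here to avoid floating-point rounding in the sum.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical basket whose totalPricePence is always derived from its items.
class PricedBasket {
    final List<Integer> itemPricesPence;
    final int totalPricePence;

    PricedBasket(List<Integer> prices) {
        this.itemPricesPence = Collections.unmodifiableList(new ArrayList<>(prices));
        int total = 0;
        for (int price : prices) {
            total += price;
        }
        this.totalPricePence = total;
    }

    // Random basket of 1 to maxItems items, each priced from 1p to 9999p.
    static PricedBasket random(Random random, int maxItems) {
        int count = 1 + random.nextInt(maxItems);
        List<Integer> prices = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            prices.add(1 + random.nextInt(9999));
        }
        return new PricedBasket(prices);
    }

    String toJson() {
        StringBuilder sb = new StringBuilder("{\"items\":[");
        for (int i = 0; i < itemPricesPence.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"price\":").append(itemPricesPence.get(i)).append('}');
        }
        return sb.append("],\"totalPrice\":").append(totalPricePence).append('}').toString();
    }
}
```

For example, items priced 250, 1999 and 1 produce `{"items":[{"price":250},{"price":1999},{"price":1}],"totalPrice":2250}`.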

Generating random data outside of Gatling’s DSL

In addition to this JSON object generation, I’ve also included some examples of generating random data. Why not use Gatling’s features for this? Because I want to define the JSON up front, I can’t use Gatling functions without starting a new Gatling session, and you cannot have multiple instances of that. So I wrote some of my own code to generate random data, such as values loaded from files.
Why did I copy the function “RandomIntBetween” from scala.util.Random? Because it’s only available in Scala 2.13, and at the time of writing Gatling only works with Scala 2.12.
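One way to pick random values from a file without Gatling’s feeders is to read the file once and then choose entries at random in memory, so the disk is not touched on every request. A Java sketch, where RandomLinePicker is a hypothetical name and the file is assumed to hold one value per line:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper: loads candidate values once, then serves random ones from memory.
class RandomLinePicker {
    private final List<String> values;
    private final Random random;

    RandomLinePicker(Path file, Random random) {
        this.values = load(file);
        this.random = random;
    }

    // Reads non-blank, trimmed lines; fails fast if the file has no usable values.
    private static List<String> load(Path file) {
        List<String> loaded = new ArrayList<>();
        try {
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                if (!line.trim().isEmpty()) {
                    loaded.add(line.trim());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (loaded.isEmpty()) {
            throw new IllegalArgumentException("No values found in " + file);
        }
        return loaded;
    }

    // Uniformly random value from the loaded list.
    String next() {
        return values.get(random.nextInt(values.size()));
    }
}
```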
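For reference, `scala.util.Random.between(minInclusive, maxExclusive)` in Scala 2.13 returns a value from minInclusive up to, but not including, maxExclusive. A hypothetical Java equivalent (RandomUtil.between is an assumed name, not the repository’s RandomIntBetween) is sketched below, including the case where the range is too wide for the subtraction to fit in an int. In Java 7 and later, `ThreadLocalRandom.current().nextInt(origin, bound)` already provides this behaviour.

```java
import java.util.Random;

// Hypothetical utility mirroring Scala 2.13's Random.between for ints.
class RandomUtil {
    // Returns a random int in [minInclusive, maxExclusive).
    static int between(Random random, int minInclusive, int maxExclusive) {
        if (minInclusive >= maxExclusive) {
            throw new IllegalArgumentException("Invalid bounds");
        }
        int difference = maxExclusive - minInclusive;
        if (difference > 0) {
            return minInclusive + random.nextInt(difference);
        }
        // The range is wider than Integer.MAX_VALUE, so the subtraction overflowed:
        // draw full-range ints until one falls inside the bounds.
        int candidate;
        do {
            candidate = random.nextInt();
        } while (candidate < minInclusive || candidate >= maxExclusive);
        return candidate;
    }
}
```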

Debugging Gatling

Also, as an aside, I’ve included comments on how to print out Gatling session variables during a run. Gatling can be difficult to debug, and sometimes it’s useful to see and tweak its behaviour while it runs.

Summary

I hope someone finds this useful. I had fun writing this, learned a lot about Gatling and Scala in the process, and will certainly be referring back to this code in future. I also found myself refactoring this code a whole lot more in the process of uploading and sharing it!