Friday, 9 July 2021

Gatling and the test data generator pattern

Background

Recently I’ve done a lot of work on performance and load testing a variety of back-end systems that required huge JSON objects containing lots of different fields. In load testing, it’s useful to really push a system in terms of memory, cache and disk space, which requires every bit of data we send to be unique.
Fortunately, the developers had already created a POJO project that documented the JSON schema in code. However, this only provided the skeleton of what the JSON should look like; it didn’t actually generate any data.
Initially I wrote a lot of Scala code in my Gatling project that used the POJO to define test data for my requests. However, with 200+ different fields this quickly became a nightmare to maintain and confusing for newcomers to understand.
This is where I came up with the idea of creating a third project to define test data, splitting it out from my Gatling project so that the Gatling project was only concerned with code that controlled the load tests.

 

Data model

Here I have created an example of what a data-model project could look like:
https://github.com/matthewbretten/example-json-data-model

It is basically a code representation of a JSON schema, where we define what a particular object looks like. In this case I’ve gone with the example of a shopping basket, which has nested objects such as “customer” and fields like “name” - also defining what type of data these are (such as String or Integer).
In my context, the developers had already created this and used it within their Java applications, so there wasn’t any work for me to do originally. It was also extra useful to share the same effective “contract” - if the model changed, it was only changed in one place and it was simply a case of increasing version numbers.
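To make this concrete, here is a minimal, hypothetical sketch of what such a data-model class might look like - the field names are illustrative and are not taken from the example repository:

import java.math.BigDecimal;
import java.util.List;

// Hypothetical sketch of a data-model class. It only describes the shape of
// the JSON - there is no logic here for generating data.
public class ShoppingBasket {

    // Nested "customer" object from the JSON schema.
    public static class Customer {
        private String name;
        private Integer age;

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public Integer getAge() { return age; }
        public void setAge(Integer age) { this.age = age; }
    }

    private Customer customer;
    private List<String> items;
    private BigDecimal totalPrice;

    public Customer getCustomer() { return customer; }
    public void setCustomer(Customer customer) { this.customer = customer; }
    public List<String> getItems() { return items; }
    public void setItems(List<String> items) { this.items = items; }
    public BigDecimal getTotalPrice() { return totalPrice; }
    public void setTotalPrice(BigDecimal totalPrice) { this.totalPrice = totalPrice; }
}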

Test data generator

Here I have created an example of what a test-data-generator project could look like:
https://github.com/matthewbretten/example-json-test-data-generator

This pulls in the above data-model as a dependency and then goes a step further, providing functions that return full JSON strings. It defines what the various fields should look like and controls how random their content is.
So for example, in the JSON schema we define objects such as a Name:
“Name”:”String”
Whereas in this project we are now defining and controlling what content we get in the String - we could generate totally random strings (“Acse1234fggDG”) or we could attempt to make them realistic (“Billy Boat”). In this case I’ve provided some examples where we randomly select from lists of reasonably realistic data - this limits the “randomness” of the data but keeps it more useful. Depending on what you’re trying to test, you may want to change it so it’s more random.
The key point here is that we have a neat Java project to easily define and control this behaviour.
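As a rough illustration (the class and method names here are hypothetical, and I’m assuming Jackson as the JSON library - the example repository may do this differently), a generator function might look like this:

import java.math.BigDecimal;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

import com.fasterxml.jackson.databind.ObjectMapper;

// Hypothetical generator sketch - names are illustrative and Jackson is an
// assumed dependency; the example repository may differ.
public class BasketGenerator {

    private static final List<String> NAMES = List.of("Billy Boat", "Sally Shore", "Chris Cliff");
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Builds a populated data-model object and serialises it to a JSON string.
    public static String randomBasketJson() throws Exception {
        ShoppingBasket basket = new ShoppingBasket();
        ShoppingBasket.Customer customer = new ShoppingBasket.Customer();
        // Randomly pick from a small list of realistic values rather than
        // generating fully random strings - trading randomness for usefulness.
        customer.setName(NAMES.get(ThreadLocalRandom.current().nextInt(NAMES.size())));
        customer.setAge(ThreadLocalRandom.current().nextInt(18, 90));
        basket.setCustomer(customer);
        basket.setItems(List.of("apples", "bread"));
        basket.setTotalPrice(new BigDecimal("3.50"));
        return MAPPER.writeValueAsString(basket);
    }
}

A Gatling feeder can then call a function like randomBasketJson() for every virtual user, so each request gets a fresh, unique payload.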

Gatling

Here I have created an example Gatling project that uses both of the above projects as dependencies:
https://github.com/matthewbretten/gatling-example-datamodel-pattern

This defines our load tests, controlling how many requests we send per second and where we send the data. However, the data is now fed from the test-data-generator project, the advantage being that all of that code is kept separate, keeping the amount of Scala code to a minimum. The latter was relevant for me because many of the developers and testers were not familiar with Scala - so keeping the load test code simple and understandable was valuable.

Benefits

  • Each project has only one job - in particular, the Gatling project is kept strictly to load test design and avoids bloat from having to define every field in a large JSON payload.
  • The test-data-generator can be re-used for other purposes: it can be used in unit tests or pulled in by other tools.
  • Easily extendable to make data more or less random or more or less realistic.
  • Java is more widely understood and used than Scala, particularly when we include the Gatling DSL.
  • A neat way to also document what “realistic” test data looks like.

Downsides

  • More complicated, in that there is more than one code project to maintain, which may be harder to follow. This is not worthwhile in simpler cases where the JSON schema is very small or where we can simply hard-code many of the fields.
  • Load test design is now spread across multiple projects, which makes it more work to understand. I found it was important to make sure the relationship between the Scenario and Request names in Gatling was tied to the different kinds of test data, to make it easier to quickly understand the Gatling HTML reports.
  • There is also a separation of concern regarding the performance of Gatling itself - we now have to be careful how we design the test data code because it can itself be slow to generate and return the JSON strings. On this note, I was originally using the java-faker library but found its use of string replacement to be very slow, which slowed down how fast Gatling could generate requests. With separate projects, such a concern may not be as apparent.






Thursday, 31 December 2020

A tester’s feedback on system feedback

It’s not surprising that as testers our default approach to a black box is to try and perform tests to understand it better. However, sometimes the desire or need to do this is a test in itself! Rather than advocate for more testing, sometimes I think we should advocate for more observability. More observability not only makes the system, bug or problem we’re trying to understand easier to investigate; it unlocks and enables lots more testing, it can make our testing faster, and it directly helps the operation of the system too - making it easier for people to support in production.

In motor racing, there are roughly two main types of racing driver - those that adapt to problems with the car’s handling, and those that are really good at identifying those problems and feeding back to the engineers and mechanics so the car can simply be made easier to drive and faster. The really good drivers are good at both.

I feel many testers naturally fall into the first category when it comes to testing. I think for many good reasons we are quite patient and persistent at finding ways to test even when it is difficult, mundane or time-consuming. I think our profession attracts the sort of mindset that loves to investigate, problem-solve and carefully understand problems step-by-step. This makes us really good at understanding systems just by observing behaviour. None of that is bad; these are our strengths and typically what we bring to most teams.

However, I feel there are times when we could be feeding back and suggesting ways to improve the system we are testing, in ways that make it easier to test. Sometimes we are asked to test software that is very difficult to understand beyond its behaviour, and we can spend a lot of time and effort testing to understand all of this behaviour.

In addition, in order to give feedback, make comments and suggest improvements on how the system gives feedback, we need a bit of technical understanding and experience of what is possible. However, I believe we can all learn a little bit more, and it all contributes to the quality of the software we help produce.

Below are some suggestions on how we can assess the feedback a system gives us as testers and therefore make suggestions to improve it.

Logging

The first, most obvious way to improve system feedback is assessing the quality of logging - through logs we can make the system report both errors and behaviour. If we have some complex logic to send data through and we’re unsure which path it is taking, we can make the system log this: “processing data 1”, “data 1 was sent to x because y equalled z”.
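As a hedged sketch of what that might look like in a Java service (using SLF4J; the class and field names are made up):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class DataRouter {

    private static final Logger log = LoggerFactory.getLogger(DataRouter.class);

    // Illustrative only: log which branch the data took, not just errors.
    public void route(String dataId, boolean isPriority) {
        log.info("processing data {}", dataId);
        if (isPriority) {
            log.info("data {} was sent to the priority queue because isPriority was true", dataId);
        } else {
            log.info("data {} was sent to the standard queue", dataId);
        }
    }
}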

I frequently find the quality of logs on many projects to be quite poor for a variety of reasons; they tend to be neglected. It’s not just testers who lack experience or knowledge of logging - many developers haven’t explored this much themselves either.

There are multiple aspects to this that I could write lengthy blog posts about, but to summarise, here are a few areas whose quality you can help assess:

Where are the logs?

If we have multiple servers or devices each writing logs, do we have to log in to each one separately to view the logs, or can we make access easier by putting them in one place (centralised logs)?

What logs are these?

There are lots of kinds of logs for different layers of software and context:

  • The operating system logs lots of low level things like user access and system processes but these are quite noisy and don’t tend to be useful for the average developer or tester.
  • Back-end system logs (like a Java process).
  • Front-end logs (like a website).
  • Audit logs (user access, what they accessed and when, whether they were denied access).
  • Business metrics (sometimes we can log when certain actions happened)
  • Is it easy to distinguish between the different types of logs?

UX of logs

Particularly when we send logs to centralised systems such as ELK (Kibana), there is work required to make the logs easy to read, navigate and understand.
For example:

  • Easily and accurately being able to filter for ERROR logs
  • Traceability - can we trace different events in different applications’ logs together with some common identification such as an ID? (A sketch of one way to do this follows this list.)
  • Formatting the log line data correctly (typically as JSON) so that it can be displayed correctly in Kibana - e.g. long stack traces with multiple lines appearing as separate logs rather than 1 log.
  • Easily being able to identify and separate different systems and environments - can we quickly distinguish between Prod/Live and test environments?
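For the traceability point above, one common approach on Java back-ends is SLF4J’s MDC (Mapped Diagnostic Context): put a correlation ID into the MDC and every log line written while handling that piece of work carries the same field, which Kibana can then filter on. A minimal sketch (the names are illustrative, and your log pattern or JSON encoder needs to be configured to include MDC fields):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

public class OrderHandler {

    private static final Logger log = LoggerFactory.getLogger(OrderHandler.class);

    // Illustrative only: every log line written while handling this order
    // carries the same correlationId, so the events can be traced together.
    public void handle(String orderId, String correlationId) {
        MDC.put("correlationId", correlationId);
        try {
            log.info("received order {}", orderId);
            // ... processing ...
            log.info("finished order {}", orderId);
        } finally {
            MDC.remove("correlationId");
        }
    }
}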

Errors and stack traces

Do we write error logs when something has gone wrong? If we see a bug as a user, try to forget that you can “see it” and know the steps - has the system written an error log that would help us identify the problem if we didn’t know that?
When we have errors, do we also write out the stack trace that goes with them? Even when the error itself is ambiguous or vague, the stack trace can give developers clues about what the underlying problem is.
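In SLF4J, for example, the difference is simply whether the exception object is passed as the final argument - a hedged sketch:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class PaymentProcessor {

    private static final Logger log = LoggerFactory.getLogger(PaymentProcessor.class);

    public void charge(String orderId) {
        try {
            throw new IllegalStateException("payment gateway timed out"); // simulated failure
        } catch (IllegalStateException e) {
            // Passing the exception as the final argument makes SLF4J write the
            // full stack trace; logging only e.getMessage() would lose it.
            log.error("Failed to charge order {}", orderId, e);
        }
    }
}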

Monitoring and metrics

In addition to logs, we can also observe the behaviours of a system via monitoring and metrics. So rather than just relying on negative feedback like errors from logs, we can also use positive behaviours, such as counting the number of times users have visited pages or whatever the “successful” use of the system means for us. Sometimes when things go wrong, we don’t get errors - but we can still observe that something happened from the drop in positive or business metrics.
Does your system have somewhere you are collecting data on what it’s doing? This could be something like Google Analytics, where you can track what users are clicking on, or it can even just be logs like the above stored in ELK/Kibana - logging each time the system processes a piece of data successfully.
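As one hedged example of the “counting successes” idea on a Java back-end, a metrics library such as Micrometer (my choice here for illustration, not something the post assumes you already use) makes it a one-liner to increment a counter that a dashboard can then graph:

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

public class CheckoutMetrics {

    private final Counter successfulCheckouts;

    public CheckoutMetrics(MeterRegistry registry) {
        // Count every successful checkout so a sudden drop is visible on a
        // dashboard even when no error is ever logged.
        this.successfulCheckouts = Counter.builder("checkout.success").register(registry);
    }

    public void recordSuccess() {
        successfulCheckouts.increment();
    }

    public static void main(String[] args) {
        CheckoutMetrics metrics = new CheckoutMetrics(new SimpleMeterRegistry());
        metrics.recordSuccess();
    }
}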

Summary

I try to keep at the forefront of my testing the thought “what if this happened in production - could we tell that the system or user did this?”. I believe that by adding this to my assessment of bugs and quality, I’m improving the overall quality of the broader system in terms of operability and supportability. It can also massively help my own testing - I have tested very complex, wholly back-end systems with no obvious user interfaces, and pushing for better feedback from the system made them far easier to test as well.

Tuesday, 17 December 2019

Using Gatling to dynamically generate lots of complex JSON

Update (9th July 2021) - I have created a better way of doing this using a separate Java project approach - https://bestofthetest.blogspot.com/2021/07/gatling-and-test-data-generator-pattern.html?m=1

This is a quick post sharing some recent work I’ve done to investigate using Gatling to generate large amounts of load, specifically in the form of complex JSON documents. As the JSON documents are complex, with nesting, relationships and logic, I didn’t want to use the usual method of string replacements with Gatling session variables. I wanted to construct the JSON in code so I could make use of helpful code patterns and practices to make it easier to build and maintain.
At the same time, I wanted to make use of Gatling’s scenario functionality because it’s a useful way of modelling and shaping data in a realistic manner, as well as giving me a lot of load-generation code for free. I also knew it was possible to have Gatling call and use Scala code, as I had done it before.


The code structure and building JSON

You can find the code here:
https://github.com/matthewbretten/gatling-json-generator

The first point of entry for the code is the “TestSimulation.scala” file, which defines and executes the Gatling session. I have included a simplistic e-commerce example with two main user stories - casual shoppers who buy one or two items, and big spenders who buy lots of items. In the comments, you can see an example of how I use this to control the load - letting me define a scenario where lots of casual shoppers regularly send data, whereas big spenders are rarer and only occasionally send data.

The key part for this post is the feeders (defined by “.feed”) that pull data defined by a Scala object imported into this test. This is how I bring in JSON objects defined by Scala code into the Gatling session.

If you follow this code, you will see how I’ve written Scala case classes that define the shape of the JSON (under the folder “objects” in my code) and Scala objects that define how to generate their respective classes. This gives me a nice separation between maintaining the JSON structure and maintaining the data set I populate the JSON with.


Relational data in JSON

Sometimes you need to create JSON that has a relationship, such as a field that counts other items or sums their prices. If you look at the ItemGenerator code you can see how I’ve been able to dynamically generate a random list of items with random prices and still have the related field “totalPrice” correctly equal the sum of the individual item prices.
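The repository code is Scala, but the idea translates to any language; here is a simplified, illustrative Java sketch of the same pattern - generate the items first, then derive totalPrice from them so the relationship always holds:

import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Illustrative sketch only, mirroring the idea behind the (Scala) ItemGenerator.
public class RelatedDataSketch {

    record Item(String name, BigDecimal price) { }

    public static void main(String[] args) {
        ThreadLocalRandom random = ThreadLocalRandom.current();
        List<Item> items = new ArrayList<>();
        int itemCount = random.nextInt(1, 6);
        for (int i = 0; i < itemCount; i++) {
            BigDecimal price = BigDecimal.valueOf(random.nextInt(100, 10_000), 2); // pence as pounds
            items.add(new Item("item-" + i, price));
        }
        // totalPrice is computed from the generated items rather than randomised
        // separately, so it always equals the sum of the individual prices.
        BigDecimal totalPrice = items.stream()
                .map(Item::price)
                .reduce(BigDecimal.ZERO, BigDecimal::add);
        System.out.println(items + " totalPrice=" + totalPrice);
    }
}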

Generating random data outside of Gatling’s DSL
In addition to this JSON object generation, I’ve also included some examples of generating random data. Why not use Gatling’s features for this? Because I want to define the JSON up front, I can’t use Gatling functions without starting a new Gatling session, which you cannot have multiple instances of. So I wrote some of my own code that lets me randomly generate data and do things like loading values from files.
Why did I copy the function “RandomIntBetween” from scala.util.Random? Because it’s only available in Scala 2.13 and currently Gatling only works with 2.12.


Debugging Gatling

Also as an aside, I’ve included comments on how to print out Gatling session variables during its run. Gatling can be difficult to debug and sometimes it’s useful to see and tweak its behaviour during the run.

Summary

I hope someone finds this useful, I had fun writing this and learned a lot about Gatling and Scala in the process, and I will for sure be referring back to this code in future. I also found myself refactoring this code a whole lot more in the process of uploading and sharing it!

Thursday, 12 September 2019

Testing in DevOps

I've just spent the past year embedded in a "devops" team (quotation marks explained later) and I've got a few different points to make, so bear with me - this is going to be a long post. It's also a bit of a brain dump, so it might not be my best writing ever, as I want to write this while it's relatively fresh in my head.
This post is also going to be a little technical and assume some knowledge of DevOps. If you're new to the phrase, I highly recommend Katrina Clokie's book "A Practical Guide to Testing in DevOps" found here - https://leanpub.com/testingindevops

That word "Devops"

In my experience, there are two different understandings of the word/phrase "devops". Basically it boils down to:
  • "Devops" is not a role, it's a set of practices, which makes it a bit woolly and vague but in general is about bringing two traditionally separate roles together so that a "team" can both deliver (for example) a software application and its hardware but also maintain, operate and support it in production/live/whatever you want to call it. This can be achieved by training the team in operations or by embedding ops engineers.
  • "Devops" is where developers write infrastructure-as-code, typically Ops engineers interested in programming. But sometimes also software programmers interested in getting their hands dirty with more Opsy work. In this definition, you tend to see this become a role called "Devops engineer" and they tend to write re-usable chunks of code that builds infrastructure for software teams to use. For example, creating a generic set of code that provides a MySQL cluster in AWS.

I highlight this because I've realised people aren't aware that there is this difference, and personally I prefer to encourage the former rather than the latter. The latter is a bit like SDETs/Developers in Test, where you're creating a whole new role at a communication distance from the end goal, writing tests that are broken by the dev team because they have no idea about them. I like the first definition because it's about teamwork and delivery, encouraging the team to take a more holistic view of software delivery, or rather product delivery as a whole. After all, who cares if your code is shit hot if we've put it on hardware too small to run it?

Cloud and infrastructure-as-code

Regardless of those definitions, if you're going to work with cloud-based infrastructure (as opposed to on-premise, where your company owns or rents the physical servers), you're going to be writing infrastructure-as-code. Why? Because in the cloud you are sharing your hardware with other companies and people, which means you have less control over how it operates and this changes the risk profile. The cloud is cheaper than owning the physical servers but this comes at the cost of reliability.
Therefore you may want to automate many aspects of operating your product, such as recoverability and scalability which is where infrastructure-as-code comes in.
There are two main areas that can be expressed in code:

  • Code for provisioning the infrastructure you need e.g. the hardware, the network routing, firewall rules and so on.
  • Code for configuring an individual server, e.g. setting up users and file permissions, installing software, configuring software (such as setting up Java and then setting up a Java based application to run).
Why consider these two as separate areas? In my opinion there is so much to consider and think about in both that it's worthwhile considering them in their own discussions, even though you will want to develop and test them together.

Testing in DevOps

So what to test? How do you test? What tools do you use? Is there anything to really test?
Hell yes there’s loads to test - you’ve now got code that builds the foundations of your product, and not just that, it’s code that defines how your product will scale and recover, and also determines its reliability. Suddenly a developer can potentially make a small change and open all of your servers to public access.
These are some ideas of what you can test in an Ops world:

  • You can manually run and try out the code as an obvious starting point. Does it build the servers correctly? Can you use the application after destroying it and building it again?
  • Destroying the servers leads to OAT (Operational Acceptance Testing) or general operability: testing what happens in disaster and failure scenarios. Will your product recover if the servers suddenly disappear and new ones are built? Do you test your backups regularly? This also neatly leads to ideas such as chaos engineering.
  • The code itself can be sort-of unit tested. For example, Ansible has a framework called Molecule which allows you to run your Ansible scripts against a Docker container and assert what state the scripts will leave a server in. There are also broader integration test tools such as Test Kitchen which have slightly more capabilities.
  • Using tools such as AWS Trusted Advisor or Well-Architected to analyse your infrastructure for common mistakes (such as setting up firewall rules completely open to the public) or under-utilised hardware that could be run more cheaply.
  • Given cloud infrastructure is inherently prone to failure, can you monitor and alert on those failures? Do you know if your servers fell over overnight? How many errors are happening in your environments? Usually cloud providers don’t have access to your servers to know what is going on, so you need to set up your own access to software logs (e.g. Java app errors) - have you centralised these logs for easy access?
  • Tools like Sensu allow you to write custom automated checks to monitor your servers. This is very useful for more granular checks like specific software health checks (e.g. your server never failed but the software application has crashed - can you tell from your monitoring?). I think there is a lot of value here for testers to help design, write and create new simple but smart checks and improve how observable systems are, not just in production but in all environments!


Some other things I’ve worked on that were very context specific, and not very DevOpsy or Testing, but could be useful ideas:

  • Creating Jenkins pipelines to test out rebuilds of infrastructure-as-code on a daily basis - this was to cover a specific risk we had that our code was very tightly coupled and we were breaking a lot of projects with our changes. This was an interim solution until we un-coupled our architecture but it was useful to be able to do it.
  • Off the back of Jenkins pipelines for rebuilding infrastructure-as-code, I also created jobs that would manage downtime periods to help save a good two-thirds of our monthly costs. In general we only needed our test environments to be up 8 hours a day, 5 days a week, not 24-7. I used Jenkins so I could manage dependencies and run custom checks but this could also be achieved with the right auto-scaling policies in AWS too.
  • Fixing and writing my own Sensu checks. This was useful, although I would be careful as it’s easy to write a check that produces false positives or negatives. It’s very hard to think of all the scenarios a check could encounter, so avoid writing scenario-based checks where possible. It’s not helpful to have monitoring checks that have bugs themselves and are difficult to debug when they fail - keep them simple.
  • Hooking together Dev team Selenium tests. This was because my context was an Ops team changing infrastructure for Dev teams, and I wanted a way to test our changes before we potentially broke the Dev teams’ dev environments. This isn’t recommended if you can avoid it, as obviously it’s not very DevOps. But in general, finding a way for infrastructure-as-code to eventually be end-to-end tested in an automated way is useful, because it’s hard to really test things like firewall and network routing configuration or file permissions until you actually try to perform certain actions from the server. The hard part is knowing when to run the tests - you need to know when your infrastructure-as-code has finished and when the servers and various components have actually completed their setup and the app is running. I achieved this with a Jenkins pipeline which polled a Sensu check looking at a health endpoint from the app; when this went green I knew to proceed with the test, and it would time out if it took longer than usual.
  • Writing simple scripts for monitoring or analysing our AWS account. In our context we needed to tag the hardware we were using for our own internal billing purposes so we could appropriately budget for certain projects. As this relied on humans remembering to include the tagging in their infrastructure-as-code, it was useful to regularly audit the account for servers that were missing tags and therefore wouldn’t be billed appropriately. This also made it easier to investigate under-utilised servers and talk to the owners about saving costs.

Monday, 26 August 2019

Using Bowtie Diagrams to describe test strategy

Introduction

A long time ago in this blog post I was introduced to the Bowtie Diagram. I love how this visualises how we manage risks, and I feel it complements test strategy. Why? Well, surely our test strategies should be accounting for risk and how we manage it. Whether you define testing in a specific, focused sense (like functionally testing code) or in a holistic, broader sense (like viewing code reviews as testing, or monitoring, or simply asking the question “What do end users want?”) - these activities are ways of either preventing or mitigating risks. I feel if you want to be effective in helping improve the quality of a project or product, you need to assess the potential risks and how you are going to prevent or mitigate them, and therefore also assess where your time is best placed.
What are Bowtie Diagrams?
The short version - it’s a diagram that looks like a bowtie, where you describe a hazard and its top event (the harmful event), then list the threats that could trigger the event and the consequences of the event happening. For each threat and consequence, you describe a prevention or mitigation.



The long version (and a better, more comprehensively described one) can be read here - https://www.cgerisk.com/knowledgebase/The_bowtie_method

What I love about these diagrams is that they visually describe and explain risks and how we intend to manage them. Creating them is a useful exercise in exploring risks we may not normally have thought of, but in particular, I find we don’t explore the right-hand side (consequences) of these diagrams very often. I find most of the time in software development that we are very reactionary to consequences, and even then, we don’t typically spend much time on improving mitigation.


Managing the threats
I’ve started using these diagrams to explain my recent approaches to test strategy because they neatly highlight why I’m not focusing all of my attention on automated test scripts or large manual regression test plans. I view automated test scripts as a barrier to the threat of code changes. Perhaps the majority of people out there view this as the biggest threat to quality; perhaps many people understand the word “bug” to mean something the computer has done wrong. These automated test scripts or regression test plans may well catch most of these. But are these the only threats?

I see other threats to quality and I feel we have a role to play in “testing” for these and helping prevent or mitigate them.

Managing the consequences
Any good tester knows you cannot think of or test for everything. There are holes in our test coverage; we constantly make decisions to focus our testing to make the most of precious time. We knowingly or unknowingly choose not to test some scenarios and therefore make micro-judgements on risk all of the time. There are also limits to our knowledge and limits to the environments in which we test - sometimes we don’t have control over all of the possible variables. So what happens when an incident happens and we start facing the consequences? Have we thought about how we prevent, mitigate or recover from those consequences? Do we have visibility of these events occurring, or signs they may be about to occur?
I find this particular area is rarely talked about in such terms. Perhaps there is some notion of monitoring and alerting. Usually there are some disaster recovery plans. But are testers actively involved? Are we actively improving and re-assessing these? I typically find most projects do not consider this as part of their strategy; in most cases it seems to be an after-thought. I think most of this stems from these areas typically being the responsibility of Ops Engineers, SysAdmins, DBAs and the like, whereas as testers we have typically focused on software and application teams. As the concept of DevOps becomes ever more popular, we can now start to get more involved in the operational aspects of our products, which I think can relieve a lot of the pressure on us to prevent problems from ever occurring.


Mapping the diagram to testing
An example of using a diagram like this within software development and testing:







I feel our preventative measures against an incident occurring are typically pretty good from a strategic view, especially lately where it has become more and more accepted that embedding testers within development teams improves their effectiveness. Yes, maybe we still aren’t writing unit tests or involving testers in the right ways sometimes. But overall, even with such issues, efforts are being made to improve how we deliver software to production.

But on the right-hand side, we generally suffer in organisations where DevOps has not been adopted. And when I say DevOps, I don’t mean devs that write infrastructure-as-code; I mean teams who are responsible for and capable of both delivering software solutions and operating and maintaining them. Usually, we see the Ops side of things still separated into its own silo, with very little awareness or involvement from a software development team in their activities. But Ops plays a very key role in the above diagram, because they tend to be responsible for implementing and improving the barriers or mitigations that help reduce the impact of an incident.

I feel the diagram neatly brings this element into focus and helps contribute to the wider DevOps movement towards a holistic view of software development, towards including aspects such as maintenance, operability, network security, architecture performance and resilience as qualities of the product too.

As testers I feel we can help advocate for this by:

  • Asking questions such as “if this goes wrong in production, how will we know?”
  • Requesting access to production monitoring and regularly checking for and reporting bugs in production.
  • Encouraging teams to use a TV monitor to show the current usage of production and graphs of performance and errors.
  • If you have programming/technical skills, helping the team add new monitoring checks, not dissimilar to automation checks. (e.g. Sensu checks)
  • Becoming involved with and performing OAT (Operational Acceptance Testing) where you test what happens to the product in both expected downtime (such as deploying new versions) and disaster scenarios, including testing the guides and checklists for recovery.
  • Advocating for Chaos Engineering.

Friday, 17 May 2019

Using character recognition for browser automation

Introduction

I was venting recently to my colleague Chris Johnson about the frustration of working with yet another horrendous website that didn’t have any usable locators, and he put to me the idea of “what if we had visual driven automation, where the automation treats the website as a complete black box, like a human user?”. It got me thinking - surely this could already be possible in some respect. As I am already very familiar with AWS (Amazon Web Services), I had a look to see if they had any services that could be useful for this and found their brand new Textract service, which lets you send image documents (.jpg, .png or .pdf) and analyses them for text. This gave me the idea to try and use this service with Selenium to find elements on a page and click them without any need to interact with or understand the HTML. This post is a report on my findings.

What is Textract and how does it work?

Textract is a type of OCR (Optical Character Recognition) service that detects text and data in image documents. It works by supplying it with an image file, and it responds with the results of its analysis: a list of words, sentences and objects (like forms and tables) that it has identified. Each word or sentence has a set of data including its location in terms of a rectangular box and a percentage confidence in the accuracy of its findings. Behind the scenes it manages this via a pre-trained machine learning algorithm.





 

As with other AWS services, you can access Textract via the AWS CLI or SDKs; for my research I used the boto3 library for Python 3, which allows you to interact with the AWS APIs from Python very easily.

When you are using Textract, you receive JSON responses that look like this:

{
    "Blocks": [
        {
            "Geometry": {
                "BoundingBox": {"Width": 1.0, "Top": 0.0, "Left": 0.0, "Height": 1.0},
                "Polygon": [
                    {"Y": 0.0, "X": 0.0},
                    {"Y": 0.0, "X": 1.0},
                    {"Y": 1.0, "X": 1.0},
                    {"Y": 1.0, "X": 0.0}
                ]
            },
            "Text": "Store",
            "Confidence": "99.1231233333",
            "BlockType": "WORD"
        }
    ]
}

My test

So the idea for my test was to create a simple script that would:
  1.     Use Selenium to load a website.
  2.     Take a screenshot.
  3.     Send the screenshot to Textract.
  4.     Find the location of some text on the website that would take us to a new page.
  5.     Click the location on the website.
  6.     Check that we had gone to the correct new page.

Pre-requisites

In order to get this test working, this is what I needed:
  • A computer with Python3 installed and the relevant libraries (boto3, Selenium).
  • Chromedriver and Chrome installed.
  • An AWS account with access to Textract. At the time of writing Textract is in preview and you need to ask Amazon nicely for access. It took about a week for me to get access.
  • An AWS IAM (Identity and Access Management) user on my AWS account that has the permissions to interact with Textract.
  • AWS CLI installed on my machine and configured with the IAM user and an AWS region that Textract is available in (it’s currently only available in a few regions).
I chose to use Python because I’m very familiar with it, and it allows me to write a simple script like this very rapidly with little setup needed. You can use any programming language, you just need to be able to interact with the AWS CLI.

My findings

Did it work? Yes, effectively. I was able to create a script that did all of the above and successfully navigated between pages... but with a very big caveat - only if the text appeared near the top-left corner of the page, at any resolution. Why? When you request a screenshot from web browsers, they do not take a screenshot of what you can see; instead they take a full-page screenshot. This means you cannot easily map the pixel locations from the screenshots to the browser window, particularly when the real browser window is at a smaller resolution and the website dynamically changes the visible location of elements. Even at larger resolutions there is a small difference in sizes because you almost never display the full page in a window.
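For what it’s worth, the mapping itself is trivial when the screenshot does correspond 1:1 to the visible window: Textract’s BoundingBox values are normalised to the 0.0-1.0 range, so the click point is just the centre of the box scaled by the image’s pixel size. A small illustrative sketch (in Java here, although my script was Python):

// Illustrative only: valid solely when the screenshot maps 1:1 to the
// visible browser window, which is exactly the caveat described above.
public final class BoundingBoxMapper {

    public static int[] toClickPoint(double left, double top, double width, double height,
                                     int imageWidthPx, int imageHeightPx) {
        int x = (int) Math.round((left + width / 2.0) * imageWidthPx);
        int y = (int) Math.round((top + height / 2.0) * imageHeightPx);
        return new int[] { x, y };
    }

    public static void main(String[] args) {
        // A word 10% wide, starting 40% across a 1920x1080 screenshot.
        int[] point = toClickPoint(0.40, 0.20, 0.10, 0.05, 1920, 1080);
        System.out.println("click at x=" + point[0] + ", y=" + point[1]); // 864, 243
    }
}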

This means that my idea has limited usefulness until I could find a better way of generating screenshots that are just the visible area rendered by the browser.

However, in principle my idea did work if you can provide an image that maps 1:1 with the browser window, but there are still other issues. I tried testing against different websites just to see what other issues would crop up:

  • Naturally you need relatively clear text in order for Textract to confidently find it. However, it could be argued that unclear text would be an accessibility issue anyway, and in the below screenshot you can see Textract does a reasonable job of reading even unusual fonts or text that isn’t perfectly straight.
  • Obviously if you want to find elements that are not text, Textract isn’t useful at all. Potentially other pattern recognition services could be useful though if you could provide it an image pattern to find rather than just a text string.
  • It could be tricky to figure out which is the right element if there is more than one example of some text on the page.
  • While Textract is quite fast (roughly 2 seconds or so for it to return a response), some dynamic elements of the page could change in that time. Some websites have scrolling elements or pop-ups that only appear after a short time. This means you might not be able to run a totally black-box test, as you need to check for these elements before proceeding.
Due to these issues, I don’t think you could totally remove the need for HTML locators yet. But there is certainly some potential here to expand the abilities of automation tests and maybe make parts of them more powerful. Even if you don’t use this tech to replace locators for Selenium, there are other applications that might not have been easy before - such as testing PDF files generated by the systems you test.

Other considerations

If you were to consider using this tech in your automation, there are some other factors to consider:
  • Costs: AWS charges for the use of Textract. Depending on your context and how you use it, this may be a limiting factor. The pricing can be found here: https://aws.amazon.com/textract/pricing/
  • Data privacy: AWS states they may store and use the documents you upload to maintain and improve the machine learning algorithm behind Textract. You may work in a workplace where the screenshots contain sensitive information that AWS employees are not authorised to look at. You can request that AWS delete documents you upload, but this is a manual process of contacting their support team.
  • While I did find Textract responded quite quickly (around 2 seconds), I don’t know its capabilities under significant load, nor did I hit the API request limits - I was only running very low numbers of requests.
  • Textract is currently only available in certain AWS regions, so if you require data to be held in particular geographic locations then this may be a limiting factor.
Of course, other services similar to AWS Textract may be available which overcome these factors.

Summary

This was a fun little exploration into technology I’ve not used before and it was nice to find it so easy to use. I think there is some potential in using this service for automation tooling, particularly if you need to analyse image documents.

If you’re looking to totally remove the need for HTML locators, I think there is more work required, though - if you know of a good solution to the browser screenshot resolution problem, let me know!

My code

Here’s my code if you’d like to try it yourself:

https://github.com/Ardius/python-selenium-textract/blob/master/textract_test.py

Wednesday, 1 November 2017

So you can test an API, what to learn next?

Introduction

Last week I ran my first ever workshop at a conference for TestBash Manchester! It was an awesome experience, totally different to the talks I’ve done before at meetups and smaller conferences. The workshop was all about how to get started with web API testing and I targeted it at beginners who had no prior knowledge of APIs. For this post though, I’m not going to talk about the workshop, but more about what happens next. Several people have asked me about a more advanced workshop and what they could study in their own time next. I don’t have a quick or easy answer to this as I feel there is lots more to learn and it really depends on your context. However, I’m going to try and discuss some areas and ideas.

Where are we starting from?

Before I get started, I’d like to clarify where my workshop left people and what this post is assuming you’re already familiar with.
  • What APIs are.
  • What do APIs look like and how they work.
  • Why they are useful to understand for testing.
  • What API documentation looks like.
  • What paths and query strings are.
  • How methods work.
  • What status codes mean.
  • The concepts of authentication and authorisation.
  • How to create requests with authentication using Basic Auth.
  • The concept of resources and IDs.
  • What headers are.
  • What request and response bodies are.
  • An introduction to JSON & XML data formats and data types.
  • Using Postman’s basic features.
  • Understanding how Postman collections could be used.
  • Awareness of how basic automation can be created using Postman’s runner.
If you feel lacking in these areas, I would still spend more time on understanding the basics of these before starting on anything more advanced.


So with that, I’ll go over some areas that we could explore for a more advanced workshop or for you to explore in your own time.

Try testing other, more complex APIs

On my workshop, we learned to interact with a simple API that my friend Lee Goodman and I built together. This API was intentionally designed in a way that allowed attendees to learn in stages, introducing different concepts at each stage. However, this is not how an API will look in reality: you won’t have nice ways to learn it in stages, they will immediately throw everything at you. You typically won’t be able to interact with most APIs without authentication (which will be more complex and won’t be the same for every API), and they will provide varying levels of quality of documentation.


One of the aspects of APIs that I didn’t cover in my workshop is that they are an abstract representation of the resources, objects, functions and capabilities of an application or system. In plain English this means when you learn to use an application via an API, the picture you build up in your head based on the API’s structure and responses is based on a translation of the system underneath the API. Just as with translating from Japanese to English, there are concepts that are not easy or even possible to express in an API. People also make mistakes in translation or have many different ways of creating the translation. It’s useful to get some experience of this by using more APIs, you may start to notice these differences and get a feel for what works well, what doesn’t and perhaps get some sense of the compromises made. You will also see where some of the language that even I use is not consistent across all APIs.


You can try out various public APIs for free, one example being Twitter’s API. You can find documentation for lots of public APIs you can try here:
There is also a simpler and neater API to play with, produced by Swagger, here:

Learn about more complex forms of authentication

In the workshop I only covered the basics of how to authenticate your requests in Postman with one of simplest forms of authentication called Basic Auth. There are many many more types and technologies for authentication that you could learn about, some of them are very complicated and deserve an entire workshop in themselves! I don’t feel it’s necessary to understand them all because you probably won’t come across many of them. But it could be useful to understand the more popular (and secure) kinds of authentication such as OAuth 1.0 and OAuth 2.0.

Learn about different kinds of headers

I briefly talked about headers in the workshop, mainly in reference to the “Authorization” header (which is the one Postman creates for you when you add authentication) and the “Content-Type” header, which we used in the workshop to tell the server whether we were sending JSON or XML with our POST and PUT requests (again automatically generated by Postman). There are a couple more headers that you can experiment with when sending requests, such as the “Accept” header, which can tell the server to return responses in a different format, in the same way as the Content-Type header. This means you can do weird stuff like send JSON data but demand the response is in XML. I have accidentally killed servers in the past with typos in my headers too! You can read more about different kinds of HTTP headers here.
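To make the Content-Type/Accept distinction concrete outside of Postman, here is a hedged sketch using Java’s built-in HTTP client (the URL and body are placeholders): it sends a JSON body but asks for an XML response.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class HeaderExample {

    public static void main(String[] args) throws Exception {
        // Content-Type describes the body we are sending; Accept asks the server
        // to respond in a different format.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://api.example.com/items"))
                .header("Content-Type", "application/json")
                .header("Accept", "application/xml")
                .POST(HttpRequest.BodyPublishers.ofString("{\"name\": \"Store\"}"))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}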

Learn to use the more advanced features of Postman

Postman has a lot of neat features which can be used to augment your testing in different ways. Learning to use collections can allow you to create documentation of an API you’re exploring which you can share with other testers and developers (particularly useful when a new person starts on a project, you can give them a collection to get them up and running much faster). If you’re finding yourself repeating the same requests a lot, especially to create data, using Postman’s collection runner can allow you to create automated scripts of requests that quickly generate test data for you.


You can further extend the capabilities of your collections to automation checks which can be rapidly run and tell you if the API you are testing is ready for deeper exploratory testing or if there is something significantly wrong. You can do this by using Postman’s test scripts function. While these tests are written in Javascript, it’s possible to write these scripts with little knowledge of Javascript using the example snippets. However, it may be helpful to learn a little bit about Javascript to get the most out of these scripts. You can learn Javascript via sites like this one, which is a free 30 day coding challenge.


Combined with Postman’s pre-request script functionality, you can then create more complex collections using functions such as loops and branching. In addition to these, you can also learn about environments and variables, which let you parameterise data that needs to change every time you run a request. The most common example of using environments is where you have multiple test environments with different domain names (e.g. www.live.test.com & www.stage.test.com) but you don’t want to keep re-writing the requests.


These features can allow you to chain requests together, so rather than manually copying an ID from one request to use in another, Postman can run the two requests together for you. This is a good blog post explaining how to do this.

Try integrating your Postman collections as part of a CI pipeline

One of the most popular topics in software development currently is DevOps and the related topics of CI (Continuous Integration) and CD (Continuous Delivery). Typically a team that works to these methodologies has a ‘deployment pipeline’ where they build their code and run unit and integration tests. If you work with a team like this, you can set up their deployment pipeline to run your Postman collections for you. This means that the collections that help you create test data or check the environment is ready for deeper exploratory testing can be run for you every time you create a new build of the codebase. The tool that allows you to do this is called Newman. Newman simply allows Postman collections to be run on the command line, which means any CI build tool can run it too, such as Jenkins, Bamboo, TeamCity or GoCD. Here is a blog about how to do that with Jenkins.

Have a look at any existing APIs you may work with already

This seems obvious, but ask around about any APIs you might already work with, there may be systems you didn’t realise existed that you could have a look at. Or there could be third party integrations that your team use within your application. There may be some existing documentation or even monitoring, it can be especially interesting to have a look at any monitoring you already have. Tools such as AppDynamics gather a lot of data such as API requests, their speed and responses. This can give you an insight into how people use those APIs and what problems may already be occurring.

Try out other tools

Postman isn’t the only way to interact with APIs, it’s a great tool and I especially like using it to teach with because it’s popular and has a nicer interface which isn’t as cluttered as some others. Your team may use a totally different tool or you may need to use another tool in future and they all have different strengths and weaknesses. So it may be useful to learn some other tools such as:
Another popular tool for interacting with APIs is Jmeter, however I highlight this separately as it’s actually a load testing tool. It can be used in a similar fashion to the Postman collections but is designed for running many, many concurrent requests and designing performance test runs. However, I have seen it successfully used as an automated functional checking tool and can be integrated into build pipelines too.

Write automated checks in a scripting or programming language

There will come a point where creating automated checks using Postman becomes very complicated or unwieldy. This is a point where it's typically easier to write it in a scripting or programming language. Why? Postman (and also Jmeter) are GUI-based and so they enforce a particular pattern and design to your tests in order to work. Sometimes what you are trying to do doesn’t neatly fit their structure or sometimes you want to integrate with more systems or perform functions they don’t provide.


Which programming language? It pretty much doesn’t matter, almost every popular programming language comes with libraries for making HTTP requests and frameworks for running tests. What you decide to learn should be guided by:
  • Your level of experience with programming.
  • Who is going to write and maintain these checks.
  • What languages people use around your workplace.
  • What languages are used for the application you are testing.
If you are working with a Java application and your team is happy to share the work and help out, it may make the most sense to write your automated checks in Java using frameworks like Junit and libraries like RestAssured. This means they can be easily incorporated into the rest of the integration tests your team already has and removes the need to find more tools to run it in your pipeline.
However, you may not be closely working with a team like this or have developers to support you. You may have decided that it will be best for you to maintain the tests. In this case it’s more important to choose a language you are comfortable learning. In this scenario I personally like teaching people about Python and the requests library because Python can be easier for newbies to learn programming. However, there are lots of other languages such as Ruby, Javascript, C# and more and none of them are bad to learn. They all have the capability to create these checks and much of what you learn will be transferrable to other languages.
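As a rough sketch of the Java option mentioned above (JUnit 5 plus REST Assured, with a placeholder base URL and field names), an automated check might look something like this:

import org.junit.jupiter.api.Test;

import static io.restassured.RestAssured.given;
import static org.hamcrest.Matchers.equalTo;

// Minimal sketch of an automated API check; the URL and fields are placeholders.
class ItemApiCheck {

    @Test
    void getItemReturnsExpectedName() {
        given()
            .baseUri("https://api.example.com")
            .accept("application/json")
        .when()
            .get("/items/1")
        .then()
            .statusCode(200)
            .body("name", equalTo("Store"));
    }
}

Run as part of the team’s existing build, a handful of checks like this can quickly tell you whether the API is ready for deeper exploratory testing.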

Learn how to work with mocks

Sometimes you may need to test an application that relies upon a third party API. Maybe it’s a website that hasn’t got a back-end finished yet. Perhaps you are working with lots of other development teams and some of the work has been finished early. In these situations it’s useful to be able to create pretend versions of these APIs so you can test as if they were there. These are referred to as ‘mocks’. You can have a play with websites such as this one to create a fake API that responds exactly the same as an application you want to fake. You can then point an application you are testing at it to begin testing against your contract.
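If you prefer to do this in code rather than via a website, libraries such as WireMock (one option among several - this is an illustrative sketch with placeholder endpoints and data) let you spin up a local fake API and stub its responses:

import com.github.tomakehurst.wiremock.WireMockServer;

import static com.github.tomakehurst.wiremock.client.WireMock.aResponse;
import static com.github.tomakehurst.wiremock.client.WireMock.get;
import static com.github.tomakehurst.wiremock.client.WireMock.urlEqualTo;

public class MockThirdPartyApi {

    public static void main(String[] args) {
        // Start a local mock server and stub one endpoint of the missing/third-party API.
        WireMockServer server = new WireMockServer(8089);
        server.start();
        server.stubFor(get(urlEqualTo("/customers/1"))
                .willReturn(aResponse()
                        .withStatus(200)
                        .withHeader("Content-Type", "application/json")
                        .withBody("{\"id\": 1, \"name\": \"Billy Boat\"}")));
        // The server keeps running until stopped, e.g. server.stop();
    }
}

Your application under test can then be pointed at http://localhost:8089 instead of the real third-party service.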

Learn about contract testing

Speaking of contracts and mocks, there are now tools that let you create these mocks in a more reliable and automated way. Tools such as Pact allow teams to run automated checks against each other’s services without having to understand how to run those services. While not strictly about APIs, learning how you could automatically check an API you provide for another team (or vice versa) can be a lot more useful than creating massive Postman collections or custom mocks that fall out of date as teams update the behaviour of their services.

This video gives a helpful explanation in a conversational style about Pact and contract testing (thanks Conny!):