Thursday, 12 September 2019

Testing in DevOps

I've just spent the past year embedded in a "devops" team (quotation marks explained later) and I've got a few different points to make, so bear with me, this is going to be a long post. It's also a bit of a brain dump, so it might not be my best writing ever, as I want to write this while it's relatively fresh in my head.
This post is also going to be a little technical and assume some knowledge of DevOps. If you're new to the phrase, I highly recommend Katrina Clokie's book "A Practical Guide to Testing in DevOps", found here - https://leanpub.com/testingindevops

That word "Devops"

In my experience, there are two different understandings of the word/phrase "devops". Basically it boils down to:
  • "Devops" is not a role, it's a set of practices, which makes it a bit woolly and vague but in general is about bringing two traditionally separate roles together so that a "team" can both deliver (for example) a software application and its hardware but also maintain, operate and support it in production/live/whatever you want to call it. This can be achieved by training the team in operations or by embedding ops engineers.
  • "Devops" is where developers write infrastructure-as-code, typically Ops engineers interested in programming. But sometimes also software programmers interested in getting their hands dirty with more Opsy work. In this definition, you tend to see this become a role called "Devops engineer" and they tend to write re-usable chunks of code that builds infrastructure for software teams to use. For example, creating a generic set of code that provides a MySQL cluster in AWS.

I highlight this because I've realised people aren't aware that there is this difference, and personally I prefer to encourage the former rather than the latter. The latter is a bit like SDETs/Developers in Test, where you're creating a whole new role that sits at a distance from the end goal and adds another layer of communication, writing tests that are broken by the dev team because they have no idea about them. I like the first definition because it's about teamwork and delivery, encouraging the team to take a more holistic view of software delivery, or rather product delivery as a whole. After all, who cares if your code is shit hot if we've put it on hardware too small to run it?

Cloud and infrastructure-as-code

Regardless of those definitions, if you're going to work with cloud-based infrastructure (as opposed to on-premise, where your company owns or rents the physical servers), you're going to be writing infrastructure-as-code. Why? Because in the cloud you are sharing your hardware with other companies and people, which means you have less control over how it operates and this changes the risk profile. The cloud is cheaper than owning the physical servers but this comes at the cost of reliability.
Therefore you may want to automate many aspects of operating your product, such as recoverability and scalability which is where infrastructure-as-code comes in.
There are two main areas that can be expressed in code:

  • Code for provisioning the infrastructure you need e.g. the hardware, the network routing, firewall rules and so on.
  • Code for configuring an individual server, e.g. setting up users and file permissions, installing software, configuring software (such as setting up Java and then setting up a Java based application to run).
Why consider these as two separate areas? In my opinion there is so much to think about in both that it's worthwhile giving each its own discussion, even though you will want to develop and test them together.

Testing in DevOps

So what to test? How do you test? What tools do you use? Is there anything to really test?
Hell yes there’s loads to test. You’ve now got code that builds the foundations of your product, and not just that, but it’s code that defines how your product will scale and recover and also determines its reliability. Suddenly a developer can potentially make a small change and open all of your servers to public access.
These are some ideas of what you can test in an Ops world:

  • You can manually run and try out the code as an obvious starting point. Does it build the servers correctly? Can you use the application after destroying it and building it again?
  • Destroying the servers leads to OAT (Operational Acceptance Testing) or general operability: testing what happens in disaster and failure scenarios. Will your product recover if the servers suddenly disappear and new ones are built? Do you test your backups regularly? This also neatly leads to ideas such as chaos engineering.
  • The code itself can be sort-of unit tested. For example, Ansible has a framework called Molecule which allows you to run your Ansible scripts against a Docker container and assert what state the scripts will leave a server in. There are also broader integration test tools such as Test Kitchen which have slightly more capabilities.
  • Using tools such as AWS Trusted Advisor or Well-Architected to analyse your infrastructure for common mistakes (such as setting up firewall rules completely open to the public) or under-utilised hardware that could be run more cheaply.
  • Given cloud infrastructure is inherently prone to failure, can you monitor and alert on those failures? Do you know if your servers fell over overnight? How many errors are happening in your environments? Usually cloud providers don’t have access to your servers to know what is going on, so you need to set up your own access to software logs (e.g. Java app errors). Have you centralised these logs for easy access?
  • Tools like Sensu allow you to write custom automated checks to monitor your servers. This is very useful for more granular checks like specific software health checks (e.g. your server never failed but the software application has crashed, can you tell from your monitoring?). I think there is a lot of value here for testers to help design, write and create new simple but smart checks and improve how observable systems are, not just in production but in all environments!
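As a rough illustration of the kind of simple check described in the last point, here is a minimal sketch in Python. Sensu checks report their result through the Nagios-style exit codes (0 for OK, 1 for warning, 2 for critical) plus a line of output. Everything else here is an assumption for illustration: the function names, the default URL, and the idea that the health endpoint returns a body containing "UP" when the application is healthy.

```python
import sys
import urllib.error
import urllib.request

# Nagios-style exit codes, which Sensu checks use to report their result.
OK, WARNING, CRITICAL = 0, 1, 2


def evaluate_health(status_code, body):
    """Map an HTTP health response to an (exit code, message) pair."""
    if status_code != 200:
        return CRITICAL, f"CRITICAL: health endpoint returned HTTP {status_code}"
    # Assumption: a healthy app includes "UP" somewhere in its response body.
    if "UP" not in body:
        return WARNING, "WARNING: endpoint responded but did not report UP"
    return OK, "OK: application reports healthy"


def check(url, timeout=5):
    """Call the (hypothetical) health endpoint and evaluate the response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", "replace")
            return evaluate_health(response.status, body)
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 503.
        return evaluate_health(error.code, "")
    except OSError as error:
        # Covers connection refused, DNS failures and timeouts.
        return CRITICAL, f"CRITICAL: could not reach {url}: {error}"


def main(argv):
    """Print the check message and return the exit code for the monitoring agent."""
    url = argv[1] if len(argv) > 1 else "http://localhost:8080/health"
    code, message = check(url)
    print(message)
    return code
```

To run it as a check you would finish the script with `sys.exit(main(sys.argv))`. Keeping the decision logic in `evaluate_health`, separate from the network call, means the check itself can be unit tested without a running server.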


Some other things I’ve worked on that were very context specific, and not very DevOpsy or Testing, but could be useful ideas:

  • Creating Jenkins pipelines to test out rebuilds of infrastructure-as-code on a daily basis - this was to cover a specific risk we had that our code was very tightly coupled and we were breaking a lot of projects with our changes. This was an interim solution until we un-coupled our architecture but it was useful to be able to do it.
  • Off the back of Jenkins pipelines for rebuilding infrastructure-as-code, I also created jobs that would manage downtime periods to help save a good two-thirds of our monthly costs. In general we only needed our test environments to be up 8 hours a day, 5 days a week, not 24-7. I used Jenkins so I could manage dependencies and run custom checks but this could also be achieved with the right auto-scaling policies in AWS too.
  • Fixing and writing my own Sensu checks. This was useful, although I would be careful as it’s easy to write a check that produces false positives or negatives. It’s very hard to think of all the scenarios the check could encounter, so avoid writing scenario-based checks where possible. It’s not helpful to have monitoring checks that have bugs themselves and are difficult to debug when they fail, so keep them simple.
  • Hooking together Dev team Selenium tests. This was because my context was an Ops team changing infrastructure for Dev teams, and I wanted a way to test our changes before we potentially broke the Dev teams’ dev environments. This isn’t recommended if you can avoid it, as obviously it’s not very DevOps. But in general, finding a way for infrastructure-as-code to eventually be end-to-end tested in an automated way is useful, because it’s hard to really test things like firewall and network routing configuration or file permissions until you actually try to perform certain actions from the server. The hard part is knowing when to run the tests: you need to know when your infrastructure-as-code is finished and when the servers and various components have actually completed their setup and the app is running. I achieved this with a Jenkins pipeline which polled a Sensu check that might look at a health endpoint from the app. When this went green I knew to proceed with the tests, and it would time out if it took longer than usual.
  • Writing simple scripts for monitoring or analysing our AWS account. In our context we needed to tag the hardware we were using for our own internal billing purposes so we could appropriately budget for certain projects. As this relied on humans remembering to include the tagging in their infrastructure-as-code, it was useful to regularly audit the account for servers that were missing tags and therefore wouldn’t be billed appropriately. This also made it easier to investigate under-utilised servers and talk to the owners about saving costs.
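The tag audit in the last point could look something like the sketch below. It is not the original script. It assumes boto3 is installed and configured, and the tag names used in the usage note (`Project`, `CostCentre`) are hypothetical. The audit logic works on the shape of the EC2 `describe_instances` response, so it can be tried out against a hand-written dictionary without touching AWS.

```python
def find_untagged_instances(describe_instances_response, required_tags):
    """Return {instance_id: [missing tag keys]} for instances missing any required tag."""
    missing = {}
    for reservation in describe_instances_response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            # Instances with no tags at all have no "Tags" key in the response.
            present = {tag["Key"] for tag in instance.get("Tags", [])}
            absent = sorted(set(required_tags) - present)
            if absent:
                missing[instance["InstanceId"]] = absent
    return missing


def audit_account(required_tags):
    """Audit every EC2 instance in the configured account and region."""
    # Imported here so the function above can be used without boto3 installed.
    import boto3

    ec2 = boto3.client("ec2")
    results = {}
    # The paginator walks through all pages of results for larger accounts.
    for page in ec2.get_paginator("describe_instances").paginate():
        results.update(find_untagged_instances(page, required_tags))
    return results
```

Calling `audit_account(["Project", "CostCentre"])` on a schedule (from Jenkins, for example) would give a list of instance IDs to chase up with their owners.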

Monday, 26 August 2019

Using Bowtie Diagrams to describe test strategy

Introduction

A long time ago, through a blog post, I was introduced to the Bowtie Diagram. I love how this visualises how we manage risks and I feel this complements test strategy. Why? Well, surely our test strategies should be accounting for risk and how we manage it. Whether you define testing in a specific focused sense (like functionally testing code) or in a holistic or broader sense (like viewing code reviews as testing, or monitoring, or simply asking the question “What do end users want?”) - these activities are ways of either preventing or mitigating risks. I feel if you want to be effective in helping improve the quality of a project or product, you need to assess the potential risks and how you are going to prevent or mitigate them, and therefore also assess where your time is best placed.
What are Bowtie Diagrams?
The short version - it’s a diagram that looks like a bowtie, where you describe a hazard and the top event (the likely harmful event), then list the threats that will trigger the event and the consequences of the event happening. For each threat and consequence, you describe a prevention or mitigation.



The long version (a better and more comprehensive description) can be read here - https://www.cgerisk.com/knowledgebase/The_bowtie_method

What I love about these diagrams is that they visually describe and explain risks and how we intend to manage them. Creating them is a useful exercise in exploring risks we may not normally have thought of, but in particular, I find we don’t explore the right-hand side (consequences) of these diagrams very often. I find most of the time in software development that we are very reactionary to consequences, and even then, we don’t typically spend much time on improving mitigation.
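If it helps to think about the structure in code, the parts of a bowtie described above can be sketched as a small data structure. This is only an illustration. The class names, field names and the `unguarded` helper are made up for this sketch and are not part of the bowtie method itself.

```python
from dataclasses import dataclass, field


@dataclass
class Line:
    """A threat (left side) or a consequence (right side) and its barriers."""
    description: str
    barriers: list = field(default_factory=list)


@dataclass
class Bowtie:
    hazard: str
    top_event: str
    threats: list = field(default_factory=list)       # each prevented by barriers
    consequences: list = field(default_factory=list)  # each mitigated by barriers

    def unguarded(self):
        """List the threats and consequences that have no barrier at all."""
        return {
            "threats": [t.description for t in self.threats if not t.barriers],
            "consequences": [c.description for c in self.consequences if not c.barriers],
        }
```

For example, with made-up content:

```python
diagram = Bowtie(
    hazard="Running software in production",
    top_event="Incident in production",
    threats=[Line("Code change introduces a bug", ["Automated checks", "Code review"])],
    consequences=[Line("Users cannot log in")],
)
print(diagram.unguarded())
```

Listing the entries with no barriers is a quick way of spotting the unexplored right-hand side.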


Managing the threats
I’ve started using these diagrams to explain my recent approaches to test strategy because they neatly highlight why I’m not focusing all of my attention on automated test scripts or large manual regression test plans. I view automated test scripts as a barrier to the threat of code changes. Perhaps the majority of people out there view this as the biggest threat to quality, perhaps many people understand the word “bug” to mean something the computer has done wrong. These automated test scripts or regression test plans may well catch most of these. But are these the only threats?

I see other threats to quality and I feel we have a role to play in “testing” for these and helping prevent or mitigate them.

Managing the consequences
Any good tester knows you cannot think of or test for everything. There are holes in our test coverage, we constantly make decisions to focus our testing to make the most of precious time. We knowingly or unknowingly choose to not test some scenarios and therefore make micro judgements on risk all of the time. There are also limits to our knowledge and limits to the environments in which we test, sometimes we don’t have control over all of the possible variables. So what happens when an incident happens and we start facing the consequences? Have we thought about how we prevent, mitigate or recover from those consequences? Do we have visibility of these events occurring or signs they may be about to occur?
I find this particular area is rarely talked about in such terms. Perhaps there is some notion of monitoring and alerting. Usually there are some disaster recovery plans. But are testers actively involved? Are we actively improving and re-assessing these? I typically find most projects do not consider this as part of their strategy, and in most cases it seems to be an afterthought. I think most of this stems from these areas typically being the responsibility of Ops Engineers, SysAdmins, DBAs and the like, whereas as testers we have typically focused on software and application teams. As the concept of DevOps becomes ever more popular, we can now start to get more involved in the operational aspects of our products, which I think can relieve a lot of the pressure on us to prevent problems from occurring at all.


Mapping the diagram to testing
An example of using a diagram like this within software development and testing:







I feel our preventative measures to an incident occurring are typically pretty good from a strategic view, especially lately where it has become more and more accepted that embedding testers within development teams improves their effectiveness. Yes, maybe we aren’t writing unit tests or involving testers in the right ways sometimes still. But overall, even with such issues, efforts are being made to improve how we deliver software to production.

But on the right-hand side, we generally suffer in organisations where DevOps has not been adopted. And when I say DevOps, I don’t mean devs who write infrastructure-as-code, I mean teams who are responsible for and capable of both delivering software solutions and operating and maintaining them. Usually, we still see the Ops side of things separated into its own silo, with very little awareness of or involvement in their activities from the software development team. But Ops plays a key role in the above diagram because they tend to be responsible for implementing and improving the barriers or mitigations that help reduce the impact of an incident.

I feel the diagram neatly brings this element into focus and helps contribute to the wider DevOps movement towards a holistic view of software development, towards including aspects such as maintenance, operability, network security, architecture performance and resilience as qualities of the product too.

As testers I feel we can help advocate for this by:

  • Asking questions such as “if this goes wrong in production, how will we know?”
  • Requesting access to production monitoring and regularly checking for and reporting bugs in production.
  • Encouraging teams to use a TV monitor to show the current usage of production and graphs of performance and errors.
  • If you have programming/technical skills, helping the team add new monitoring checks, not dissimilar to automation checks. (e.g. Sensu checks)
  • Becoming involved with and performing OAT (Operational Acceptance Testing) where you test what happens to the product in both expected downtime (such as deploying new versions) and disaster scenarios, including testing the guides and checklists for recovery.
  • Advocating for Chaos Engineering.

Friday, 17 May 2019

Using character recognition for browser automation

Introduction

I was venting recently to my colleague Chris Johnson about the frustration of working with yet another horrendous website that didn’t have any usable locators and he put to me the idea of “what if we had visual driven automation, where the automation treats the website as a complete black box, like a human user?”. It got me thinking, surely this could be possible already in some respect. As I am already very familiar with AWS (Amazon Web Services), I had a look if they had any services that could be useful for this and found their brand new Textract service, which lets you send image documents (.jpg, .png or .pdf) and analyses them for text. This then gave me the idea to try and use this service with Selenium to find elements on a page and click them without any need to interact or understand the HTML. This post is a report on my findings.

What is Textract and how does it work?

Textract is a type of OCR (Optical Character Recognition) service that detects text and data in image documents. You supply it with an image file and it responds with the results of its analysis: a list of words, sentences and objects (like forms and tables) that it has identified. Each word or sentence has a set of data including its location in terms of a rectangular box and a percentage confidence in the accuracy of its findings. Behind the scenes it manages this via a pre-trained machine learning algorithm.





 

As with other AWS services, you can access Textract via the AWS CLI or one of the AWS SDKs. For my research I used boto3, the AWS SDK for Python, which lets you call the same AWS APIs from Python 3 very easily and picks up the credentials and region you have configured for the AWS CLI.

When you are using Textract, you receive JSON responses that look like this:

{
    "Blocks": [
        {
            "Geometry": {
                "BoundingBox": {"Width": 1.0, "Top": 0.0, "Left": 0.0, "Height": 1.0},
                "Polygon": [
                    {"Y": 0.0, "X": 0.0},
                    {"Y": 0.0, "X": 1.0},
                    {"Y": 1.0, "X": 1.0},
                    {"Y": 1.0, "X": 0.0}
                ]
            },
            "Text": "Store",
            "Confidence": 99.1231233333,
            "BlockType": "WORD"
        }
    ]
}
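To give an idea of how a response like this can be used, here is a minimal sketch of picking out the bounding boxes for a given word. The function name and the default confidence threshold are assumptions for illustration, and this is not taken from the script linked at the end of the post.

```python
def find_words(response, text, min_confidence=90.0):
    """Return the bounding boxes of WORD blocks whose text matches (case-insensitive)."""
    matches = []
    for block in response.get("Blocks", []):
        if block.get("BlockType") != "WORD":
            continue
        if block.get("Text", "").lower() != text.lower():
            continue
        # float() keeps this working even if the confidence arrives as a string.
        if float(block.get("Confidence", 0)) < min_confidence:
            continue
        matches.append(block["Geometry"]["BoundingBox"])
    return matches
```

The values in each bounding box are ratios of the overall image size rather than pixels, so they need multiplying by the image width and height before they can be used as screen coordinates.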

My test

So the idea for my test was to create a simple script that would:
  1.     Use Selenium to load a website.
  2.     Take a screenshot.
  3.     Send the screenshot to Textract.
  4.     Find the location of some text on the website that would take us to a new page.
  5.     Click the location on the website.
  6.     Check that we had gone to the correct new page.
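The steps above can be sketched roughly as follows. This is an illustrative outline rather than the script linked at the end of the post. It assumes boto3, Selenium and Chromedriver are installed and configured, that the screenshot covers exactly the visible browser viewport, and that the first matching word is the one to click. The function names are made up for this sketch.

```python
def box_centre_in_pixels(box, viewport_width, viewport_height):
    """Convert a Textract BoundingBox (ratios of the image size) to a pixel centre point."""
    x = (box["Left"] + box["Width"] / 2) * viewport_width
    y = (box["Top"] + box["Height"] / 2) * viewport_height
    return round(x), round(y)


def click_text(url, text):
    """Load a page, find some text via Textract, click it and return the new URL."""
    # Third-party imports live here so the helper above works without them installed.
    import boto3
    from selenium import webdriver
    from selenium.webdriver.common.action_chains import ActionChains

    driver = webdriver.Chrome()
    try:
        driver.get(url)                                  # 1. load the website
        screenshot = driver.get_screenshot_as_png()      # 2. take a screenshot
        response = boto3.client("textract").detect_document_text(
            Document={"Bytes": screenshot}               # 3. send it to Textract
        )
        boxes = [                                        # 4. find the text
            block["Geometry"]["BoundingBox"]
            for block in response["Blocks"]
            if block["BlockType"] == "WORD" and block.get("Text") == text
        ]
        if not boxes:
            raise LookupError(f"Textract did not find {text!r} in the screenshot")
        # Assumption: the screenshot maps 1:1 onto the visible viewport.
        width, height = driver.execute_script(
            "return [window.innerWidth, window.innerHeight];"
        )
        x, y = box_centre_in_pixels(boxes[0], width, height)
        # 5. click the location (the pointer starts at the viewport's top-left corner)
        ActionChains(driver).move_by_offset(x, y).click().perform()
        return driver.current_url                        # 6. caller checks the new page
    finally:
        driver.quit()
```

The caller can then assert on the returned URL (or on the page title) to check that the click went to the expected page.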

Pre-requisites

In order to get this test working, this is what I needed:
  • A computer with Python3 installed and the relevant libraries (boto3, Selenium).
  • Chromedriver and Chrome installed.
  • An AWS account with access to Textract. At the time of writing Textract is in preview and you need to ask Amazon nicely for access. It took about a week for me to get access.
  • An AWS IAM (Identity and Access Management) user on my AWS account that has the permissions to interact with Textract.
  • AWS CLI installed on my machine and configured with the IAM user and an AWS region that Textract is available in (it’s currently only available in a few regions).
I chose to use Python because I’m very familiar with it, and it allows me to write a simple script like this very rapidly with little setup needed. You can use any programming language; you just need to be able to call AWS from it, whether through the AWS CLI or one of the AWS SDKs.

My findings

Did it work? Yes, effectively. I was able to create a script that did all of the above and successfully navigated between pages...but with a very big caveat - only if the text appeared in the top left corner of the page at any resolution. Why? When you request a screenshot from web browsers, they do not take a screenshot of what you can see, but instead they take a full page screenshot. This means you cannot easily map the pixel locations from the screenshots to the browser window, particularly when the real browser window is at a smaller resolution and the website dynamically changes the visible location of elements. Even at larger resolutions there is a small difference in size because you almost never display the full page in a window.

This means that my idea has limited usefulness until I could find a better way of generating screenshots that are just the visible area rendered by the browser.

However, in principle my idea did work if you can provide an image that maps 1:1 with the browser window, but there are still other issues. I tried testing against different websites just to see what other issues would crop up:

  • Naturally you need relatively clear text in order for Textract to confidently find it. However, it could be argued that unclear text would be an accessibility issue anyway, and in the below screenshot you can see Textract does a reasonable job of reading even unusual fonts or text that isn’t perfectly straight.
  • Obviously if you want to find elements that are not text, Textract isn’t useful at all. Potentially other pattern recognition services could be useful though if you could provide it an image pattern to find rather than just a text string.
  • It could be tricky to figure out which is the right element if there is more than one example of some text on the page.
  • While Textract is quite fast (roughly 2 seconds or so for it to return a response), some dynamic elements of the page could change in that time. Some websites have scrolling elements or pop-ups that only appear after a short time. This means you might not be able to run a totally black box test, as you need to check for these elements before proceeding.
Due to these issues, I don’t think you could totally remove the need for HTML locators yet. But there is certainly some potential here to expand the abilities of automation tests and maybe make parts of them more powerful. Even if you don’t use this tech to replace locators for Selenium, there are other applications that might not have been easy before - such as testing PDF files generated by the systems you test.

Other considerations

If you were to consider using this tech in your automation, there are some other factors to consider:
  • Costs, AWS charge for the use of Textract. Depending on your context and how you use it, this may be a limiting factor. The pricing can be found here: https://aws.amazon.com/textract/pricing/
  • Data privacy, AWS state they may store and use the documents you upload to maintain and improve the machine learning algorithm behind Textract. You may work in a workplace where the screenshots contain sensitive information that AWS employees are not authorised to look at. You can request AWS delete documents you upload, but this is a manual process of contacting their support team.
  • While I did find Textract responded quite quickly (around 2 seconds), I don’t know its capabilities under significant load nor did I hit the API request limits. I was only running very low numbers of requests.
  • Textract is currently only available in certain AWS regions, so if you require data to be held in particular geographic locations then this may be a limiting factor.
Of course, other services similar to AWS Textract may be available which overcome these factors.

Summary

This was a fun little exploration into technology I’ve not used before and it was nice to find it so easy to use. I think there is some potential in using this service for automation tooling, particularly if you need to analyse image documents.

If you’re looking to totally remove the need for HTML locators, I think there is more work required though. If you know of a good solution to the browser screenshot resolution problem, let me know!

My code

Here’s my code if you’d like to try it yourself:

https://github.com/Ardius/python-selenium-textract/blob/master/textract_test.py