State of AI applied to Quality Engineering 2021-22
Section 4.2: Automate & Scale

Chapter 4 by Capgemini

AI applied to Usability Testing

Business ●●○○○
Technical ●●●○○

Human effort continues to have a sizable impact on the User Experience and Usability fields. There are only a few tools available that can automate time-consuming human effort. While recent advances in AI have made it possible for machines to observe and interact with humans, we believe we are a long way from AI conducting end-to-end usability testing.

Has a door ever caused you to pause and consider whether you should pull or push to open it? You are not alone – most people have a story about it. Indeed, the phenomenon is so widespread that it has a name: "Norman doors," after design guru Don Norman, whose book The Design of Everyday Things examined it. Any door that is perplexing or difficult to use is referred to as a Norman door. In other words, it is a door with poor usability.

Usability is about how easily people can interact with a product or service, and the title of one of the most popular books on usability, Steve Krug's "Don't make me think," encapsulates the essence of usability. Make it simple for people to interact with a product so they don't have to pause and consider their next move.

So, how do we engineer usable products? User-centered design is a technique for creating usable products and services. It is a process that involves actual users throughout the design process. We begin by observing and conversing with real users to ascertain and comprehend their problems. We then use the insights gained during Discovery to define the problem to be solved and the users for whom we are designing, and we validate the design with them. As we develop solutions, we test them with users at an early stage to ensure they work for them. And once the product is released, we continue to listen and monitor for feedback.

As Nielsen Norman Group, a leader in the field of user experience, demonstrates, UX practitioners employ a variety of methods during the various stages of a user-centered design process.

Figure: Nielsen Norman Group (NN/g) methods throughout the user experience design lifecycle

 

 

The image above illustrates the methods that Nielsen Norman Group (NN/g) maps to each stage of the user experience design lifecycle. For the purposes of this report, we will concentrate on the Test and Listen stages, which are the traditional stages for testing and monitoring usability.

TEST

  • Qualitative usability testing (in-person or remote)
  • Benchmark testing
  • Accessibility evaluation

LISTEN

  • Survey
  • Analytics review
  • Search-log analysis
  • Usability-bug review

What Is Usability Testing?

Usability testing is fundamentally distinct from the other types of testing that typically occur during software development. Participants attempt to complete typical tasks while trained observers observe, listen, and take notes during a usability test. The objective is to identify any usability issues or design flaws, ascertain participant satisfaction, and use that feedback to refine the design.

Usability testing is distinct from User Acceptance Testing (UAT), which is concerned with ensuring that the solution meets the requirements. On the other hand, usability testing is used to ensure that the solution is simple and intuitive for actual end users. It can be carried out throughout UX Design, and we can conduct tests on prototypes or the final digital product.

Types of Usability Testing Methods

One way of classifying testing methods is as quantitative or qualitative:

  • Quantitative Analysis – involves looking at data, that is, actual numbers. This includes analyzing web traffic data and determining facts through the use of analytics tools such as Google Analytics. Findings from smaller datasets can be extrapolated to predict the behavior of larger groups if the rules are set appropriately. For instance, if 70% of 100 potential customers abruptly leave your website from the same page, it is reasonable to assume that the page has a problem, which would also occur with a larger number of visitors.

  • Qualitative Analysis – is used where numbers cannot capture the answer, because the matter is subjective. The softness of clothing, or an opinion about a new flavor, is not quantifiable; this is where opinions matter. While customer feedback or Voice of the Customer data can assist in determining opinion, it cannot be extrapolated to mean that the majority of customers will feel the same way. Some ways of gathering inputs include:
    • Focus group discussions
    • In-depth interviews
    • Seeking views on an online community
    • Trend monitoring
    • Survey questionnaires
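
To make the quantitative example above concrete, here is a minimal sketch of the arithmetic behind it. It computes the exit rate for the page (70 of 100 visitors) and, as one possible way of judging how far that figure can be extrapolated, a Wilson score interval. The interval is our illustrative choice rather than something an analytics tool necessarily reports, and it assumes the 100 visitors are a reasonably random sample of the wider audience. The function names are hypothetical.

```python
import math


def exit_rate(exits: int, visitors: int) -> float:
    """Share of visitors who left the site from a given page."""
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    return exits / visitors


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z=1.96 is roughly 95% confidence)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Illustrative numbers taken from the example in the text.
rate = exit_rate(70, 100)
low, high = wilson_interval(70, 100)
print(f"exit rate {rate:.0%}, plausible range {low:.0%} to {high:.0%}")
```

With only 100 visitors the plausible range is wide (roughly 60% to 78%), but even its lower end suggests the page deserves a closer look.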

Another way to consider usability testing options is to consider the manner in which they are conducted, which may be moderated or unmoderated, remote or in-person, or a combination of the two.

  • Moderated usability testing, as the name suggests, involves the active participation of a trained moderator or facilitator. The moderator leads the usability testing session, which includes administering tasks, observing and documenting participant behavior and comments, conducting contextual questions and probing as necessary, and clarifying points for participants in real time.

  • Unmoderated testing gives control to the participant to complete the task. While participants are encouraged to 'Think aloud' during sessions in order to capture their thought processes, there is no moderator present to guide or question them. There is no assistance available to the participant, and the researcher has no opportunity to ask detailed questions about specific actions. Given this, it is recommended that you use this method to collect input on very specific elements, for instance to evaluate a new widget optimized for mega dropdowns.

  • In-person tests are the most recommended because they place the participant and moderator in the same room and allow moderators to gauge body language and have lengthy discussions about workflow validation.

  • Remote tests are useful when the participant and moderator are not in the same place. In this case, the participant is in his or her own secure environment, and the moderator monitors the participant's interaction with the system/applications via screen sharing and video captures.

In practice, a combination of the aforementioned tests is used to gain the most insight.

Moderated & In-Person

  • Usability Lab (Face-to-face) – A controlled environment, generally a one-way mirror lab is used to conduct these tests. Participants are chosen based on their market segmentation and then invited to spend time completing tasks in a controlled environment.
  • Guerilla Testing (face-to-face) – Random testing with an uncontrolled sample of participants in a café, mall, or other public space to elicit feedback in natural settings has a number of advantages. There is no time constraint, no predefined user group, and no threat of observation. In these instances, feedback is frequently open, as no expectation has been established.

Moderated & Remote

  • Online tools – There are multiple tools available for conducting moderated and remote tests. These platforms allow screen interactions to be recorded and participants' expressions and voices to be captured via camera and microphone. Some commonly used platforms include Zoom, GoToMeeting, UXCam, UserTesting, lookback.io, Userlytics and others.
  • Card Sorting – Another great way of understanding user expectations is via Card sorting tools. Once the user groups are known and your site content outlined, card sorting tools help in gathering how users perceive the information architecture. Some tools that assist in tracking users and conducting remote card sorting tests include – OptimalSort, UserZoom, Justinmind, UXTweak and others.
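
As a rough illustration of how card sorting results can reveal the way users perceive an information architecture, the sketch below counts how often participants placed two cards in the same group. This is a generic co-occurrence count, not the method of any particular tool named above; the card names, the data layout, and the function name are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(sorts):
    """Count how many participants placed each pair of cards in the same group."""
    counts = Counter()
    for groups in sorts:          # one entry per participant
        for cards in groups:      # one entry per group that participant made
            for pair in combinations(sorted(cards), 2):
                counts[pair] += 1
    return counts


# Illustrative data: three participants sorting four cards.
sorts = [
    [["Returns", "Shipping"], ["Login", "Profile"]],
    [["Returns", "Shipping", "Profile"], ["Login"]],
    [["Returns", "Shipping"], ["Login", "Profile"]],
]

pairs = co_occurrence(sorts)
for (first, second), n in pairs.most_common(2):
    print(f"{first} + {second}: grouped together by {n} of {len(sorts)} participants")
```

Pairs that most participants group together are candidates for sharing a section in the navigation; pairs that rarely co-occur probably belong apart.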

Unmoderated & In-Person

  • Observation – A controlled environment with a one-way mirror is often the most trusted way to observe how participants interact with the human-computer Interface. Setting a camera to capture facial expressions or direct observation brings in great insights.
  • Eye-tracking – Involves measuring either where the eye is focused or the motion of the eye as an individual views a web page. The movement of the participant's pupils is tracked using eye-tracking software, which helps answer questions such as where participants are looking, for how long, in what order their eyes move, and which parts of the page they miss. It also provides insight into navigation and item placement on the page. Heat maps are generated to visualize the results.

Unmoderated & Remote

  • Session recordings – This technique is usually used to test specific parts of a product (specific interaction scenarios), rather than providing an overall review of the user journey. Participants record their own sessions using various tools, without the help of a moderator. Initial guidance is provided on what to do; after that, participants are left on their own to complete the tasks.
  • Online testing platforms.

While unmoderated remote testing allows us to cast a wider net with many more participants, it does not offer opportunities for contextual probing. Moderated in-person testing provides rich user insights that are often missed using other methods. It also allows us to focus on certain features and functionality with a few participants, as well as probe any additional concepts.

Common concerns in usability testing

The most common concerns that we have seen raised for usability testing include:

  • Time
    Usability testing is time-consuming. It takes time to plan, prepare, recruit, conduct, analyze and report out. Qualitative testing using moderated and in-person testing in a usability lab is the most time consuming, but guerilla methods speed things up. Unmoderated and remote methods can speed up recruiting and conducting the test, as well as part of the analysis.

  • Expense
    Budget availability often determines the mode and type of the usability tests conducted. To conduct moderated and in-person tests, a usability lab infrastructure, preferably with a one-way mirror, is required. In addition to this expense, participant fees apply. These include costs associated with screening, recruitment, and travel to controlled environment locations, as well as the significant cost of the incentives offered to encourage people to take part in the study. The costs of remote tests are significantly lower, but this is a trade-off: there is a risk associated with not monitoring body language, observing users, and listening to conversations.

  • Effort
    Each type of usability test has a fixed cost. This includes costs associated with planning, such as participant screening, recruitment, and incentivizing. The cost of analysis and reporting varies according to the type of usability test being conducted and the expected outcomes. This also includes the costs of the tool/platform, reporting, and the manual effort required to monitor the entire process.

    Qualitative usability testing provides enormous value and insights – most notably with moderated in-person testing – but it also requires a great deal of time and effort.

How does AI fit into usability testing and monitoring?

While we have all heard doomsday predictions that machines and artificial intelligence will eventually supplant humans, this is far from the case in the field of user experience design and usability testing. Design and usability testing have always relied heavily on the involvement and participation of humans, both as designers and as consumers. AI is, however, beginning to play a supporting role in usability testing, and we will discuss several instances in which we have seen and used AI in this capacity. It will be several years, though, before AI wins a best supporting actor award.
Artificial intelligence can assist in a few ways with the methods commonly used in the Test and Listen stages, which are the traditional stages for testing and monitoring usability. AI tools can support the efforts of designers and researchers by:

  • reducing manual human effort (e.g., transcribing),
  • making people more efficient and saving time (e.g., text-based video editing)
  • providing machine learning based insight (e.g., attention prediction)
  • augmenting human effort (e.g., AI evaluation)

I recall conducting my first usability test more than two decades ago. I traveled across the country with a rollaboard carry-on bag that served as my usability lab-in-a-box, complete with a MiniDV video camera (if you're unfamiliar with the term, let me just say it was the most compact video camera available at the time), tripod, tapes, batteries, and power cords.
While I took notes during tests, I frequently needed to review recordings in order to clarify something or obtain a quote; manual transcription was a pain and a time sink. While a highlight reel was an extremely effective way to summarize and demonstrate real user usage to non-involved executives and gain their buy-in, I was grateful to have team members who excelled at it. Even with their experience and speed, creating a short highlight reel from hours of video recordings took an eternity.
We've come a long way over the years; we now have new tools that enable us to be more efficient, streamline certain processes, and make it easier to report out, tag team members, and share findings with others. Here are the top areas where I find AI invaluable:

Transcribing

UX researchers who moderate or lead usability testing must actively listen to not only what users say, but also to what goes unsaid. This leaves little time for taking detailed notes, which is why it is common to review recordings later. I've lost count of how many hours I could have saved if we had AI transcription services available at the time.
In a matter of minutes, these AI-powered services can virtually type up transcripts of recordings, complete with timestamps. While I have yet to find one that is perfect, I appreciate that even with 70-80% accuracy, they do the majority of the work, and many tools highlight potentially incorrect words.
There are numerous transcription services, and almost all of them offer a free trial period. I currently use HappyScribe to upload recordings, or Otter.ai for live notes and captioning of meetings.
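
To show what "highlighting potentially incorrect words" can look like in practice, here is a minimal sketch. It assumes a transcription service returns each word with a start time and a confidence score; that data shape, the `Word` class, and the 0.8 threshold are assumptions made for illustration and do not reflect the actual output format of HappyScribe or Otter.ai.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float       # seconds from the start of the recording
    confidence: float  # 0.0 to 1.0, as reported by a hypothetical service


def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def mark_uncertain(words: list[Word], threshold: float = 0.8) -> str:
    """Return a timestamped line with low-confidence words wrapped in [?...]."""
    rendered = [
        w.text if w.confidence >= threshold else f"[?{w.text}]"
        for w in words
    ]
    stamp = format_timestamp(words[0].start) if words else "00:00"
    return f"{stamp} " + " ".join(rendered)


# Illustrative data: the service misheard "checkout" as "chick out".
words = [
    Word("I", 65.0, 0.98),
    Word("can't", 65.2, 0.95),
    Word("find", 65.5, 0.97),
    Word("the", 65.8, 0.99),
    Word("chick", 66.0, 0.55),
    Word("out", 66.2, 0.60),
    Word("button", 66.5, 0.96),
]
line = mark_uncertain(words)
print(line)
```

The reviewer then only needs to replay the few seconds around each flagged word rather than the whole recording.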

Video editing


Video recordings from a single round of usability testing can easily exceed an hour in length, and no one wants to binge watch them. What has the greatest impact are shorter clips highlighting various aspects of a task, such as a user struggling with something, or, even better, a montage or highlight reel of multiple users struggling with the same feature or functionality. Previously, this required converting and importing the video to a powerful Mac, and then searching for clips using the timestamps provided. There are tools available these days that enable you to create short video clips without having to review each frame of the video. You upload a video and these tools use artificial intelligence to transcribe the speech in the video in a matter of minutes. You can then scan the text for the sections you want to keep (or delete) and convert them to bite-size videos instantly. Video editing powered by artificial intelligence is the way to go! Additionally, you can convert the transcripts to subtitles if necessary.
Again, there are numerous tools available, but Piktostory (https://piktochart.com/piktostory) is one that I frequently use. Another popular tool is Pictory.
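
The idea behind text-based video editing can be sketched in a few lines: once speech has been transcribed into timestamped segments, selecting text is equivalent to selecting time ranges in the video. The example below is a simplified illustration under that assumption, not how Piktostory or Pictory are implemented; the `Segment` class, the keyword matching, and the one-second padding are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str


def clips_for_keywords(segments, keywords, padding=1.0):
    """Return merged (start, end) ranges for segments mentioning any keyword."""
    lowered = [k.lower() for k in keywords]
    hits = sorted(
        (max(0.0, s.start - padding), s.end + padding)
        for s in segments
        if any(k in s.text.lower() for k in lowered)
    )
    merged = []
    for start, end in hits:
        if merged and start <= merged[-1][1]:
            # Overlapping or touching ranges become a single clip.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Illustrative transcript of one participant.
segments = [
    Segment(0.0, 4.0, "Okay, I'm on the home page."),
    Segment(12.0, 15.0, "Where is the checkout button?"),
    Segment(15.5, 19.0, "I still can't find checkout."),
    Segment(40.0, 44.0, "That was confusing."),
]
clips = clips_for_keywords(segments, ["checkout", "confusing"])
print(clips)
```

The resulting time ranges could then be handed to any video tool to cut the actual clips for a highlight reel.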

Eye-tracking


Eye tracking studies are used to determine where users' attention is focused within a design and how they navigate through screens and interfaces. Eye tracking studies require specialized equipment and are time consuming, but they reveal eye movement patterns and heatmaps of where people concentrated their attention.
A simple example is the checkout flow, where designers want users to focus on critical information and complete the checkout process without being distracted.
Attention Insight is an artificial intelligence-powered platform that simulates human vision in order to predict how a design will be perceived by users. According to the company, their heatmaps are 90% accurate when compared to real-time eye-tracking heatmaps.
Here are two heatmaps. The first is from lingscars.com and clearly demonstrates how a "loud" background can distract people (see how many locations outside of the content draw the AI "eye" below):

Figure: Eye tracking

 

The second image is of an Amazon checkout review screen that catches the customer's attention in all the right places as they scan to verify their shipping address, payment information, product, shipping option, and, of course, the Prime Trial offer!

Figure: Checkout review screen

 

Even without any Prime-like offers, BestBuy.com's final review screen scored significantly higher than Amazon's in the areas where people are likely to look.

With some other screens, I realized that the tool did not always accurately reflect where people would look, and thus I would not rely on it entirely. However, I believe that early design validation is necessary to ensure that the areas we want people to focus on receive heat in the heatmap and that they are not distracted by elements they are not supposed to pay attention to.
There are other companies that offer tools similar to Attention Insight, including Visual Eyes and Expoze.
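
One simple way to run the early design validation described above is to check how much of the predicted attention falls inside the area we want people to focus on. The sketch below assumes a heatmap can be exported as a grid of intensity values; that export format, the region coordinates, and the function name are assumptions for illustration, not features of any of the tools mentioned.

```python
def attention_share(heatmap, region):
    """Fraction of total heat inside region = (top, left, bottom, right).

    bottom and right are exclusive, like Python slices.
    """
    top, left, bottom, right = region
    total = sum(sum(row) for row in heatmap)
    if total == 0:
        return 0.0
    inside = sum(sum(row[left:right]) for row in heatmap[top:bottom])
    return inside / total


# Illustrative 4x4 heatmap: higher numbers mean more predicted attention.
heatmap = [
    [0, 1, 1, 0],
    [0, 4, 6, 0],
    [0, 3, 5, 0],
    [2, 0, 0, 2],
]
order_summary = (1, 1, 3, 3)  # hypothetical area of interest on a checkout screen

share = attention_share(heatmap, order_summary)
print(f"{share:.0%} of predicted attention lands on the order summary")
```

Tracking this share across design iterations gives a quick signal of whether distracting elements are pulling attention away from the critical content.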

Usability Testing


UserTesting, UserZoom, and Loop11 are all platforms that enable UX researchers to conduct moderated and unmoderated remote usability tests. While conducting the tests requires the full involvement of UX researchers or designers, data collection and analysis are automated, and reporting is simplified. AI is used in several of the above-mentioned features, including transcription, video editing, and highlight reel generation, as well as the generation of heatmaps to visualize user attention or clicks.
WEVO is one service that combines human input and AI technology to deliver rapid, accurate insights at scale. It examines how users rate experiences on several dimensions, including Experience, Clarity, Appeal, Relevance, and Credibility, and compares them to industry standards. WEVO then uses algorithms to forecast the success of individual components of a design.

Figure: How WEVO works – courtesy WEVO Conversion

 

The image above demonstrates how businesses use WEVO. While this may appear to be outsourcing usability testing, what drew me in was the combination of AI and human analysis and insights, as demonstrated in the sample report screenshots below.

Figure: Driver Scores and Quotes – WEVO report

 

Figure: Audience Reactions (Likes) – Heatmap and Quotes – WEVO report

 

Figure: Audience Expectations (before and after) and related quotes – WEVO report

 

WEVO does not record video or audio from participants, but does collect text feedback as part of the usability test, which is then used to gauge sentiment and obtain direct quotes.
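
To give a feel for how text feedback can be turned into a sentiment signal, here is a deliberately naive sketch that counts positive and negative words. It is a simple word-list approach chosen for illustration, not WEVO's method (which is not described here), and the word lists and sample comments are invented.

```python
import re

# Hypothetical word lists; a real analysis would use a trained model or a fuller lexicon.
POSITIVE = {"easy", "clear", "love", "helpful", "fast"}
NEGATIVE = {"confusing", "slow", "hard", "frustrating", "cluttered"}


def sentiment_score(comment: str) -> int:
    """Positive word count minus negative word count."""
    words = re.findall(r"[a-z']+", comment.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def sentiment_label(comment: str) -> str:
    score = sentiment_score(comment)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


comments = [
    "The checkout was easy and clear.",
    "Pricing page is confusing and cluttered.",
    "I looked at the menu.",
]
labels = [sentiment_label(c) for c in comments]
for comment, label in zip(comments, labels):
    print(f"{label:>8}: {comment}")
```
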
Finally, researchers at Google’s AI research division recently conducted experiments to measure “perceived tappability” of mobile screens. Their AI-powered mobile app usability testing results closely matched human results.

Additionally, predictive models for generating user behaviors are an area worth exploring and utilizing.

Future possibilities

As previously stated, while AI can automate and assist with analysis and reporting, it cannot completely replace human involvement in usability testing. We anticipate that AI's influence will grow in the future.

If AI has written articles, how much longer until AI can generate usability test scripts based on a few inputs from UX researchers?
One such article, published by The Guardian, was written by GPT-3, OpenAI's language generator, which attempted to complete a prompt. GPT-3 generated eight different outputs, or essays, each with its own distinct and interesting argument. The Guardian selected the best passages from each and edited them like a human op-ed, removing unnecessary lines and paragraphs and rearranging them in some places. They claimed that editing took less time than editing many human op-eds.
We should see AI play a role in creating tasks and scripts for usability tests in the not-too-distant future. Instead of starting with a blank slate, skilled usability professionals will be required to provide the initial prompts and then act as editor to create the final script.

AI is already capable of recognizing certain emotions in photographs and videos of people. As AI advances, will it be able to automatically identify specific emotions (frustration or joy) in video recordings of usability tests and then automatically create highlight reels that demonstrate when participants were frustrated during a test or how different participants reacted to a feature or task?
And, in the distant future, what prevents AI from serving as a semi-moderated alternative to human-moderated and automated unmoderated usability testing? It may be an improvement over the current state of automated unmoderated testing. Would an AI moderator who sounds like Morgan Freeman annoy or entice participants?

About the author

Monica Deshpande

For over two decades, Monica has been simplifying complexity for its users. Monica is presently the head of the User Experience team in India's Digital Customer Experience. She is focused on business growth, solution development, and delivery, while also mentoring the next generation of designers. She has engaged in a variety of design discussion groups and given presentations at a variety of user experience events. She continues to study and experiment in new areas in order to decipher the modern age's intricacies.

Lyndon Cerejo

Lyndon Cerejo is a User Experience Design leader and author. He has over twenty years of hands-on experience helping companies design usable and engaging experiences for their customers, employees, and partners. He has provided digital design and design leadership for companies ranging from start-ups to Fortune 100 businesses across industries.
When not doing design, Lyndon enjoys speaking, teaching, writing, and has co-authored books on marketing and innovation. His pen name will reveal a children’s book series titled Mysteries In History. He enjoys LEGO and photography and incorporates both in his writing and speaking. Every single day for over a year, as an exercise in curiosity, creativity, and storytelling, he took a photo featuring his LEGO lookalike and its perspective, which included celebrities, tourist attractions around the world, and a few strange looks from onlookers.
Over his past two decades in human-centered design, Lyndon has observed designerly (or designer-like) habits and behaviors that make designers successful, and their results truly transformational. He writes about designerly habits that everyone, designers and non-designers alike, can adopt to transform their professional and personal lives. He can be found at cerejo.com and he offers designerly resources including a curated newsletter about designerly habits and behaviors at BeingDesignerly.com

About Capgemini

Capgemini is a global leader in partnering with companies to transform and manage their business by harnessing the power of technology. The Group is guided every day by its purpose of unleashing human energy through technology for an inclusive and sustainable future. It is a responsible and diverse organisation of 270,000 team members in nearly 50 countries. With its strong 50-year heritage and deep industry expertise, Capgemini is trusted by its clients to address the entire breadth of their business needs, from strategy and design to operations, fueled by the fast-evolving and innovative world of cloud, data, AI, connectivity, software, digital engineering and platforms. The Group reported in 2020 global revenues of €16 billion.

Get the Future You Want  I  www.capgemini.com

 

 

 
