Heuristic evaluation and usability testing are two of the most established methods to evaluate software for usability. When in doubt, one is the winner.
Assessing the ease of use of software is critical because the results tell us which aspects of the product work well and which don’t. The former we want to maintain and strengthen; the latter we need to revise and improve.
Usability Testing: Different Shapes and Flavors
The premise of this method is that the usability of a user interface can be judged by observing real target users interacting with it. The test participants are asked to carry out tasks they are familiar with, using a product that they may or may not know. Product areas that cause problems or confusion are then flagged for mitigation through re-design. Areas that work well and that test users comment on positively are recorded as well.
To enrich the insights you gain from usability test sessions, test participants are instructed to “think aloud,” that is, to share their thoughts and sentiments about how they experience their interaction with the product. Usability testing is primarily a qualitative method, but numeric performance measures are typically collected as well. In addition, questionnaires can be used to gather demographic information and to assess the perceived usability of the product being tested.
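The article does not name specific performance measures. As a rough sketch, assuming two commonly used ones (task completion rate and time on task), a session log could be summarized as below. The `TaskAttempt` record, the `summarize` helper and the sample data are hypothetical and only serve as illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class TaskAttempt:
    """One participant's attempt at one test task (hypothetical record)."""
    participant: str
    task: str
    completed: bool
    seconds: float  # time on task, measured by the moderator or the tool


def summarize(attempts: list[TaskAttempt], task: str) -> dict[str, Optional[float]]:
    """Completion rate and mean time on task for the completed attempts."""
    rows = [a for a in attempts if a.task == task]
    if not rows:
        raise ValueError(f"no attempts recorded for task {task!r}")
    done = [a for a in rows if a.completed]
    return {
        "completion_rate": len(done) / len(rows),
        # Time is averaged over successful attempts only (an assumption).
        "mean_seconds_completed": mean(a.seconds for a in done) if done else None,
    }


# Made-up sample data for one task and four participants.
attempts = [
    TaskAttempt("P1", "reset password", True, 40.0),
    TaskAttempt("P2", "reset password", True, 50.0),
    TaskAttempt("P3", "reset password", False, 120.0),
    TaskAttempt("P4", "reset password", True, 60.0),
]
summary = summarize(attempts, "reset password")
print(summary)
```

Numbers like these complement the think-aloud observations; they do not replace them, since they say little about why a task failed.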
Usability tests come in different shapes and flavors. They may be carried out in a central location like a usability lab, or in a decentralized way at the places where the test participants are located. The third and nowadays most used option is to test remotely through web meetings. In that case the screen and the webcam video of the participants are streamed and recorded.
Moderated vs. Unmoderated Usability Testing
Another variation is moderated vs. unmoderated usability testing. The best insights can be gained from live testing, where a moderator interacts with test participants, asks them about their opinions on specific situations and can react to things that happen during the test; for example, when test users have clarifying questions.
Unmoderated testing provides a high level of flexibility, since test participants can carry out the test at their leisure without the researcher being present, but it does not allow interaction between participant and researcher. This can be a risk when there are technical issues with the product to be tested or with the platform used to run and record the test, or when test users have trouble understanding the test tasks. In all these instances the lack of a moderator may result in no data, incomplete data or skewed data.
What Fidelity Level of a Product Can You Test?
The spectrum ranges from paper-based prototypes all the way to full-fledged products. Of course, testing a paper prototype will not yield insights regarding animations or load times. On the other hand, testing a ready-to-launch or already launched product comes late in the process, considering that the cost of change grows exponentially during the development stage.
How many users should you test with? The answer is not “the more, the better.” The chart below, which is based on empirical research, shows that testing with four users already uncovers more than 75% of the discoverable usability problems. It is a curve of diminishing returns: the more users you test, the more often you rediscover the same issues without learning of many new ones.
Note that if you have distinct user groups for your product, you should test several representatives of each group.
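A curve of diminishing returns like the one described above is often approximated with a simple model: if each test user independently uncovers a share p of the discoverable problems, then n users are expected to uncover 1 - (1 - p)^n of them. The sketch below assumes that model and a value of p = 0.31, a commonly cited average that is not stated in this article; the right value for a given product may differ considerably.

```python
def share_of_problems_found(n_users: int, p: float = 0.31) -> float:
    """Expected share of discoverable problems found with n_users test users.

    Assumes each user independently finds a share p of the problems;
    p = 0.31 is an assumed average, not a value taken from this article.
    """
    if n_users < 0:
        raise ValueError("n_users must be non-negative")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return 1.0 - (1.0 - p) ** n_users


# Print the curve for 1 to 10 test users.
for n in range(1, 11):
    print(f"{n:2d} users: {share_of_problems_found(n):.0%}")
```

Under these assumptions, four users come out at roughly 77%, in line with the figure of more than 75% mentioned above. For a product with distinct user groups, the estimate would apply to each group separately.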
Heuristic evaluation is a method in which a user interface is evaluated for its adherence to certain sets of design principles. These are not highly precise and detailed, but rather rules of thumb: so-called heuristics. The idea is to identify which areas of an interactive product comply with the heuristics and, more importantly, which areas do not. Those violations can then be addressed and mitigated through re-design. You can think of these guidelines as cue cards that you use when evaluating a user interface: they remind you to evaluate certain qualities that constitute usability and to some degree UX (user experience).
There are several sets of heuristics, including Nielsen’s, Shneiderman’s and those in ISO 9241.
Nielsen’s and Shneiderman’s heuristics have not changed since they were introduced in the 1990s. ISO (International Organization for Standardization) updated its list of heuristics in 2020.
The three sets of heuristics are very similar, so it does not matter much which set you use. The best known and most widely used set is Nielsen’s 10 Heuristics. For the Chrome web browser there is an extension that supports running a heuristic evaluation with his list of principles.
Heuristic evaluations can be carried out by one or several people. While it is obvious that two people find more usability issues than one, the relationship between the number of evaluators and the number of usability issues found is again a curve of diminishing returns, very similar to the curve above for usability testing.
According to the chart above, five evaluators find about 75% of usability problems.
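The figure of about 75% for five evaluators can be turned around to ask what it implies per evaluator. The sketch below assumes the same kind of simple model often used for such curves, in which each evaluator independently finds a share p of the problems, so that n evaluators find 1 - (1 - p)^n. Both the model and the function names are assumptions made for illustration, not part of the original research summary.

```python
import math


def implied_detection_rate(share_found: float, n_evaluators: int) -> float:
    """Per-evaluator rate p that yields share_found with n_evaluators.

    Inverts the assumed model share_found = 1 - (1 - p) ** n_evaluators.
    """
    if not 0.0 <= share_found < 1.0:
        raise ValueError("share_found must be in [0, 1)")
    if n_evaluators < 1:
        raise ValueError("n_evaluators must be at least 1")
    return 1.0 - (1.0 - share_found) ** (1.0 / n_evaluators)


def evaluators_needed(target_share: float, p: float) -> int:
    """Smallest number of evaluators expected to reach target_share."""
    if not 0.0 <= target_share < 1.0:
        raise ValueError("target_share must be in [0, 1)")
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target_share) / math.log(1.0 - p))


p = implied_detection_rate(0.75, 5)
print(f"implied per-evaluator detection rate: {p:.1%}")
print(f"evaluators needed for 90%: {evaluators_needed(0.90, p)}")
```

Under this model a single evaluator finds about 24% of the problems, and reaching 90% would take nine evaluators, which illustrates why adding more and more evaluators quickly stops paying off.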
The process that the evaluators follow consists of the following steps:
- Depending on the nature of the user interface, a briefing session may be needed in which the product is introduced and explained to the evaluators.
- Each evaluator separately checks the user interface against the heuristics. Oftentimes the evaluators first take one pass to get a feel for the product and then, during a second pass, focus on specific product screens and features.
- The evaluators then share their individual findings and collaborate to aggregate them.
- In the last step they prioritize each finding based on severity. This is helpful because, realistically, time or resource constraints mean that not every usability problem can be taken care of. In those situations only the most severe issues can be mitigated.
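The aggregation and prioritization steps above can be sketched in a few lines. The example assumes a severity scale from 0 (not a problem) to 4 (must be fixed), treats findings with the same location and heuristic as one issue, and averages the evaluators’ ratings. The `Finding` record, the `prioritize` helper and the sample findings are hypothetical; teams may merge and rate findings differently.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Finding:
    """One evaluator's individual finding (hypothetical record)."""
    evaluator: str
    location: str   # screen or feature where the issue was seen
    heuristic: str  # the heuristic that is violated
    severity: int   # assumed scale: 0 (not a problem) to 4 (must be fixed)


def prioritize(findings: list[Finding]) -> list[dict]:
    """Merge duplicate findings and rank them, most severe first."""
    grouped: dict[tuple[str, str], list[Finding]] = defaultdict(list)
    for f in findings:
        grouped[(f.location, f.heuristic)].append(f)

    merged = [
        {
            "location": location,
            "heuristic": heuristic,
            "severity": mean(f.severity for f in group),
            "reported_by": sorted({f.evaluator for f in group}),
        }
        for (location, heuristic), group in grouped.items()
    ]
    # Rank by mean severity; break ties by how many evaluators saw the issue.
    merged.sort(key=lambda m: (-m["severity"], -len(m["reported_by"])))
    return merged


# Made-up findings from three evaluators.
findings = [
    Finding("A", "checkout", "Error prevention", 4),
    Finding("B", "checkout", "Error prevention", 3),
    Finding("A", "search", "Consistency and standards", 2),
    Finding("C", "settings", "Visibility of system status", 1),
]
ranked = prioritize(findings)
for issue in ranked:
    print(issue)
```

With limited time or resources, the team would work down this list from the top and stop where the budget runs out.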
Comparison: Which Testing Method Is Better?
What both methods have in common is that they are qualitative in nature and best used in a formative fashion: carrying out evaluations iteratively to further improve the usability of a product. Both can be used summatively to assess the ease of use once at the end of the development process, but without further improving the product, the value and impact are limited.
The question of which method is better boils down to three factors: effort, effectiveness, and validity.
Effort: How much work is it to run each method? Usability testing requires significantly more time and money for planning and executing the test, and to properly analyze the data. Heuristic evaluation is normally done with a company’s own staff; therefore no external people have to be found, recruited, scheduled and incentivized.
When it comes to the analysis of data, usability testing is more involved because more data is being acquired — typically in the form of pre- and post-test surveys in addition to the test task execution.
Effectiveness: How strong is each method in discovering usability issues? Both usability testing and heuristic evaluation have been shown to spot usability problem areas within products. However, usability testing is better at uncovering deeper-rooted and oftentimes hidden issues, while heuristic evaluation oftentimes finds trivial and easy-to-spot problems.
Validity: How true are the methods? Do their results really tell us about the ease of use of a product? Usability testing has a high validity because the insights are based on actual observations of real target users interacting with and commenting on the product under evaluation. Heuristic evaluation relies on evaluators cross-checking the product against design rules. In most cases, these evaluators are NOT the end users. Just because a user interface may comply with the heuristics does not guarantee that it actually can be used effectively, efficiently and with joy by the specific target audience.
Final Thoughts: Usability Testing Is the Better Choice
Based on the above, when in doubt, usability testing is the better evaluation choice. That said, it is a perfectly valid option to do both, one after the other. In that case you would first run a heuristic evaluation, which uncovers the easy-to-spot usability problems. Those can then be fixed. Afterwards, with the obvious issues mitigated, a usability test identifies the harder-to-find usability problems.
Usability testing and heuristic evaluation assess the usability of a product. Do they also gauge UX? Is there a difference between usability and UX?
Usability, being the older concept of the two, has always been about ease of use. That ease is defined by the effective, efficient and satisfactory usage of a user interface. UX has a broader scope and includes the holistic experience of a user with a product, including feelings, attitudes, perceptions and more. So, usability is part of UX. Since usability testing involves listening to and inquiring about users’ feelings and sentiments, it lends itself well to assessing UX.
The heuristics have traditionally been much more focused on usability, because the rules of thumb did not cover the holistic experience of users. In recent years, however, at least one of the three sets of heuristics introduced above has been updated to reflect more of the users’ subjective experience.
With the last update of ISO 9241 in 2020 a new heuristic was introduced, called “User Engagement.” It says that a well-designed product needs to present content and features in an inviting and motivating way that supports continued interaction between user and product.