Evaluation Techniques for Interactive Systems

Gayan Malinda
9 min read · Apr 17, 2022


What is evaluation?

The role of evaluation is to assess designs and test systems to ensure that they actually behave as we expect and meet user requirements.

This has the advantage of allowing problems to be sorted out before a lot of time and money is spent on implementation; it is far easier to change a design in the early phases of development than it is later on.

Goals of evaluation

Evaluation has three main goals.

  1. Assess the extent of system functionality

The system’s functionality is important in that it must accord with the user’s requirements. Evaluation at this level may measure the user’s performance with the system to assess the effectiveness of the system in supporting the task.

e.g. if a filing clerk is used to retrieving a customer’s file by the postal address, the same capability (at least) should be provided in the computerized file system.

2. Assess the effect of the interface on the user

It is important to assess the user’s experience of the interaction and its impact upon the user. This includes considering aspects such as how easy the system is to learn, its usability and the user’s satisfaction with it. It may also include the user’s enjoyment and emotional response, particularly in the case of systems that are aimed at leisure or entertainment.

3. Identify specific problems

These may be aspects of the design which, when used in their intended context, cause unexpected results or confusion amongst users. This goal is related to both the functionality and the usability of the design.

Evaluation through expert analysis

The evaluation of a system should ideally take place before any implementation work begins. If the design itself can be evaluated, costly mistakes can be avoided since the design may be changed before any large resource commitments are made. A variety of strategies for evaluating interactive systems using expert analysis have been developed. These methods are flexible assessment approaches since they may be utilized at any point of the development process, from design specifications through storyboards and prototypes to full implementations.

There are a few expert-based assessment approaches.

  • Cognitive Walkthrough
  • Heuristic Evaluation
  • Model-based evaluation

A) Cognitive Walkthrough

This method was proposed by Polson et al.

This is one of the most efficient and cost-effective ways of increasing the usability of a system. Most users prefer to learn a product by doing things with it rather than by reading a manual or following a set of instructions. This evaluation therefore checks that the design is easy for a novice to pick up and that it takes little time to become an expert in using it.

How to conduct a Cognitive Walkthrough?

An expert ‘walks through’ every possible path of the design to identify the problems a user could face. The expert must think from the perspective of a potential user for the evaluation to give accurate results, which is why the person conducting it is usually an expert in cognitive psychology.

For each step of the walkthrough, the expert should consider:

🔸 What impact will interaction have on user?

🔸 What cognitive processes are required?

🔸 What learning problems may occur?

To do this walkthrough, the expert must have a specification or prototype of the system, a description of the task, and a written list of the actions needed to complete the task with the proposed system.
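As a rough sketch only (the task, actions and notes below are invented, and no particular tool is implied), the walkthrough paperwork can be thought of as a record sheet that pairs each action with the three questions above:

```python
# A minimal sketch of a cognitive-walkthrough record: for each action in the
# prepared action list, the evaluator records an answer to each question.
# The task, actions and notes are hypothetical examples.

QUESTIONS = (
    "What impact will the interaction have on the user?",
    "What cognitive processes are required?",
    "What learning problems may occur?",
)

def new_record(task, actions):
    """Create an empty walkthrough sheet: one set of questions per action."""
    return {
        "task": task,
        "steps": [{"action": a, "answers": {q: "" for q in QUESTIONS}} for a in actions],
    }

record = new_record(
    task="Send an email",
    actions=["Press 'Compose'", "Enter recipient address", "Type message", "Press 'Send'"],
)

# The evaluator fills in answers while walking through the prototype, e.g.:
record["steps"][1]["answers"][QUESTIONS[2]] = (
    "Novices may not realize the address must be chosen from the contact list."
)

for step in record["steps"]:
    print(step["action"])
    for question, answer in step["answers"].items():
        print(f"  {question} -> {answer or '(to be filled in during the walkthrough)'}")
```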

B) Heuristic Evaluation

This evaluation method was proposed by Nielsen and Molich. There are ten well-defined usability heuristics, and experts examine the design to see whether any of them are violated (three to five evaluators are usually enough).
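As a minimal sketch of how the findings might be pooled (the heuristic names are Nielsen’s; the evaluators, problems and severity ratings are invented), problems reported by several evaluators can be aggregated and ranked by severity:

```python
# Sketch: aggregating heuristic-evaluation findings from several evaluators.
# Severity ratings follow Nielsen's 0-4 scale (0 = not a problem, 4 = usability catastrophe).
from collections import defaultdict
from statistics import mean

# Each finding: (evaluator, heuristic violated, problem description, severity 0-4).
# These findings are hypothetical examples.
findings = [
    ("E1", "Visibility of system status", "No progress bar during upload", 3),
    ("E2", "Visibility of system status", "No progress bar during upload", 4),
    ("E1", "Error prevention", "Delete has no confirmation", 4),
    ("E3", "Consistency and standards", "Two different icons mean 'save'", 2),
]

by_problem = defaultdict(list)
for evaluator, heuristic, problem, severity in findings:
    by_problem[(heuristic, problem)].append(severity)

# Report problems ordered by mean severity, highest first.
for (heuristic, problem), severities in sorted(
    by_problem.items(), key=lambda kv: mean(kv[1]), reverse=True
):
    print(f"{mean(severities):.1f}  [{heuristic}] {problem} "
          f"(reported by {len(severities)} evaluator(s))")
```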

C) Model-based Evaluation

Model-based evaluation uses a model of how a human would use a proposed system to obtain predicted usability measures by calculation or simulation. These predictions can replace or supplement empirical measurements obtained by user testing. Model-based evaluation combines cognitive and design models in the evaluation process.

Models used for model-based evaluations (a worked Keystroke-Level Model example follows this list):

  • GOMS model
  • Keystroke-level model
  • Design rationale
  • Dialog models
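Here is the promised worked example of the Keystroke-Level Model, which predicts expert, error-free task time by summing standard operator times. The operator values are the commonly published ones; the task breakdowns are invented for illustration:

```python
# Keystroke-Level Model (KLM) sketch with commonly published operator times (seconds).
# K = keystroke, P = point with mouse, B = mouse-button press/release,
# H = home hands between keyboard and mouse, M = mental preparation.
OPERATOR_TIMES = {"K": 0.20, "P": 1.10, "B": 0.10, "H": 0.40, "M": 1.35}

def predict_time(operators: str) -> float:
    """Sum operator times for a sequence such as 'HMPBB' or 'MKKKK'."""
    return sum(OPERATOR_TIMES[op] for op in operators)

# Hypothetical comparison: choosing a command from a menu vs. typing a 4-letter shortcut.
menu_sequence = "HMPBB"       # home to mouse, think, point at item, click (press + release)
shortcut_sequence = "MKKKK"   # think, then four keystrokes

print(f"Menu selection   : {predict_time(menu_sequence):.2f} s")
print(f"Keyboard shortcut: {predict_time(shortcut_sequence):.2f} s")
```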

Evaluation through user participation

User participation in evaluation tends to occur in the later stages of development when there is at least a working prototype of the system in place.

Styles of evaluation

Among the techniques available for evaluation with users, we can distinguish two distinct evaluation styles: those performed under laboratory conditions and those conducted in the work environment, or ‘in the field’.

Laboratory studies-

Users are taken out of their normal work environment to take part in controlled tests, often in a specialist usability laboratory.

Advantages -

  • Specialist equipment available- Contain sophisticated audio/visual recording and analysis facilities, two-way mirrors, instrumented computers and the like, which cannot be replicated in the work environment.
  • Uninterrupted environment- The participant operates in an interruption-free environment.

Disadvantages -

  • Lack of context — The unnatural situation may mean that one accurately records a situation that never arises in the real world
  • Difficult to observe several users cooperating

Appropriate — if the system’s intended location is dangerous or impractical to work in, or for constrained single-user systems, where the laboratory allows controlled manipulation of use.

Field studies -

This type of evaluation takes the designer or evaluator out into the user’s work environment in order to observe the system in action.

Advantages -

  • Natural environment — Observe interactions between systems and between individuals that would have been missed in a laboratory study.
  • Context retained (though observation may alter it)- Seeing the user in his ‘natural environment’.
  • Longitudinal studies are possible.

Disadvantages -

  • Distractions — High levels of ambient noise, greater levels of movement and constant interruptions, such as phone calls, all make field observation difficult.
  • Noise

Appropriate — where context is crucial, and for longitudinal studies.

Empirical methods: experimental evaluation

This provides empirical evidence to support a particular claim or hypothesis. The evaluator chooses a hypothesis to test. Any changes in the behavioural measures are attributed to the different conditions.

There are a number of factors that are important to the overall reliability of the experiment.

  1. Participants

Participants are the people who take part in the experiment. Since the choice of participants is vital to the experiment’s success, they should be chosen to match the expected user population as closely as possible.

2. Variables

Variables are the things that are modified and measured in the evaluation. There are two types: independent and dependent variables.

  • Independent variables — characteristics changed to produce different conditions, e.g. interface style, number of menu items.
  • Dependent variables — characteristics measured in the experiment, e.g. time taken, number of errors.

3. Hypothesis

A hypothesis is a prediction of the outcome of an experiment. It is framed in terms of the variables. The aim of the experiment is to show that this prediction is correct; this is done by disproving the null hypothesis.

4. Experimental design

The experimental design is the plan for carrying out the evaluation. There are two main methods: between-subjects and within-subjects.

  • Between-subjects (or randomized) design — each participant is assigned to a single condition; more users are required, and variation between users can bias the results.
  • Within-subjects (or repeated measures) design — each participant performs under every condition; it is less costly and less likely to suffer from variation between users.

Once you have gathered the data, you need to analyze it. Identify the type of data, discrete or continuous, and then analyze it using the appropriate statistical methods.
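As a minimal sketch of that final step (the completion times below are invented, and scipy is assumed to be available), a within-subjects comparison of two interface conditions could be tested with a paired t-test:

```python
# Sketch: testing whether task-completion time differs between two interface conditions
# in a within-subjects (repeated measures) design. Data are hypothetical.
from scipy import stats

# Completion time in seconds for the same ten participants under each condition.
times_interface_a = [34.1, 29.8, 41.2, 36.5, 30.9, 44.0, 38.3, 33.7, 35.2, 39.9]
times_interface_b = [30.2, 27.5, 38.9, 31.0, 29.1, 40.2, 35.8, 30.4, 32.6, 36.1]

# Paired t-test because each participant provides a measurement in both conditions.
t_statistic, p_value = stats.ttest_rel(times_interface_a, times_interface_b)

print(f"t = {t_statistic:.2f}, p = {p_value:.3f}")
if p_value < 0.05:
    print("Reject the null hypothesis: the interfaces differ in completion time.")
else:
    print("No significant difference detected at the 5% level.")
```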

Observational techniques

  1. Think Aloud

In this method, a user is asked to describe what he is doing and why, and what he thinks is happening, e.g. what he believes is happening, why he takes an action, what he is trying to do. The method requires little expertise (hence its simplicity), provides useful insight and shows how the system is actually used, but it cannot be applied in every scenario.

Advantages —

  • Simplicity — requires little expertise
  • Can provide useful insight with an interface
  • Can show how the system is actually used

Disadvantages —

  • Subjective
  • Selective — depending on the tasks provided
  • Act of describing may alter task performance — The process of observation can alter the way that people perform tasks and so provide a biased view

2. Cooperative Evaluation

In this method, the user and the evaluator collaborate and can ask each other questions throughout. This is less constrained and easier to use, and the user is encouraged to criticize the system.

Advantages —

  • Less constrained and easier to use
  • User is encouraged to criticize the system
  • Clarification possible

3. Protocol Analysis

Methods for recording user actions in protocol analysis,

  • paper and pencil — cheap, limited to writing speed
  • audio — good for a think-aloud, difficult to match with other protocols
  • video — accurate and realistic, needs special equipment, obtrusive
  • computer logging — automatic and unobtrusive, large amounts of data difficult to analyze (see the sketch after this list)
  • user notebooks- coarse and subjective, useful insights, good for longitudinal studies
  • mixed methods are often used in practice
  • audio/video transcription is difficult and requires skill; some automatic support tools are available
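As a minimal sketch of the computer-logging option referred to above (the events and file name are invented; a real system would call the logger from its event handlers), user actions can be written to a timestamped file for later analysis:

```python
# Sketch: automatic, unobtrusive logging of user actions for protocol analysis.
# The events and file name are hypothetical.
import csv
import time
from datetime import datetime, timezone

LOG_PATH = "interaction_log.csv"

def log_event(writer, event: str, detail: str = "") -> None:
    """Append one timestamped user action to the protocol log."""
    writer.writerow([datetime.now(timezone.utc).isoformat(), event, detail])

with open(LOG_PATH, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["timestamp_utc", "event", "detail"])
    # Simulated session: in practice these calls come from the UI's event handlers.
    log_event(writer, "menu_open", "File")
    time.sleep(0.1)
    log_event(writer, "menu_select", "Save As")
    time.sleep(0.1)
    log_event(writer, "dialog_confirm", "report.docx")

print(f"Wrote protocol log to {LOG_PATH}")
```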

4. Automated Analysis

Analyzing protocols, video, audio or system logs by hand is time-consuming and tedious, but automated analysis tools support the task; one example is EVA (Experimental Video Annotator), a system that runs on a multimedia workstation with a direct link to a video recorder. With automated analysis, the analyst has time to focus on relevant incidents and can avoid excessive interruption of the task.

Advantages —

  • The analyst has time to focus on relevant incidents
  • Avoid excessive interruption of the task

Disadvantages —

  • Lack of freshness
  • Maybe post-hoc interpretation of events

5. Post-task walkthrough

In this method, the user reflects on the actions after the event. This gives the analyst time to focus on relevant incidents and avoids excessive interruption of the task, but the account may lack freshness.

Query techniques

  1. Interviews

In this method the analyst questions the user on a one-to-one basis, using prepared questions about his or her experience with the design. Sometimes these questions are open-ended, which allows users to express their ideas and comments freely; sometimes respondents are asked to choose an answer from a fixed set of options given by the interviewer (this form of interview is very similar to a closed questionnaire). Such structure yields information which is easily quantified, ensures comparability of questions across respondents and makes certain that the necessary topics are covered, but it may prevent respondents from expressing their true views. The interviewer must therefore strike a balance between the two kinds of question.

Interviews are a suitable way to explore issues in a relatively cost-effective way, but the answers users give can be quite subjective. The method is informal and relatively cheap compared to other methods, though it is more time-consuming.

Advantages —

  • Can be varied to suit the context
  • Issues can be explored more fully
  • Can elicit user views and identify unanticipated problems

Disadvantages —

  • Very subjective
  • Time-consuming

2. Questionnaires

In this method, users are given a set of fixed questions about what they prefer and what they think about the design. Questionnaires make it possible to reach a large group of people in little time, but they are less flexible and less probing than interviews.

Those questions can be,

📝 general

📝 open-ended

📝 scalar

📝 multiple choice

📝 ranked

You might feel that this is not as flexible as an interview, but the data collected can be analyzed more rigorously (a small example follows the lists below).

Advantages —

  • Quick and reaches large user group
  • Can be analyzed more rigorously

Disadvantages —

  • Less flexible
  • Less probing
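As a small example of the kind of analysis mentioned above (the questions and the 1-to-5 scalar responses are invented), per-question summary statistics can be computed directly:

```python
# Sketch: summarizing scalar (Likert-style, 1-5) questionnaire responses per question.
# Questions and response data are hypothetical.
from statistics import mean, median, stdev

responses = {
    "The system was easy to learn":    [4, 5, 3, 4, 4, 5, 2, 4],
    "I could find commands quickly":   [3, 2, 4, 3, 3, 2, 3, 4],
    "I was satisfied with the system": [4, 4, 5, 3, 4, 4, 3, 5],
}

for question, scores in responses.items():
    print(f"{question}: mean={mean(scores):.2f}, "
          f"median={median(scores)}, sd={stdev(scores):.2f}, n={len(scores)}")
```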

Evaluation through monitoring physiological responses

Eye Tracking

In the eye-tracking method, the position of the eye is tracked using head-mounted or desk-mounted equipment. The following measurements are taken, and the evaluation is based on analyzing them (a small fixation-detection sketch follows the list below).

Fixations: eye maintains a stable position.

  • Number of fixations — the more fixations, the less efficient the search strategy
  • Fixation duration — indicates the level of difficulty with the display

Saccades: rapid eye movement from one point of interest to another
Scan paths: moving straight to a target with a short fixation at the target is optimal

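As a simplified sketch of how fixations can be derived from raw gaze samples (a basic dispersion-threshold approach; the gaze trace, thresholds and 60 Hz sampling rate are invented):

```python
# Sketch: detecting fixations in raw gaze data with a simple dispersion-threshold
# approach. Sample data, thresholds and sampling rate are hypothetical.

def dispersion(points):
    """Dispersion of a set of (x, y) points: x-range plus y-range."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def detect_fixations(samples, max_dispersion=30.0, min_samples=5):
    """A fixation is a run of at least `min_samples` gaze points whose
    dispersion stays within `max_dispersion` pixels."""
    fixations = []
    i = 0
    while i + min_samples <= len(samples):
        j = i + min_samples
        if dispersion(samples[i:j]) <= max_dispersion:
            # Grow the window while the points stay close together.
            while j < len(samples) and dispersion(samples[i:j + 1]) <= max_dispersion:
                j += 1
            window = samples[i:j]
            cx = sum(x for x, _ in window) / len(window)
            cy = sum(y for _, y in window) / len(window)
            fixations.append({"centroid": (cx, cy), "n_samples": len(window)})
            i = j
        else:
            i += 1
    return fixations

# Hypothetical 60 Hz gaze trace: a fixation, a saccade, then another fixation.
gaze = [(100, 100), (102, 101), (101, 99), (103, 100), (100, 102), (101, 101),
        (300, 240), (500, 380),
        (502, 381), (501, 379), (503, 380), (500, 382), (502, 381)]

for f in detect_fixations(gaze):
    print(f"Fixation at {f['centroid']} lasting {f['n_samples']} samples "
          f"(~{f['n_samples'] / 60 * 1000:.0f} ms at 60 Hz)")
```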

Physiological Measurements

In this method, the user’s emotional and physical changes while using the interface are observed, and the evaluation is based on those data.

Following are such changes observed in the process,

  • Heart activity, including blood pressure, volume and pulse.
  • The activity of sweat glands: Galvanic Skin Response (GSR)
  • Electrical activity in muscle: electromyogram (EMG)
  • Electrical activity in the brain: electroencephalogram (EEG)

I hope this article gave you an idea of the evaluation techniques used for interactive systems.

See you in the next article. Thank you so much for reading!
