Gesture elicitation as a computational optimization problem

The study

Bailly et al. (2013) investigated gestural shortcuts for their Métamorphe keyboard. Métamorphe is a keyboard with actuated keys that can sense user gestures, such as pull, twist, and push sideways. In this study, each of 20 participants suggested a keyboard shortcut for each of 42 referents on a Métamorphe mockup. Proposing a shortcut required choosing (i) a key and (ii) the gesture applied to the key. In total, the participants produced 71 unique keys, 27 unique gestures, and 358 unique combinations of keys and gestures. Our elicited sign set is the set of those 358 unique combinations (our signs).

The optimization problem

Our goal is to find an optimal set of mappings between signs (combinations of keys and gestures) and referents. We denote this optimal sign mapping set as \(\pi_{opt}\). Now, to define the optimization problem, we require:

An objective function. We aim to maximize guessability (Wobbrock et al. 2005), where guessability \(G(\pi)\) can be defined as the probability that the sign of a gesture performed by a random user for a random referent belongs to the sign mapping set \(\pi\).
A set of design constraints. We will require each referent to be mapped to a single and unique sign.
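
The two ingredients above can be written compactly. The following is a sketch in notation introduced here for illustration: \(\mathcal{R}\) is the set of referents, a mapping set is viewed as a function \(\pi\) from referents to signs, and \(s_{u,r}\) is the sign that user \(u\) produces for referent \(r\). We also assume that the random user \(U\) and the random referent \(R\) are drawn uniformly.

\[
\pi_{opt} = \arg\max_{\pi} G(\pi), \qquad G(\pi) = \Pr\big(s_{U,R} = \pi(R)\big), \qquad \text{subject to } \pi(r) \neq \pi(r') \text{ for all } r \neq r'.
\]

Under the same assumptions, a natural estimate of guessability from a sample of \(n\) participants is the proportion of proposals that match the mapping:

\[
\widehat G(\pi) = \frac{1}{n\,|\mathcal{R}|} \sum_{u=1}^{n} \sum_{r \in \mathcal{R}} \mathbb{1}\big[s_{u,r} = \pi(r)\big].
\]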

Ideally, we should find a sign mapping set that is optimal for the full population of users. In practice, however, we can only provide an estimate \(\widehat\pi_{opt}\) of the optimal mapping set based on a sample, i.e., the participants of the study. Since \(\widehat\pi_{opt}\) is not necessarily optimal, that is, \(G(\widehat\pi_{opt}) \le G(\pi_{opt})\), we would like to:

Quantify our uncertainty (or confidence) about \(\pi_{opt}\) given the current sample
Determine a sample size such that \(G(\widehat\pi_{opt})\) is close enough to \(G(\pi_{opt})\)

Read the dataset

This is the original dataset as provided by the authors.

As a first step, we need to read the data:

data <- read.csv("datasets/bailly2013.csv", stringsAsFactors=F)

# After the cmd column, there are five columns per participant: the first holds the key and the second holds the gesture
keys <- data[, seq(2, ncol(data), by=5)] # These are participants' proposals of keys
gestures <- data[, seq(3, ncol(data), by=5)] # These are participants' proposals of key gestures

# Also build a table with keys + gestures combined.
keys_gestures <- data.frame(keys)
for (r in 1:nrow(keys_gestures)) {
    keys_gestures[r,] <- paste0(keys_gestures[r,], "-", gestures[r,])
}

# Replace the column names by participant IDs
pIDs <- paste0("P", 1:ncol(keys))
names(keys_gestures) <- pIDs

# Replace the row names by referent IDs
row.names(keys_gestures) <- data$cmd
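
To make the column layout concrete, here is a minimal sketch on a made-up miniature of the file. The column names other than `cmd` and the three unused columns per participant are assumptions for illustration only; the real CSV may name and fill them differently. The sketch builds the same kind of key-gesture table without a loop and counts the unique signs.

```r
# Hypothetical miniature of the CSV layout: a cmd column followed by
# five columns per participant (key, gesture, and three unused columns)
toy <- data.frame(
  cmd = c("Copy", "Paste"),
  p1_key = c("C", "V"), p1_gesture = c("top", "top"), p1_a = NA, p1_b = NA, p1_c = NA,
  p2_key = c("C", "P"), p2_gesture = c("top", "LR"),  p2_a = NA, p2_b = NA, p2_c = NA,
  stringsAsFactors = FALSE
)

toy_keys <- toy[, seq(2, ncol(toy), by = 5)]      # columns 2 and 7
toy_gestures <- toy[, seq(3, ncol(toy), by = 5)]  # columns 3 and 8

# Combine keys and gestures cell by cell; paste0 is vectorized over the matrices
toy_signs <- matrix(
  paste0(as.matrix(toy_keys), "-", as.matrix(toy_gestures)),
  nrow = nrow(toy_keys),
  dimnames = list(toy$cmd, paste0("P", seq_len(ncol(toy_keys))))
)

print(toy_signs)
cat("Unique signs:", length(unique(as.vector(toy_signs))), "\n")
```

Applied to the full table, the same `unique()` count is one way to check the number of unique key-gesture combinations reported for the study.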

The resulting keys_gestures data frame is as follows:

	P1	P2	P3	P4	P5	P6	P7	P8	P9	P10	P11	P12	P13	P14	P15	P16	P17	P18	P19	P20
Accept	Y-pull	Y-top	F9-towards	enter-top	enter-top	A-top	Y-top	enter-top	enter-towards	Y-top	A-pull	X-left	Y-top	esc-towards	enter-top	A-top	+-top	M-LR	A-top	Y-LR
Align bottom	A-towards	A-towards	shift-towards	A-towards	A-towards	B-towards	A-towards	A-top+towards	–towards	A-towards	B-towards	A-towards	A-towards	F6-towards	J-towards	A-towards	win-towards	J-towards	O-top	A-towards
Align justify	A-pull	A-LR	shift-pull	shiftR-LR	A-away	J-top	A-FB	J-top	\|-left+right(wiggle)	J-right	J-pull	J-push	A-top	F1-pull	J-pull	A-pull	+-LR	J-LR	J-right	A-pull
Align left	A-left	A-left	shift-left	A-left	A-left	A-left	A-left	A-top+left	[-left	A-left	L-left	A-left	A-left	F6-left	J-left	A-left	menu-left	J-left	[-left	A-left
Align middle	A-FB	A-FB	shift-top	A-FB	A-top	D-towards	A-LR	A-top	+-pull	A-FB	M-FB	A-FB	A-FB	F6-top	J-top	A-top	win-LR	J-top	%-top	A-FB
Align right	A-right	A-right	shift-right	A-right	A-right	A-right	A-right	A-top+right	]-right	A-right	R-right	A-right	A-right	F6-right	J-right	A-right	win-right	J-right	]-right	A-right
Align top	A-away	A-away	shift-away	A-away	A-away	T-top	A-away	A-top+away	–right	A-away	T-away	A-away	A-away	F6-away	J-away	A-away	win-away	J-away	O-pull	A-away
Close	X-top	O-pull	esc-LR	X-away	backspace-CCW	C-top	backspace-top	esc-top	F4-away	X-top	C-LR	C-top	O-towards	esc-top	F4-top	C-top	win-CCW	P-CCW	spacebar-top	C-top
Copy	C-top	C-top	C-towards	capslock-FB	C-away	C-towards	C-top	C-top	C-towards	C-LR	C-top	C-FB	C-top	C-top	C-top	C-LR	+-top-double	K-FB	C-top	C-top
Cut	C-away	X-top	X-top	capslock-LR	X-away	C-away	C-LR	X-top	C-away	C-pull	X-top	C-LR	X-top	backspace-LR	X-top	C-pull	–left	K-pull	C-away	X-top
Decrease volume	V-CCW	P-towards	spacebar-CCW	ctrlR-CCW	–away	V-towards	<-towards	spacebar-towards	7-CCW	V-CCW	V-towards	V-CCW	–top	–CCW	–top	–CCW	win-CCW	?-left	V-towards	V-CCW
Delete	D-pull	backspace-top	backspace-top	D-away	backspace-top	D-top	D-top	backspace-top	backspace-pull	D-top	D-pull	D-top	backspace-top	backspace-top	backspace-LR	backspace-top	–right	U-top	D-top	D-CCW
Duplicate	C-directional	D-directional	6-CW	2-LR	+-away	D-away	D-CW	C-top+directional	2-pull	D-top(double)	D-top	D-pull	D-top	V-FB	shiftL-top	D-LR	menu-top-double	U-right	D-pull	D-top
Enlarge	D-CW	^-away	+-pull	E-towards	+-pull	E-top	enter-right	+-away	+-CCW	L-pull	E-pull	E-left	S-CW	F12-pull	A-CW	+-right	+-CW	U-pull	O-CW	S-CW
Find	F-top	F-top	F-top	F-towards	F-away	F-top	F-top	F-top	F-FB	F-CCW	F-left	F-top	F-top	F-top	F-top	F-top	+-FB	L-top	F-top	F-top
Find next	F-right	F-right	tab-right	enter-right	>-CW	F-away	F-right	F-right	>-right	F-towards	F-towards	F-CW	F-left	F-CW	F-right	F-right	+-away	L-towards	F-right	F-right
Find previous	F-left	F-left	tab-left	backspace-CCW	<-CCW	P-right	backspace-left	F-left	<-left	F-away	F-away	F-CCW	F-right	F-CCW	F-left	F-left	win-left	L-away	F-left	F-left
Help	1-away	esc-top	H-pull	?-top	?-away	H-top	?-top	F1-top	menu-pull	H-pull	H-top	H-left	H-top	F5-FB	F9-top	menu-FB	+-pull	M-pull	H-pull	H-top
Increase volume	V-CW	P-away	spacebar-CW	ctrlR-CW	+-away	V-top	>-away	spacebar-away	7-CW	V-CW	V-away	V-CW	+-top	+-CW	+-top	+-CW	win-CW	?-right	V-away	V-CW
Insert	I-top	I-towards	shift-top	enter-top	I-away	I-top	P-LR	^-away	I-left+right(wiggle)	I-towards	I-top	I-left	I-top	I-FB	I-top	I-towards	–top-double	U-CW	I-top	I-FB
Maximize	M-away	tab-away	9-away	S-towards	+-CW	M-top	enter-right	M-away	F1-pull	D-pull	M-left	M-pull	+-CW	F9-CW	M-pull	+-top	+-CW	M-pull	spacebar-pull	M-pull
Menu access	win-top	esc-towards	menu-top	menu-top	M-away	Q-top	menu-LR	menu-top	menu-top	M-LR	M-top	M-top	menu-top	F5-pull	F10-top	menu-top	menu-FB	M-away	menu-top	M-CW
Minimize	M-towards	tab-towards	1-towards	M-away	–CCW	M-towards	enter-left	M-towards	F1-LR	D-top	M-left	M-CCW	–CCW	F9-CCW	M-top	–top	–CCW	M-top	spacebar-top	M-FB
Move a little	Q-left	N-directional	F1-directional	tab-LR	</>-left/right	L-towards	M-directional	ctrl-directional	tab-directional	B-directional	>-top	M-left/right	N-directional	F12-directional+LR	comma-directional	M-right	win-right	U-pull	O-left(pulse)/right(pulse)	J-directional
Move a lot	Q-away	M-directional	F12-directional	tab-FB	</>-pull	L-away	M-directional	ctrl-directional	capslock-directional	L-directional	>-pull	M-left/right	M-directional	F9-directional+LR	.-directional	M-LR	–right	U-FB	O-left(long)/right(long)	L-directional
Next	>-top	tab-right	N-right	2-right	>-top	N-top	>-right	spacebar-right	backspace-right	H-right	N-right	N-right	>-top	~-right	F-CW	N-right	win-right	L-right	>-right	N-left/right
Open	O-top	O-top	enter-top	enter-pull	O-away	O-top	O-top	enter-top	F4-towards	O-pull	O-top	O-top	O-away	O-pull	enter-top	O-pull	menu-CW	M-CW	spacebar-pull	O-top
Pan	G-left/right	P-directional	tab-left/right	spacebar-FB	Shift-left/right	P-left/right	M-directional	win-directional	shift-directional	S-directional	V-CW	O-left/right	V-directional	F6-directional+LR	?-directional	tab-directional	menu-FB	L-left	spacebar-left/right	P-CW/CCW
Paste	C-towards	V-towards	V-top	1-top	V-away	P-top	P-LR	V-top	+-towards	V-top	V-top	V-left	V-top	V-top	V-top	P-top	–top-double	K-LR	P-top	V-top
Pause	V-top	P-top	spacebar-top	>-top	P-towards	P-towards	spacebar-top	spacebar-top	doublequote-FB	P-top	P-pull	L-LR	spacebar-top	~-pull	spacebar-top	P-away	–top	?-push	=-towards	P-FB
Play	V-top	P-top	spacebar-top	>-right	P-away	Y-top	spacebar-top	spacebar-top	enter-LR/FB	P-LR	P-top	L-top	enter-top	~-top	spacebar-top	P-towards	+-top	?-CW	=-away	T-top
Previous	<-top	tab-left	P-left	1-left	<-top	X-top	backspace-left	spacebar-left	backspace-left	H-left	P-left	P-left	<-top	~-left	F-CCW	P-left	win-left	L-left	<-left	P-left
Reject	X-pull	N-top	F8-away	R-away	backspace-top	J-away	N-top	esc-top	X-towards	N-FB	R-pull	R-top	N-top	esc-away	backspace-top	R-top	–CCW	M-CCW	R-top	R-top
Rotate	R-CW/CCW	R-CW/CCW	menu-CW/CCW	R-FB	R-CW/CCW	R-top	R-CW/CCW	ctrl-CW/CCW	0-CW/CCW	R-CW/CCW	R-CW	R-CW/CCW	R-CW/CCW	F12-CW/CCW	^-CW/CCW	R-CW/CCW	win-FB+CW	U-CW/CCW	R-CW/CCW	R-CW/CCW
Save	S-top	S-pull	S-top	ctrl-top	S-away	S-top	S-top	S-top	S-FB	S-top	S-away	S-right	S-top	S-top	S-top	S-top	menu-top	P-left	S-top	S-top
Save all	S-FB	S-FB	F4-pull	ctrlL-pull	S-CW	L-top	S-LR	S-top+away	S-pull	A-top	S-left	S-top	S-pull	S-LR	S-pull	S-left	+-CW	P-top	S-LR	H-CW
Save as	S-pull	S-top	F5-top	ctrlL-pull	S-right	S-left	S-top	S-towards	S-towards	S-pull	S-right	S-left	S-right	S-FB	S-directional	S-right	win-top-double	P-right	F1-top	S-pull
Shrink	D-CCW	^-towards	–LR	S-away	–top	K-top	enter-left	–towards	+-CCW	S-FB	E-top	S-top	S-CCW	F12-top	A-CCW	–left	–CCW	U-CCW	O-CCW	S-CCW
Task switch	tab-left/right	tab-CW/CCW	win-towards	tab-pull	>-CW/CCW	W-top	T-left/right	tab-left/right	tab-CW/CCW	S-left	W-top	T-left	W-left/right	F5-LR	tab-top	win-right	menu-right	M-LR	spacebar-CW/CCW	T-towards
Undo	U-top	Z-top	backspace-CCW	shiftR-CCW	backspace-left	U-top	backspace-top	<-top	U-pull	Z-CCW	U-top	U-left	Z-top	backspace-away	Z-top	backspace-left	win-CCW	P-left	U-pull	U-CCW
Zoom in	Z-away	Z-away	+-top	Z-towards	Z-CW	Z-top	enter-right	win-top	Z-right	Z-pull	Z-CW	Z-top	Z-CW	F9-pull	Q-top	Z-CW	win-CW	L-CW	Z-CW	Z-FB
Zoom out	Z-towards	Z-towards	–pull	Z-away	Z-CCW	Z-away	enter-left	win-pull	Z-left	Z-top	Z-CCW	Z-pull	Z-CCW	F9-top	Q-pull	Z-CCW	win-CCW	L-CCW	Z-CCW	Z-pull

Estimate the optimal sign mapping set

We use a sign mapper that is based on the Hungarian algorithm to solve the optimization problem.

source("gelicopt/sign-mappers.R") # Implementation of various sign mappers

mappings <- hungarian_sign_mapper(keys_gestures)
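
The Hungarian algorithm solves a linear assignment problem: each referent receives a distinct sign so that the total number of matching proposals is as large as possible. The sketch below illustrates that problem on made-up data. It is not the gelicopt implementation, and it replaces the Hungarian algorithm with brute-force enumeration, which is only feasible for toy sizes. For realistic sizes, a dedicated solver (for example `solve_LSAP` in the clue package, to our knowledge) would be the usual choice.

```r
# Toy proposals (hypothetical data): referents in rows, participants in columns
proposals <- data.frame(
  P1 = c("C-top", "C-top", "V-top"),
  P2 = c("C-top", "X-top", "V-top"),
  P3 = c("C-top", "C-top", "P-top"),
  row.names = c("Copy", "Cut", "Paste"),
  stringsAsFactors = FALSE
)

# counts[r, s] = number of participants who proposed sign s for referent r
signs <- sort(unique(unlist(proposals)))
counts <- sapply(signs, function(s) rowSums(proposals == s))

# Enumerate every way of giving each referent a distinct sign
# (brute force stand-in for the Hungarian algorithm)
candidates <- expand.grid(rep(list(seq_along(signs)), nrow(counts)))
distinct <- apply(candidates, 1, function(a) anyDuplicated(a) == 0)
candidates <- candidates[distinct, , drop = FALSE]

# Score each assignment by its number of matching proposals and keep the best
scores <- apply(candidates, 1, function(a) {
  sum(counts[cbind(seq_len(nrow(counts)), a)])
})
best <- unlist(candidates[which.max(scores), ])
mapping <- setNames(signs[best], rownames(counts))

print(mapping)
cat("Training guessability =", max(scores) / length(unlist(proposals)), "\n")
```

Note that picking the most frequent sign for each referent independently would assign C-top to both Copy and Cut here, which violates the uniqueness constraint. The assignment formulation resolves such conflicts globally.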

The resulting mappings data frame is the following:

refname	ref	sign
Accept	1	Y-top
Align bottom	2	A-towards
Align justify	3	A-pull
Align left	4	A-left
Align middle	5	A-FB
Align right	6	A-right
Align top	7	A-away
Close	8	esc-top
Copy	9	C-top
Cut	10	X-top
Decrease volume	11	V-CCW
Delete	12	backspace-top
Duplicate	13	D-top
Enlarge	14	+-pull
Find	15	F-top
Find next	16	F-right
Find previous	17	F-left
Help	18	H-top
Increase volume	19	V-CW
Insert	20	I-top
Maximize	21	M-pull
Menu access	22	menu-top
Minimize	23	–CCW
Move a little	24	N-directional
Move a lot	25	M-directional
Next	26	N-right
Open	27	O-top
Pan	28	L-left
Paste	29	V-top
Pause	30	spacebar-top
Play	31	P-top
Previous	32	P-left
Reject	33	R-top
Rotate	34	R-CW/CCW
Save	35	S-top
Save all	36	S-pull
Save as	37	S-right
Shrink	38	S-CCW
Task switch	39	tab-left/right
Undo	40	U-top
Zoom in	41	Z-CW
Zoom out	42	Z-CCW

As we explained earlier, these mappings (\(\widehat\pi_{opt}\)) are not necessarily optimal for the full user population. To quantify our confidence about the optimal mappings, we use the bootstrap method, where we iteratively run the optimization algorithm after sampling with replacement from the original sample of the 20 participants:

source("gelicopt/inference.R") # Implementation of bootstrapping used for inference

# Create the bootstrap distribution. By default, it contains R = 200 samples;
# here we set R = 300 instead.
boot_samples <- bootstrap(keys_gestures, hungarian_sign_mapper, R = 300)

# For each referent, get only signs with at least 10% confidence
signs <- getSigns(boot_samples, conf.min = .1) 

# Flatten the signs and their confidence scores into one string per referent for presentation purposes
confidence_res <- data.frame(ref=row.names(keys_gestures), signConfidence = flatten(signs))
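
The resampling idea can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the code in gelicopt/inference.R: the data are made up, and `modal_sign_mapper` is a hypothetical placeholder that picks each referent's most frequent sign and ignores the uniqueness constraint, so that the example stays short.

```r
set.seed(1)  # make this sketch reproducible

# Toy proposals (hypothetical data): referents in rows, participants in columns
proposals <- data.frame(
  P1 = c("S-top", "O-top"),
  P2 = c("S-top", "O-pull"),
  P3 = c("S-top", "O-top"),
  P4 = c("S-pull", "enter-top"),
  row.names = c("Save", "Open"),
  stringsAsFactors = FALSE
)

# Placeholder mapper (an assumption): most frequent sign per referent,
# ties broken by order of first appearance, uniqueness NOT enforced
modal_sign_mapper <- function(tbl) {
  apply(tbl, 1, function(row) {
    counts <- table(factor(row, levels = unique(row)))
    names(counts)[which.max(counts)]
  })
}

# Resample participants (columns) with replacement and re-run the mapper
R <- 200
boot_mappings <- replicate(R, {
  cols <- sample(ncol(proposals), replace = TRUE)
  modal_sign_mapper(proposals[, cols, drop = FALSE])
})
# boot_mappings is a character matrix: referents in rows, replicates in columns

# Confidence of a sign = share of replicates in which the mapper selected it
confidence <- lapply(rownames(proposals), function(ref) {
  sort(prop.table(table(boot_mappings[ref, ])), decreasing = TRUE)
})
names(confidence) <- rownames(proposals)

print(confidence)
```

Each confidence score is the fraction of bootstrap replicates in which the sign was selected for the referent, so the scores for a referent sum to 1 before any threshold is applied.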

The resulting confidence_res data frame is the following:

ref	signConfidence
Accept	enter-top:0.40, Y-top:0.38, A-top:0.14
Align bottom	A-towards:1
Align justify	A-pull:0.42, J-top:0.13, J-pull:0.12, J-right:0.11
Align left	A-left:1
Align middle	A-FB:0.88
Align right	A-right:1
Align top	A-away:1
Close	esc-top:0.27
Copy	C-top:0.91
Cut	X-top:0.79, C-away:0.17
Decrease volume	V-CCW:0.50, V-towards:0.22
Delete	backspace-top:0.77, D-top:0.21
Duplicate	D-top:0.45, D-pull:0.20
Enlarge	+-pull:0.27, S-CW:0.18
Find	F-top:1
Find next	F-right:0.96
Find previous	F-left:0.93
Help	H-top:0.46, H-pull:0.23, ?-top:0.12
Increase volume	V-CW:0.53, +-top:0.11
Insert	I-top:0.74, I-towards:0.17
Maximize	M-pull:0.47, +-CW:0.33
Menu access	menu-top:0.91
Minimize	–CCW:0.40, M-towards:0.29, M-top:0.13
Move a little	N-directional:0.19, win-right:0.13
Move a lot	M-directional:0.44, L-directional:0.17
Next	N-right:0.47, >-top:0.30, >-right:0.12
Open	O-top:0.87
Pan	menu-FB:0.15
Paste	V-top:0.93
Pause	spacebar-top:0.69, P-towards:0.12
Play	P-top:0.15, spacebar-top:0.14, +-top:0.12
Previous	P-left:0.63, <-top:0.20
Reject	R-top:0.52, N-top:0.33
Rotate	R-CW/CCW:0.99
Save	S-top:1
Save all	S-pull:0.30, S-LR:0.28, S-left:0.12, S-FB:0.10
Save as	S-right:0.50, S-pull:0.22, S-towards:0.12
Shrink	S-CCW:0.19
Task switch	tab-left/right:0.24, tab-CW/CCW:0.15, W-top:0.13
Undo	U-top:0.34, Z-top:0.24, backspace-left:0.17
Zoom in	Z-CW:0.69
Zoom out	Z-CCW:0.73

The value next to each sign (in a range from 0 to 1) expresses our confidence that the sign is the optimal one for the given referent, and thus that it belongs to \(\pi_{opt}\). Signs with a confidence score lower than 10% are not shown.

Determine an adequate sample size

Bailly et al. (2013) recruited 20 participants, which is the most frequent sample size for gesture elicitation studies (Villarreal-Narvaez et al. 2020). But is this number a good choice? Would the results be the same or similar if a larger sample were used? We investigate this question below.

Let us first evaluate the guessability of the optimal mappings \(\widehat\pi_{opt}\) that we found earlier:

source("gelicopt/guessability.R") # Implementation of guessability functions

guess <- guessability(mappings, keys_gestures)
cat("Guessability =", guess, "\n")

## Guessability = 0.2785714

Unfortunately, this guessability score \(\widehat G(\widehat\pi_{opt}) = 27.9\%\) has been evaluated on the same sample on which the mappings were optimized (trained), so there is a risk of overfitting. This training guessability value generally overestimates the true guessability \(G(\widehat\pi_{opt})\) that we would like to assess and compare to \(G(\pi_{opt})\). To evaluate this overfitting problem, we perform cross-validation using the Leave-One-Out Cross Validation (LOOCV) method:

guess <- guessability.loocv(keys_gestures, hungarian_sign_mapper)
cat("LOOCV Guessability =", guess, "\n")

## LOOCV Guessability = 0.222619
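
The difference between the two scores can be reproduced on a small example. The sketch below is an assumption-laden illustration, not the code in gelicopt/guessability.R: the data are made up, `guessability_sketch` is a hypothetical helper that computes the proportion of proposals matching a mapping, and `modal_sign_mapper` is a placeholder that ignores the uniqueness constraint.

```r
# Toy proposals (hypothetical data): referents in rows, participants in columns
proposals <- data.frame(
  P1 = c("S-top", "P-top"),
  P2 = c("S-top", "spacebar-top"),
  P3 = c("S-top", "P-top"),
  P4 = c("S-top", "enter-top"),
  P5 = c("S-pull", "L-top"),
  row.names = c("Save", "Play"),
  stringsAsFactors = FALSE
)

# Placeholder mapper (an assumption): most frequent sign per referent,
# ties broken by order of first appearance, uniqueness NOT enforced
modal_sign_mapper <- function(tbl) {
  apply(tbl, 1, function(row) {
    counts <- table(factor(row, levels = unique(row)))
    names(counts)[which.max(counts)]
  })
}

# Proportion of proposals in tbl that match the sign mapped to their referent
guessability_sketch <- function(mapping, tbl) {
  mean(as.matrix(tbl) == mapping[rownames(tbl)])
}

# Training score: optimize and evaluate on the same participants
training <- guessability_sketch(modal_sign_mapper(proposals), proposals)

# LOOCV score: optimize on all but one participant, evaluate on the one left out
loocv <- mean(sapply(seq_len(ncol(proposals)), function(p) {
  mapping <- modal_sign_mapper(proposals[, -p, drop = FALSE])
  guessability_sketch(mapping, proposals[, p, drop = FALSE])
}))

cat("Training guessability =", training, "\n")
cat("LOOCV guessability =", loocv, "\n")
```

On this toy table the training score is 0.6 and the LOOCV score is 0.5. The gap comes from the Play referent, where leaving out one of the two P-top proposers changes the sign that the mapper selects.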

To better understand how the training and cross-validation guessability scores converge, we will calculate them for an increasing number of participants (\(n = 3, 4, \ldots, 20\)):

gevolution <- guessability.evolution(keys_gestures, hungarian_sign_mapper)

Based on these scores, we can also try to predict the evolution of the guessability error \(\epsilon_G = G(\pi_{opt}) - G(\widehat\pi_{opt})\). If this error is low enough, a larger sample size is not likely to result in meaningful improvements. We use a simple linear model that we train with data of simulated gesture elicitation studies (optimized with the Hungarian algorithm).

source("gelicopt/prediction.R") # Code for building and using a prediction model of guessability error

# We provide data from simulated studies that can be used to train our model
trainingDatasets <- c("training/synthetic-hungarian-1.csv", "training/synthetic-hungarian-2.csv")

# We will train a simple linear model with two predictors: the size of the sample and the difference between the training and cross-validation score 
lmodel <- buildModel(trainingDatasets) 

predicted <- predictError(lmodel, gevolution) 
gevolution$fit <- predicted[,"fit"] # Best guess
gevolution$lwr <- predicted[,"lwr"] # Lower bound of 95% prediction interval
gevolution$upr <- predicted[,"upr"] # Upper bound of 95% prediction interval
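
The column names fit, lwr, and upr match what base R's `predict()` returns for a linear model when a prediction interval is requested. The sketch below shows that mechanism on synthetic numbers that are made up for illustration. We do not claim that `buildModel` and `predictError` work exactly this way; the model formula and any variable transformations they use are assumptions here.

```r
set.seed(42)  # make this sketch reproducible

# Synthetic training data (made up; NOT the gelicopt training sets)
train <- data.frame(N = rep(seq(4, 40, by = 4), each = 5))
train$gap <- 0.6 / train$N + rnorm(nrow(train), sd = 0.005)   # training minus CV score
train$error <- 0.02 + 0.4 * train$gap - 0.0003 * train$N +
  rnorm(nrow(train), sd = 0.005)                              # guessability error

# Linear model with two predictors: sample size and training/CV gap
model <- lm(error ~ N + gap, data = train)

# Predict the error for two hypothetical studies, with 95% prediction intervals
new_studies <- data.frame(N = c(10, 20), gap = c(0.090, 0.056))
predicted <- predict(model, newdata = new_studies,
                     interval = "prediction", level = 0.95)

print(predicted)  # a matrix with columns fit, lwr, upr
```

A prediction interval, unlike a confidence interval for the mean, also accounts for the residual variability of individual studies, which is why it is the relevant interval when reasoning about a single new study.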

We can now plot the evolving training and cross-validation guessability scores, their difference, and the predicted guessability error:

library(ggplot2) # We will use ggplot2 to plot the training and cross-validation curves
library(gridExtra) # To plot the two graphs side by side

plot <- ggplot(gevolution, aes(x = N)) +
  geom_line(aes(y = validation*100), color = "orange") +
  geom_line(aes(y = training*100), color = "steelblue") +
  ylab("Guessability (%)") + xlab("Participants") +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank(), axis.line = element_line(colour = "black")) +
  annotate("text", x = 9, y = 33, label = "training curve", size = 3, hjust = 0) +
  annotate("text", x = 9, y = 17, label = "cross-validation curve", size = 3, hjust = 0) +
  scale_x_continuous(expand = c(0, 0), breaks = seq(0, 22, 2), limits = c(0, 22)) +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 40))

plot_error <- ggplot(gevolution, aes(x = N)) +
  geom_line(aes(y = (training - validation)*100), color = "chartreuse4") +
  geom_line(aes(y = fit*100), color = "blue") +
  geom_line(linetype = 2, aes(y = lwr*100), color = "blue") +
  geom_line(linetype = 2, aes(y = upr*100), color = "blue") +
  ylab("Guessability Error (%)") + xlab("Participants") +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank(), axis.line = element_line(colour = "black")) +
  annotate("text", x = 9, y = 13, label = "training minus cross-validation curve", size = 3, hjust = 0) +
  annotate("text", x = 4, y = 9.5, label = "predicted error", size = 3, hjust = 0) +
  scale_x_continuous(expand = c(0, 0), breaks = seq(0, 22, 2), limits = c(0, 22)) +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 25))

grid.arrange(plot, plot_error, ncol=2)

The guessability error is predicted to be lower than \(4.5\%\), with a best guess of around \(2.5\%\). This best guess corresponds to a relative error of around \(10\%\) with respect to the expected population guessability. The error is decreasing slowly, but the cross-validation curve still fluctuates. A sensible decision could be to opt for a larger sample size, e.g., \(n = 30\). However, the investigators may decide that the cost of recruiting additional participants outweighs any benefit from a potential increase in guessability. In any case, plotting the above graphs can help other researchers better interpret the results.

In the above analysis, we considered the full set of 42 referents. However, you might decide to concentrate on a smaller set of referents and adapt the optimization accordingly. Finally, if you are interested in learning how to conduct agreement analysis on the above dataset, please refer to this page.

References

Bailly, Gilles, Thomas Pietrzak, Jonathan Deber, and Daniel J. Wigdor. 2013. “Métamorphe: Augmenting Hotkey Usage with Actuated Keys.” In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 563–72. CHI ’13. New York, NY, USA: ACM. https://doi.org/10.1145/2470654.2470734.

Tsandilas, Theophanis, and Pierre Dragicevic. 2022. “Gesture Elicitation as a Computational Optimization Problem.” In CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3491102.3501942.

Villarreal-Narvaez, Santiago, Jean Vanderdonckt, Radu-Daniel Vatavu, and Jacob O. Wobbrock. 2020. “A Systematic Review of Gesture Elicitation Studies: What Can We Learn from 216 Studies?” In Proceedings of the 2020 ACM Designing Interactive Systems Conference, 855–72. DIS ’20. New York, NY, USA: ACM. https://doi.org/10.1145/3357236.3395511.

Wobbrock, Jacob O., Htet Htet Aung, Brandon Rothrock, and Brad A. Myers. 2005. “Maximizing the Guessability of Symbolic Input.” In CHI ’05 Extended Abstracts on Human Factors in Computing Systems, 1869–72. CHI EA ’05. New York, NY, USA: ACM. https://doi.org/10.1145/1056808.1057043.