Experimental Design

EXPERIMENTAL DESIGN
What is it?
When to use it?
Types of Variables
Designing an Experiment
Case Study
Analyzing the data
Types of evaluation
Users not involved
Supported by practice/theory
Occurs in realistic setting

External validity: degree to which research
results apply to real situations
Large Sampling
Subjective/qualitative
1/22/2013 Comp 4020 - HCI 2 (PPI) 2

We have all done this in some way
In one form or another we have resorted to
experimenting
Also an important tool for survival!

experimented with various types of ear plugs
experimented with different types of pacifiers
experimented with various types of snow tires
etc
But somewhat different, i.e. less formal

Approaches: Naturalistic
Naturalistic:
describes an ongoing process evolving over time
observation occurs in realistic setting
ecologically valid
real life
External validity
degree to which research results apply to real situations

Approaches: Naturalistic
Advantage
Can state something about the users' behavior in an
actual environment
Disadvantage
Cannot know all the contributing factors to users'
performance
e.g. do they use menus more frequently than toolbar buttons
because the icons are not comprehensible OR because the
buttons are too small OR simply because they do not know
that they exist OR ... (the list can go on)

Approaches: Experimental
In certain cases you want to make a statement about a
particular UI design choice
e.g. I really want to know whether the size of buttons contributes
to how quickly users click on them
or
e.g. I want to know whether a menu designed in a circular shape
(pie menu) is more effective than a regular menu
or
Want to know the effect of certain variables on outcomes
You want to make some general statements that are
widely applicable (not restricted to your app)

Approaches: Experimental
Experimental
study relations by manipulating one or more independent
variables
experimenter controls all environmental factors
observe effect on one or more dependent variables
Internal validity
confidence that we have in our explanation of experimental
results
Trade-off: Natural vs. Experimental

What are some trade-offs?

Quantitative Evaluation
What task to evaluate?
Depends on application
Attempt to find canonical task(s)
e.g. what would be a set of tasks that can be used to test
whether larger icons contribute to faster selection?
Common measures
Task completion time
Error rate
Learning rate (novice -> expert transition)
Fatigue, comfort?
etc.

What task to evaluate?
Example: Pointing Device Evaluation
Real task: interacting with GUIs

pointing is fundamental
Experimental task: target acquisition

abstract, elementary, essential
[Figure: a target of width W at distance D]

Example
Is it easier to read with CAPS or without Caps?
Want to make a conclusive and general statement

whether CAPS are more efficient than non-Caps
Conclusion would look like:

for text, CAPS are 20% less efficient than non-Caps or
for text, CAPS are 25% more efficient than non-Caps

Example
How do we test this question?
Need to come up with

a hypothesis
a set of variables we are going to manipulate
a set of variables we are going to measure
reduce the number of confounding variables
a task
a set of randomized trials

Example
THE BROWN FOX JUMPED OVER THE MOON.
OR, SHOULD IT SAY THE BROWN FOX
JUMPED OVER THE CAT.

Example
The brown fox jumped over the

moon. Or, should it say the brown
fox jumped over the cat.

Example
Would it be sufficient to simply show those two slides
and do some measurements?
What are some problems with this kind of setup?
What would we measure?
Let's first look at some definitions

Hypothesis
Definition: Statement or claim that the
experimenter wants to test
Defines the nature of the relationship

between two types of variables

Hypothesis
H0: there is no difference in the number of cavities between
children and teenagers using Crest and those using No-Teeth
toothpaste
H1: children and teenagers using Crest toothpaste
have fewer cavities than those who use No-Teeth
toothpaste

Hypothesis
H0: there is no difference in user performance (time
and error rate) when selecting a single item from a
pop-up or a pull-down menu, regardless of the
subjects' previous expertise in using a mouse or
using the different menu types
[Figure: a pull-down menu bar (File, Edit, View, Insert) with the File menu open (New, Open, Close, Save), next to a pop-up menu offering the same items]

Hypothesis
Hypotheses can be softer and less certain:
Will color affect recognition speed?
Will proximity affect perceptual organization?
Etc

Independent Variables
At least one circumstance is of major interest in an experiment
e.g. menu type in selection time experiment OR text type
Referred to as an independent variable
Independent of the subject's behavior or performance
Want to choose two or more levels of this circumstance to
present (manipulate)
Nothing the subject does can change the levels of the
independent variable
CAPS vs. non-caps
What are the independent variables in the toothpaste
experiment? What are the different levels?

Dependent Variables
Want to measure a subject's behavior in response to
manipulations of the independent variable
Dependent variable: depends on what the subject does
Statement about the expected nature of the relationship
between the independent and dependent variables is
referred to as the hypothesis (as seen previously)

Control Variables
Only want to manipulate one circumstance
the independent variable
All other circumstances need to be controlled
These become control variables

control font of two different types of menus
control color coding on two different types of visualizations
Have to be controlled across all levels of the IV
confirm that change in dependent variable is due to change in
the independent variable
However, it is impossible to control everything
More control leads to less generalization

Confounding Variables
A confounding variable is any factor that varies with the
independent variable
Suppose we want to use 5 different levels for text type

subjects respond more quickly to the last 2
subjects respond more quickly after practice
Practice confounded with speed
Coke vs. Pepsi

Random Variables
Want to avoid confounded effects; allow variables to
randomly vary: random variables
Selecting subjects is usually done randomly

For testing effect of color on visibility of an object
choose subjects randomly from a large population
choose colors to be tested on randomly as well
Age factors, eye deficiencies, and other elements would
randomly enter into the equation (can eliminate some of these)
Can flip a coin, throw dice, allow a random number generator

to select for us
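Letting a random number generator do the selecting can be sketched as follows. This is only an illustration and not part of the original slides: the function name `assign_randomly`, the subject labels S1 to S20, and the fixed seed are assumptions made for the example.

```python
import random

def assign_randomly(subjects, conditions, seed=None):
    """Shuffle the subjects, then deal them round-robin into the conditions."""
    rng = random.Random(seed)  # a seed only makes the example repeatable
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for i, subject in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(subject)
    return groups

# Hypothetical pool of 20 subjects split over the two text types.
subjects = [f"S{i}" for i in range(1, 21)]
groups = assign_randomly(subjects, ["CAPS", "non-caps"], seed=1)
for condition, members in groups.items():
    print(condition, members)
```

Dealing round-robin after the shuffle keeps the group sizes equal while leaving membership to chance.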

Example
In the previous example what may be a hypothesis
H1: Users are slower reading CAPS
H2: There is no difference in reading rates
H3: CAPS are less memorable
What variables do we manipulate, i.e. what are the

independent variables?
Text type, i.e. CAPS or no Caps (Two levels)
What variables do we measure, i.e. what are the dependent

variables?
Let's look first at the hypothesis
H1 or H2: reading speed
H3: recall after 2 hours

Example
What variables do we control?
What may be some confounding variables and

how do we counter these?
More on this next

Experimental Design
Manipulating and Measuring Variables
Within vs. Between Subjects Design
Single vs. Multiple Variable Experiment

Choosing an Independent Variable
Should be what the experimenter wants to manipulate:
Font 10 vs. 12 vs. 14 (IV=font size)
Bar graph vs. line graph (IV=type of graph)
Are children more violent after being exposed to games with
violence? What is the IV?
In the last question we need to define violence, i.e. what is the
operational definition of violence in games?
Is there shooting/hurting/physical contact?
Are the actions moral/immoral (stealing, deceiving, etc.)?
Language abuse?
Would it be considered violent if outside the game?

Single Variable Experiment
Only one independent variable
Two-level experiment: the IV has two levels (simplest case,

where one is the experimental group and the other control
group), i.e. existence vs. non-existence
Advantages:
Way of finding out if IV is worth studying
Results easy to interpret and analyze
Some cases do not need more than two levels
investigating two interaction techniques
two educational methods
etc.

Disadvantages:
Sometimes does not say much about the relationship
between the IV and the DV
[Figure: plots of reading time (y-axis) against print size (x-axis, 12 vs. 10)]
Multilevel Experiments: single variable experiments where IV has > 2
levels
[Figure: average test score against anxiety level, once with two levels (Low, High) and once with three levels (Low, Neutral, High)]
Advantages:
Have better handle over IV-DV relationship
The more levels added the less critical is the range of IV (balance
between realistic and large enough)
Disadvantages:
Requires more time and effort than 2-level (within-subjects increases time
for each subject, between-subjects requires additional subjects)
Statistical tests more complex
Need to know when to limit the number of levels
Multiple Variable Experiment
Most frequent design combines several variables in a factorial
combination that pairs each level of IV with the others
referred to as a factorial design
2 levels for Caps/no-caps and 3 levels for font size

(small/medium/large)
Gives 2 x 3 design
              Font Size
              Small   Medium   Large
Caps   Yes
       No
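The cells of a factorial design are the Cartesian product of the levels of each IV. A minimal sketch for the 2 x 3 design above (the variable names are chosen for illustration):

```python
from itertools import product

caps_levels = ["caps", "no-caps"]            # IV 1: two levels
font_sizes = ["small", "medium", "large"]    # IV 2: three levels

# Pair each level of one IV with every level of the other.
cells = list(product(caps_levels, font_sizes))
for caps, size in cells:
    print(caps, size)
print(len(cells), "conditions in total")
```

Adding a third IV is one more argument to `product`, which also shows how quickly the number of conditions grows.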

Multiple Variable Experiment
Advantages
Interactions between IVs can be studied (interaction occurs
when the relationship between one IV and the subject's behavior
depends on the level of a second IV)
Can add additional circumstances by making them IVs
When a circumstance that could add variability to the data is
made into a factor, the amount of variability decreases
Disadvantages
Time-consuming and costly
Analysis more complicated, typically need to do an ANOVA
Assumption that variability in data approximates a normal
distribution (don't know until the experiment is completed)
Interpretation of results is more complex

Range of the Independent Variable
Range is the difference between the highest and lowest level
of a variable; no specific guidelines, need to fit it in the
experiment
Realistic range: do not choose levels that are so wide that

effects will definitely be found without carrying out the
experiment
Range that shows effect: should be large enough to have an

effect
If interested in effect of font size on reading speed, choosing
between font 14 vs. font 15 could lead to false conclusions
Pilot experiment: similar to real experiment but data thrown

out; can test design before proceeding

Choosing a Dependent Variable
Measure of the subject's behavior
Need operational definition; e.g. do violent games result in
children's aggression?
How do we measure aggressiveness?

Panel of judges observing playing behavior + rating
Give a selection of toys and observe how they play
Narrate frustrating stories and count number of direct-attacks
In HCI it can be a bit more straightforward fortunately
But need to also define validity and reliability of the

measurements

Reliability/Repeatability
Would the same results be achieved if the test were
repeated?
Experiment is perfectly reliable if you get same results each time
experiment is repeated
Problems
Individual differences:
best user 10x faster than slowest
best 25% of users ~2x faster than slowest 25%
Unreliable instruments
e.g., built in clock vs. stop watch
Partial Solution
Reasonable number and range of users tested
Correlate data from repeated measurements

Validity
Are you measuring what you think you're measuring?
Errors in equipment
Errors in procedure
Incorrect pool of subjects
Errors in questions asked, variables measured

Observable Dependent Variables
Directly observable DVs can be measured directly; indirect
DVs use secondary measures
i.e. physiological measures with a lie detector
response time to measure how much info. is processed
Single dependent variable: measuring only accuracy or

speed; usually not sufficiently indicative of performance
i.e. could be very fast but also very inaccurate
Multiple dependent variable: speed-accuracy tradeoffs for

example gives an overall better indication of performance
i.e. more valid
Composite dependent variable: multiple dependent variables

combined to form one variable

Questions?

Experimental Design
Individual differences
Need more than one subject
Usually multiple subjects (n=at least 10, ideally much more)
how to distribute tasks amongst subjects?

Within vs. Between Subjects Design
Within-subjects design:
Pros:
All subjects do all conditions
Fewer subjects, less individual differences
Easier stats analysis
Cons:
Transfer effects
Doing 1 condition affects following condition
Often you want subjects to learn extensively
[Diagram: Subjects 1 to 10 each appear under both Condition 1 and Condition 2]
Between-subjects design:
Pros:
Subjects only do one condition
No transfer effects
Train to high skill
Cons:
More subjects, individual differences
Harder stats analysis
[Diagram: each subject appears under only one condition, e.g. Subject 2 under Condition 1 and Subject 12 under Condition 2]

Experimental Design
Order of presentation in within-subjects designs
ABBA counterbalancing:
Every subject does trials in the order: A, B, B, A
Any confounding effect (e.g., learning curve) is counterbalanced
Trial #                         1    2    3    4
Condition                       A    B    B    A
Linear confounding effect       10   20   30   40
  Resulting confound: A: 10+40 = 50, B: 20+30 = 50
Nonlinear confounding effect    5    30   50   60
  Resulting confound: A: 5+60 = 65, B: 30+50 = 80
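The arithmetic in the ABBA table can be reproduced with a short sketch. The helper name `confound_totals` is an assumption made for illustration; the numbers are the ones from the slide.

```python
def confound_totals(order, effect_per_trial):
    """Sum the confounding effect (e.g., learning) that lands on each condition."""
    totals = {}
    for condition, effect in zip(order, effect_per_trial):
        totals[condition] = totals.get(condition, 0) + effect
    return totals

linear = confound_totals("ABBA", [10, 20, 30, 40])
nonlinear = confound_totals("ABBA", [5, 30, 50, 60])
print("linear:   ", linear)     # A and B receive the same total
print("nonlinear:", nonlinear)  # the totals differ, so the confound remains
```

ABBA ordering cancels the confound only when the effect grows linearly across trials; with a nonlinear effect one condition still ends up favored.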

Experimental Design
Make order a between-subjects variable
Fully counterbalanced:
Two conditions: AB, BA
Three conditions: ABC, ACB, BAC, BCA, CAB, CBA
Combinatorial explosion when n>4
Needs lots of subjects
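A quick way to see the combinatorial explosion is to enumerate every order. This sketch uses the standard library only; the helper name `all_orders` is chosen for illustration.

```python
from itertools import permutations
from math import factorial

def all_orders(conditions):
    """Every possible presentation order of the given conditions."""
    return ["".join(order) for order in permutations(conditions)]

orders3 = all_orders("ABC")
print(orders3)

# Number of orders (and so the minimum number of subjects) for n conditions.
counts = {n: factorial(n) for n in range(2, 7)}
for n, count in counts.items():
    print(n, "conditions:", count, "orders")
```

Full counterbalancing needs at least one subject per order, so the n! growth is what makes it impractical beyond a handful of conditions.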

Experimental Design
Partial counterbalancing. e.g., Latin square:
Ensures each level appears in every position in order equally
often
n rows x n columns and each treatment occurs once in each
row and in each column
ABC
BCA
CAB
Balanced Latin Square:

Each condition precedes and follows each of the other
conditions equally often:
ABCD
BDAC
DCBA
CADB
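Both kinds of square can be generated programmatically. The sketch below is an illustration, not the lecture's method: `latin_square` uses simple cyclic shifts, and `balanced_latin_square` uses a common textbook construction that works for an even number of conditions. It produces a different square from the one on the slide, but one with the same balancing property.

```python
def latin_square(conditions):
    """Cyclic Latin square: row r is the condition list shifted left by r."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]

def balanced_latin_square(conditions):
    """Balanced Latin square for an even number of conditions.

    The first row follows the index pattern 0, 1, n-1, 2, n-2, ...;
    each later row adds 1 (mod n) to every index of the row before it.
    """
    n = len(conditions)
    if n % 2 != 0:
        raise ValueError("this construction needs an even number of conditions")
    first = [0]
    low, high = 1, n - 1
    while len(first) < n:
        first.append(low)
        low += 1
        if len(first) < n:
            first.append(high)
            high -= 1
    return [[conditions[(idx + row) % n] for idx in first] for row in range(n)]

for row in latin_square("ABC"):
    print("".join(row))
print()
for row in balanced_latin_square("ABCD"):
    print("".join(row))
```

In the balanced square every condition is immediately followed by every other condition exactly once across the rows, which is what "precedes and follows equally often" requires.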
Experimental Design
Why counterbalance?
Reduce transfer effects
Assumes symmetric transfer

A-B transfer == B-A transfer
If asymmetric transfer
i.e., A-B transfer > or < B-A transfer then use a between-
subjects design
Range effects
People tend to perform best in middle of range of trials
does between-subjects design solve this?
Context effects: when one level of the IV is used, subjects establish a
context

Activity
How would you carry out the experiment for
comparing CAPS to non-caps, i.e. what would be
your design?

Activity
Design an experiment to compare a pop-up linear
menu vs. a pie menu
Subjects?
Hypothesis?
IV?
DV?
Design?
Task (s)?
[Figure: a linear pop-up menu (Day Shift, Evening Shift, Night Shift, Split Shift) and a pie menu (Day, Evening, Night, Split)]


Activity
Design an experiment to test whether adding color
coding to a menu interface improves accuracy?
Subjects?
Hypothesis?
IV?
DV?
Design?
Task (s)?

Activity
This is only one possible solution; many others exist
Subjects: Taken from user population
Hypothesis: Color coding will make selection more accurate
IV: Color coding
DV: Accuracy measured as number of errors
Design: between groups to ensure no transfer of learning (or
within groups with appropriate safeguards if subjects are scarce)
Task: the interfaces are identical in each of the conditions,
except that, in the second, color is added to indicate related
menu items. Subjects are presented with a screen of menu
choices (ordered randomly) and verbally told what they have to
select. Selection must be done within a strict time limit when the
screen clears. Failure to select the correct item is deemed an
error. Each presentation places items in new positions. Subjects
perform in one of two conditions.

Example
The Effect of Shading in Extracting Structure from Space-Filling Visualizations
July 14-16, 2004

Motivation
Hierarchies are abundant and interacted with on a
regular basis
For adequate navigation, the structure has to be

explicit
Hierarchies are generally represented as trees
Structure is explicit, but space-inefficient &

navigation complexity increases with size

Space-Filling Visualization
Developed to make more efficient use of display
space
e.g. Treemap [Shneiderman, 1990]
Characterized by compactness and effectiveness

of showing node size
However, the structure is no longer explicit
Can shading facilitate the extraction of structure

information?

CushionMap: Shaded Treemap
CushionMap (SequoiaView) uses shading to give
a 3-D impression, to make structure more explicit
[van Wijk, 1999]

Structure-from-Shading (1)
Evidence that our visual system extracts shading

information early on
Simple shading information processed preattentively

[Enns & Rensink, 1990]

Shading and contour combine to strongly influence

the shape of an object [Sun and Perona, 1996]
We innately make assumptions about shading

information [Ramachandran, 1988]

Shading useful in extracting structure information in
node-link diagrams [Irani and Ware, 2001]

Some evidence that shading impairs size judgments
2D bar/pie charts better than 3D counterpart [Carswell

et al, 1991]
Similarly 2D line graphs lower accuracy than 3D

counterpart [Zacks et al, 1998]

Study Methodology
Hypotheses
Participants
Apparatus and task
Experimental factors
Study Design

Experiment - Hypotheses
Hypothesis 1: shading (CM) will result in higher performance on
structure related tasks than the no-shading condition (TM)
Hypothesis 2: shading (CM) will result in lower performance on

tasks related to file and directory size comparisons than the
no-shading condition (TM)

Participants
20 undergraduate students (paid) participated
Random assignment to one of two conditions CM or

TM first
All familiar with concept of file and directory

management tasks/routines
None had experience with SequoiaView

Experiment Method
Half started on TreeMap (TM) the other half on
CushionMap (CM)
Used 2 different hierarchies H1 and H2
{CM-H1, TM-H2}, {CM-H2, TM-H1}, {TM-H1, CM-H2},

and {TM-H2, CM-H1}.

Experiment Tasks
Tasks divided into two major categories:
Structure-based
Count the number of directories in the hierarchy
Find the directory with the largest number of files
Count the number of subdirectories in a given directory
Count the number of files in a given subdirectory
Find the directory with the largest number of bitmap files (.bmp)
Count the number of sub-directories that contain bitmap
(.bmp) files
Size-based
Find the smallest directory in the hierarchy
Find the largest file in the hierarchy
Find the largest file in a given directory
Find the largest mp3 file in the hierarchy

Experiment Measurements
Measure: subjects' performance on each task with
respect to two variables:
time until completion (0 to 45 seconds)
successful/unsuccessful completion (0/1)
Timeouts classified as failures
Unsuccessful and timeouts not included in average

completion time calculations
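The scoring rule above can be sketched as follows. The trial data and the name `summarize` are hypothetical, and treating a time of exactly 45 seconds as a timeout is an assumption; only the 45-second limit and the exclusion rule come from the slide.

```python
TIME_LIMIT = 45.0  # seconds, the cut-off stated on the slide

def summarize(trials):
    """Return (mean time over successful trials, number of successes).

    trials: list of (seconds, correct) pairs for one subject.
    A trial counts as a success only if it was answered correctly
    before the time limit; timeouts are treated as failures.
    """
    times = [t for t, correct in trials if correct and t < TIME_LIMIT]
    mean_time = sum(times) / len(times) if times else None
    return mean_time, len(times)

# Hypothetical trials: one timeout and one wrong answer among five tasks.
trials = [(12.0, True), (30.5, True), (45.0, False), (20.0, False), (18.5, True)]
mean_time, successes = summarize(trials)
print(mean_time, successes)
```

Excluding failed trials keeps the time measure from being inflated by the cut-off, at the cost of basing each average on a different number of trials.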

Experiment Results (2)
                                              Structure     Size
Average completion time (seconds)       TM    21.5 (6.1)    17.9 (5.4)
                                        CM    16.2 (3.7)    20.2 (5.4)
Average # of tasks successfully         TM    2.7 (1.5)     3.4 (0.7)
completed                               CM    4.9 (0.8)     3.1 (0.9)
[Figure: bar charts of completion time and # of tasks successfully completed for TM and CM, grouped by task type (Structure, Size)]

Experiment Results (3)
Completion time
Structure: CM significantly faster than TM (p=0.0021)
Size: no significant difference between CM and TM
Completion success
Structure: subjects significantly more accurate on CM over TM (p<0.001)
Size: no significant difference between CM and TM

Experiment Subjective Evaluation
Statement TM CM
1. I was able to count the number of directories using toolname. 3.65 4.40
2. I was able to find the bitmap (.bmp) files using toolname. 3.70 4.60
3. I was able to detect the type of files using toolname. 3.95 4.55
4. I was able to find subdirectories using toolname. 3.60 4.35
5. I was able to find the files inside a sub-directory using toolname. 3.05 3.95
6. I was able to find the largest file using toolname. 3.50 3.95
7. I was able to compare the sizes of files using toolname. 3.30 3.90
8. I was able to find the largest directory using toolname. 3.70 4.40
9. After the training session I knew how to use toolname. 4.00 4.35
10. I found toolname confusing to use. 3.05 2.05
5 =strongly agree , 1 = strongly disagree

Discussion
[Figure: level of support for tasks based on size (y-axis) against level of support for tasks based on structure (x-axis), each on a scale of Low, Medium, High, Very High; Sunburst is marked with a question mark]

Discussion
Tested the effect of shading on non-explicit structures (CM vs.
TM)
Confirmed the first hypothesis

Users were faster and more accurate in completing directory
management tasks with the shaded hierarchies
Did not obtain any conclusive results on the unfavorable

effect of shading for size-based tasks
Need to investigate the ability of users to extract structure from

space-filling techniques

Recap
Choosing IVs and DVs
Range of IVs
Determining reliability and validity
Within-subjects & between-subjects design
Single variable vs. multi-variable designs

Questions?

Interpreting Experimental Results
Plotting Frequency Distributions
Statistics for Describing Distributions
Plotting Relationships Between Variables
Describing the Strength of a Relationship
Interpreting Results from Factorial Experiments
Inferential Statistics

Statistical analysis
Calculations that tell us
mathematical attributes about our data sets
mean, amount of variance, ...
how data sets relate to each other

whether we are sampling from the same or different distributions
the probability that our claims are correct

statistical significance

Questions one might ask
Is there a difference?
Is one system better than another?
Techniques addressing this are called hypothesis testing
The answers are not simply yes/no, but of the form: we are 99% certain
that selection on 5 item menus is faster than 7 item menus
How big is the difference?

e.g. selection from 5 items is 270 ms faster than from 7 items
Called point estimation, often obtained by averages
How accurate is the estimate?
e.g. selection is faster by 270 +/- 30 ms
Answers to this are in the form of standard deviations or
confidence intervals
e.g. we are 95% certain that the difference in response time is
between 240 and 300 ms

Interpreting Results
First two rules:
Look at the data
a graph, histogram or table of results could be more instructive
Exposes outliers, which need to be removed to avoid biases
Save the data
May want to try different analyses on the data
Trace back the analysis to the raw data collected
Choice of statistical analysis depends on type of data and

questions to be answered

Plot a frequency distribution telling us how frequently each
score appears in the data
Frequency is the number of raw data points that fall into each
score category
Useful first step in finding out whether there is a difference

between conditions
Example: two groups

Want to determine whether video game players who play racing
games are more comfortable (less anxious) with fast drivers

Game Player          Non-Player
Subject   Score      Subject   Score
1         62         11        55
2         56         12        42
3         67         13        61
4         91         14        58
5         53         15        70
6         87         16        47
7         51         17        62
8         63         18        36
9         46         19        74
10        71         20        51
[Figure: frequency histograms for game players and non-players, with score categories 0-9, 10-19, ..., 90-99 on the x-axis and frequency on the y-axis]
By looking at distributions we can
notice that there are no differences
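The frequency distributions for the two groups can be tallied directly from the scores in the table. This is a minimal sketch; the function name `frequency_distribution` and the text-based histogram are choices made for illustration.

```python
from collections import Counter

BIN_WIDTH = 10  # score categories 0-9, 10-19, ..., as on the slide

players = [62, 56, 67, 91, 53, 87, 51, 63, 46, 71]
non_players = [55, 42, 61, 58, 70, 47, 62, 36, 74, 51]

def frequency_distribution(scores):
    """Count how many raw scores fall into each score category."""
    counts = Counter((s // BIN_WIDTH) * BIN_WIDTH for s in scores)
    return dict(sorted(counts.items()))

for name, scores in [("Game Player", players), ("Non-Player", non_players)]:
    print(name)
    for start, n in frequency_distribution(scores).items():
        print(f"  {start}-{start + BIN_WIDTH - 1}: {'#' * n}")
```

Printing one `#` per score gives a rough sideways histogram, which is enough to compare the two shapes by eye.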

Normal distribution: fits a complex mathematical formula. For
our purposes, a distribution is normal if it fits a bell-shaped curve
Important to know whether distribution is normal so that you

can apply appropriate statistical tests
Could also have bimodal, truncated or skewed distributions
Although nice to see frequency distribution, nice to have a

single number representing how subjects performed

Use typically two types of statistics: descriptive and inferential
Descriptive statistic is simply a number that allows the

experimenter to describe some characteristics
Inferential will be discussed later

One important descriptor is the location of the middle of a
distribution (central tendency)
Mode: the most frequently occurring score
Median: the middle score, with an equal number of scores above it
and below it
Mean: arithmetic average of the scores
Which to use depends on the distribution, what purpose the

average plays, and your judgment
outliers vs. no outliers
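The three measures can be computed with Python's standard `statistics` module. The sketch below uses the game player scores from the earlier example; the extra score of 300 is a made-up outlier, added only to show why the choice of average matters.

```python
import statistics

players = [62, 56, 67, 91, 53, 87, 51, 63, 46, 71]

mean_score = statistics.mean(players)
median_score = statistics.median(players)
# Every raw score is unique here, so take the mode over 10-point categories.
modal_bins = statistics.multimode([s // 10 * 10 for s in players])
print(mean_score, median_score, modal_bins)

# Hypothetical outlier: the mean shifts a lot, the median barely moves.
with_outlier = players + [300]
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```

When outliers are present the median is usually the safer summary, since a single extreme score can pull the mean well away from the bulk of the data.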

Another important statistic is the measure of dispersion, or how
spread out the scores are
Range, difference between largest and smallest value
Variance, calculated by computing deviation of each score

from the mean, squaring these, adding them up, and dividing
by number of scores
Std deviation, simply the square root of the variance
The smaller the std deviation, the more closely the scores
cluster around the mean
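The dispersion measures can be computed the same way, again on the game player scores. Dividing by the number of scores, as described above, gives the population variance (`statistics.pvariance`); many statistics packages instead divide by one less than the number of scores to get the sample variance (`statistics.variance`).

```python
import statistics

players = [62, 56, 67, 91, 53, 87, 51, 63, 46, 71]

score_range = max(players) - min(players)
variance = statistics.pvariance(players)       # divides by the number of scores
std_dev = statistics.pstdev(players)           # square root of that variance
sample_variance = statistics.variance(players) # divides by (number of scores - 1)

print(score_range, round(variance, 2), round(std_dev, 2), round(sample_variance, 2))
```

Which divisor to use depends on whether the scores are treated as the whole population or as a sample from a larger one; for small groups the difference is noticeable.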

Reason for experiment is to determine if there is a relationship
between IV and DV
Find it useful to draw a graph to represent the experimental

relationship
Plot DV on y-axis and IV on x-axis
What types of graphs to use:

If IV levels cannot be represented by numbers use bar graphs
If IV is continuous use histogram or line graph

[Figure: bar graph showing mean comfort scores for players (P) and non-players (NP); line graph showing mean comfort scores for players after several months of gaming]

Strength of a Relationship
The previous graphs were functions of a descriptive statistic
rather than that of individual points
Rarely will every data point fall on a smooth function
If you use raw data you will very likely find some variability or
spread: a scatter plot

Scatterplots
[Figure: two scatterplots, with correlation coefficients of +.87 and -1.0]

Correlation:
Measures the extent to which two concepts are related
e.g. years of university training vs. computer ownership per
capita
How?
obtain the two sets of measurements
calculate correlation coefficient
+1: positively correlated
0: no correlation (no relation)
-1: negatively correlated
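Calculating the correlation coefficient can be sketched in plain Python. The data are the pickles-and-salary measurements used on this lecture's correlation slides; the function name `pearson_r` is chosen for illustration, and Pearson's r is assumed to be the coefficient meant here.

```python
from math import sqrt

pickles = [5, 4, 6, 4, 5, 3, 5, 4, 5, 6, 6, 7, 6, 7]   # eaten per month
salary = [6, 5, 7, 4, 6, 5, 7, 4, 7, 7, 6, 7, 8, 9]    # per year (*10,000)

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r = pearson_r(pickles, salary)
print(round(r, 3), round(r * r, 3))
```

Squaring r gives the r2 value reported on the slides for this data set.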

condition 1   condition 2
5             6
4             5
6             7
4             4
5             6
3             5
5             7
4             4
5             7
6             7
6             6
7             7
6             8
7             9
[Figure: scatterplot of salary per year (*10,000) against pickles eaten per month, r2 = .668]

Correlation
Pickles eaten per month   Salary per year (*10,000)
5                         6
4                         5
6                         7
4                         4
5                         6
3                         5
5                         7
4                         4
5                         7
6                         7
6                         6
7                         7
6                         8
7                         9
[Figure: scatterplot of salary per year (*10,000) against pickles eaten per month, r2 = .668]
Which conclusion could be correct?
- Eating pickles causes your salary to increase
- Making more money causes you to eat more pickles
- Pickle consumption predicts higher salaries because
older people tend to like pickles better than younger
people, and older people tend to make more money than
younger people
Correlation
Dangers
attributing causality
a correlation does not imply cause and effect
cause may be due to a third hidden variable related to
both other variables
drawing strong conclusion from small numbers

unreliable with small groups
be wary of accepting anything more than the direction of
correlation unless you have at least 40 subjects

Correlation
Cigarette Consumption
Crude male death rate for
lung cancer in 1950 vs. per capita
consumption of cigarettes in
1930 in various countries.
While there is a strong correlation (.73),
can you prove that cigarette
smoking causes death from this
data?
Possible hidden variables:

age
poverty
Regression
Calculates a line of best fit
Use the value of one variable to predict the value of the other
e.g., 60% of people with 3 years of university own a computer
condition 1   condition 2
5             6
4             5
6             7
4             4
5             6
3             5
5             7
4             4
5             7
6             7
6             6
7             7
6             8
7             9
[Figure: scatterplot of condition 2 against condition 1 with the line of best fit, y = .988x + 1.132, r2 = .668]
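The line of best fit can be computed with ordinary least squares. The sketch below uses the condition 1 and condition 2 values from the regression slide; the function name `least_squares` is chosen for illustration.

```python
condition_1 = [5, 4, 6, 4, 5, 3, 5, 4, 5, 6, 6, 7, 6, 7]
condition_2 = [6, 5, 7, 4, 6, 5, 7, 4, 7, 7, 6, 7, 8, 9]

def least_squares(xs, ys):
    """Slope and intercept of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = least_squares(condition_1, condition_2)
print(f"y = {slope:.3f}x + {intercept:.3f}")

# Use the value of one variable to predict the value of the other.
predicted = slope * 5 + intercept
print(round(predicted, 2))
```

The fitted coefficients agree with the equation shown on the slide to three decimal places.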
Example:
time it takes subjects to read paragraphs typed in 12-point or 10-
point print
8-year olds in one group, 12-year olds in another group
Cannot simply ask whether the independent variable has had

an effect on the dependent variable
Must ask more specifically:

Is there an effect of print size? (main effect)
Is there an effect of age? (main effect)
Does the effect of one variable depend on the level of the other?
(interaction)

Main Effects
To evaluate main effects of an IV must average across levels of
the other variable
To determine effect of print size we need to find a point halfway

between the two levels of age at each level of print size
We observe a change in print size (10-point to 12-point) causes a
change in DV (time): yes, there is a main effect of print size
To determine effect of age we need to find a point halfway
between the two levels of print size at each level of age
We observe that a change in age (increase) causes a change in DV
(time decreases): yes, there is a main effect of age

[Figure: reading time against print size (10 vs. 12 point) for ages 8 years and 12 years. Main effect of print size? Yes. Main effect of age? Yes.]
Interactions
To determine whether the IVs interact we must ask:
is the effect of print size different for each age? (or)
is the effect of age different for each print size?
1st question:
we see that going from 10-point to 12-point causes a decrease in reading time for 8-year-olds but no difference for 12-year-olds
2nd question:
we see that the difference between reading times for the two ages is larger for 10-point than for 12-point
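The two questions can be sketched as a difference of differences. The cell means are hypothetical, chosen only to match the pattern described here (a drop for 8-year-olds, no change for 12-year-olds); the variable names are made up for illustration.

```python
# Hypothetical mean reading times, keyed by (age group, print size).
cell_means = {
    ("8 years", "10-point"): 40,
    ("8 years", "12-point"): 30,
    ("12 years", "10-point"): 20,
    ("12 years", "12-point"): 20,
}

# 1st question: is the effect of print size different for each age?
print_effect_8yr = cell_means[("8 years", "12-point")] - cell_means[("8 years", "10-point")]
print_effect_12yr = cell_means[("12 years", "12-point")] - cell_means[("12 years", "10-point")]

# 2nd question: is the effect of age different for each print size?
age_effect_10pt = cell_means[("12 years", "10-point")] - cell_means[("8 years", "10-point")]
age_effect_12pt = cell_means[("12 years", "12-point")] - cell_means[("8 years", "12-point")]

# Both questions reduce to the same difference of differences.
# A nonzero value means the lines in the plot are not parallel.
interaction_contrast = print_effect_8yr - print_effect_12yr

print("print size effect:", print_effect_8yr, "(8 years) vs", print_effect_12yr, "(12 years)")
print("age effect:", age_effect_10pt, "(10-point) vs", age_effect_12pt, "(12-point)")
print("interaction contrast:", interaction_contrast)
```

Either way of asking the question gives the same contrast, so it is enough to answer one of them.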

Interaction? yes
Figure: reading time against print size (10-point, 12-point), one line per age group (8 years, 12 years)

Activity
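Figure (two panels): reading time against print size (10-point, 12-point), one line per age group (8 years, 12 years)
First panel: Print size? No. Age? Yes. Interaction? No.
Second panel: Print size? Yes. Age? Yes. Interaction? No.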

Inferential Statistics
Many experiments test one design against another
i.e. the independent variable is usually discrete
Variables can be discrete or continuous
Discrete: take on a finite number of values (e.g. screen color)
Continuous: take on any value (e.g. a person's height, time to complete a task)
Special case: the continuous variable is positive (e.g. response time cannot be < 0)

Choosing a Statistical Technique
Independent Variable   Dependent Variable   Technique
Parametric
Two-valued             Normal               Student's t-test on difference of means
Discrete               Normal               ANOVA (ANalysis Of VAriance)
Continuous             Normal               Linear (non-linear) regression, factor analysis
Non-parametric
Two-valued             Continuous           Wilcoxon (Mann-Whitney) rank-sum test
Discrete               Continuous           Rank-sum versions of ANOVA
Continuous             Continuous           Spearman's rank correlation
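A minimal sketch of the first row of the table: a Student's t-test on the difference of means for a two-valued independent variable. The task times are hypothetical, equal variances are assumed, and the names student_t, design_a and design_b are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance

def student_t(group_a, group_b):
    """Student's t statistic (equal variances assumed) and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pool the two sample variances, weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err, df

# Hypothetical task completion times (seconds) for two designs.
design_a = [12, 14, 11, 13, 15]
design_b = [16, 18, 17, 15, 19]

t, df = student_t(design_a, design_b)
print(f"t = {t:.2f}, df = {df}")

# Compare |t| with a tabulated critical value: for df = 8 and a two-tailed
# test at alpha = .05, the critical value is 2.306.
significant = abs(t) > 2.306
```

If the normality assumption in the table does not hold, the non-parametric row (rank-sum test) would be the alternative to consider.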

Questions?
