Power and Sample Size Simulation: Reaction Time Example (raw RT)
Christopher L. Aberson
2024-11-10
heyman_vignette.Rmd
Project/Data Title:
Continuous lexical decision task: classification of Dutch words as either actual words or nonwords
Data provided by: Tom Heyman
Project/Data Description:
Data come from a study reported in Heyman, De Deyne, Hutchison, & Storms (2015, Behavior Research Methods; henceforth HDHS). More specifically, the study involved a continuous lexical decision task intended to measure (item-level) semantic priming effects (i.e., Experiment 3 of HDHS). It is similar to the SPAML set-up (see https://osf.io/q4fjy/), but with fewer items and participants. The study had several goals, but principally we wanted to examine how a different/new paradigm called the speeded word fragment completion task would compare against a more common, well-established paradigm like lexical decision in terms of semantic priming (i.e., magnitude of the effect, reliability of item-level priming, cross-task correlation of item-level priming effects, etc.). Experiment 3 only involved a continuous lexical decision task, so the datafile contains no data from the speeded word fragment completion task.
Methods Description:
Participants were 40 students from the University of Leuven, Belgium (10 men, 30 women, mean age 20 years). A total of 576 pairs were used in a continuous lexical decision task (so that participants would not perceive them as pairs): 144 word–word pairs, 144 word–pseudoword pairs, 144 pseudoword–word pairs, and 144 pseudoword–pseudoword pairs. Of the 144 word–word pairs, 72 were fillers and 72 were critical pairs, half of which were related and half unrelated (counterbalanced across participants). The dataset only contains data for the critical pairs. Participants were informed that they would see a letter string on each trial and had to indicate whether it formed an existing Dutch word by pressing the arrow keys. Half of the participants pressed the left arrow for word and the right arrow for nonword, and vice versa for the other half.
Data Location:
The dataset can be retrieved from osf.io/frxpd (see the dataset citation below). That location also includes R scripts that apply Accuracy in Parameter Estimation (AIPE) in a different fashion.
library(dplyr)          # data wrangling and pipes
library(tidyr)          # pivot_longer()
library(psych)          # describe()
library(flextable)      # formatted tables
library(semanticprimeR) # calculate_cutoff(), calculate_correction()

HDHS <- read.csv("data/HDHSAIPE.txt", sep = "")
str(HDHS)
#> 'data.frame': 2880 obs. of 8 variables:
#> $ RT : num 0.52 0.453 0.467 0.534 0.573 ...
#> $ zRT : num -0.303 -0.492 -0.453 -0.265 -0.153 ...
#> $ Pp : int 1 1 1 1 1 1 1 1 1 1 ...
#> $ Type : chr "R" "R" "R" "R" ...
#> $ Prime : chr "hengst" "matrak" "eland" "erwt" ...
#> $ Target : chr "veulen" "wapen" "gewei" "wortel" ...
#> $ accTarget: int 1 1 1 1 1 1 1 1 1 0 ...
#> $ accPrime : int 1 1 1 1 0 1 1 1 1 0 ...
Dataset Citation:
Heyman, T. (2022, February 4). Dataset AIPE. Retrieved from osf.io/frxpd [based on Heyman, T., De Deyne, S., Hutchison, K. A., & Storms, G. (2015). Using the speeded word fragment completion task to examine semantic priming. Behavior Research Methods, 47(2), 580-606.]
Column Metadata:
Variable Name | Variable Description | Type (numeric, character, logical, etc.)
---|---|---
RT | Response time to the target in seconds | Numeric
zRT | Z-transformed target response times per participant | Numeric
Pp | Participant identifier (1 to 40) | Integer
Type | Whether the target was preceded by a related prime (R) or an unrelated prime (U) | Character
Prime | Prime stimulus (in Dutch) | Character
Target | Target stimulus (in Dutch) | Character
accTarget | Whether the response to the target was correct (1) or not (0) | Integer
accPrime | Whether the response to the preceding prime was correct (1) or not (0) | Integer
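As a quick sanity check on the Type coding, a minimal sketch using the HDHS data frame loaded above (given the design, each participant should contribute 36 related and 36 unrelated critical trials):

# tabulate related (R) vs. unrelated (U) trials overall and per participant
table(HDHS$Type)
table(HDHS$Pp, HDHS$Type)[1:3, ] # first three participants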
AIPE Analysis:
# keep only trials with a correct response to the target
HDHScorrect <- HDHS[HDHS$accTarget == 1, ]

summary_stats <- HDHScorrect %>% # data frame
  select(RT, Target) %>% # pick the columns
  group_by(Target) %>% # group by stimulus
  summarize(SES = sd(RT)/sqrt(length(RT)), # item-level standard error
            samplesize = length(RT)) # and the sample size for below

## descriptives of the item SEs
describe(summary_stats$SES)
#> vars n mean sd median trimmed mad min max range skew kurtosis se
#> X1 1 72 0.05 0.05 0.03 0.04 0.02 0.02 0.31 0.29 3.63 13.19 0.01
## original sample sizes (not strictly necessary, as all targets were seen by 40 participants)
original_SS <- HDHS %>%
  count(Target) # count up the sample size per target

## add the original sample size to the summary data frame
summary_stats <- merge(summary_stats, original_SS, by = "Target")

## original sample size average
describe(summary_stats$n)
#> vars n mean sd median trimmed mad min max range skew kurtosis se
#> X1 1 72 40 0 40 40 0 40 40 0 NaN NaN 0
##reduced sample size
describe(summary_stats$samplesize)
#> vars n mean sd median trimmed mad min max range skew kurtosis se
#> X1 1 72 38.12 3.09 39 38.83 1.48 22 40 18 -3.29 12.08 0.36
##percent retained
describe(summary_stats$samplesize/summary_stats$n)
#> vars n mean sd median trimmed mad min max range skew kurtosis se
#> X1 1 72 0.95 0.08 0.98 0.97 0.04 0.55 1 0.45 -3.29 12.08 0.01
flextable(head(HDHScorrect)) %>% autofit()
RT | zRT | Pp | Type | Prime | Target | accTarget | accPrime
---|---|---|---|---|---|---|---
0.5202093 | -0.3026836 | 1 | R | hengst | veulen | 1 | 1
0.4532606 | -0.4918172 | 1 | R | matrak | wapen | 1 | 1
0.4670391 | -0.4528923 | 1 | R | eland | gewei | 1 | 1
0.5335296 | -0.2650529 | 1 | R | erwt | wortel | 1 | 1
0.5732744 | -0.1527717 | 1 | R | ijzel | glad | 1 | 0
0.3870141 | -0.6789673 | 1 | R | sauna | warm | 1 | 1
Stopping Rule
What is the usual standard error in these data that could be considered for our stopping rule?
SE <- tapply(HDHScorrect$RT, HDHScorrect$Target, function (x) { sd(x)/sqrt(length(x)) })
min(SE)
#> [1] 0.01511263
max(SE)
#> [1] 0.3058638
cutoff <- quantile(SE, probs = .4)
cutoff
#> 40%
#> 0.03113963
The item SEs range from 0.0151126 to 0.3058638. We could use the 40th percentile, SE = 0.0311396, as the critical value for our stopping rule, as suggested by the manuscript analysis. If we do not believe we have representative pilot data, we could instead set the SE to a specific target value. You should also consider the measurement scale when choosing these values (e.g., response times recorded in milliseconds have more room to vary than the same times recorded in seconds, as here).
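If a fixed target were preferred, a minimal sketch follows; the 0.025 value is an arbitrary illustration, not a recommendation from the manuscript:

# substitute a fixed SE target for the pilot-based 40th percentile
cutoff_fixed <- 0.025 # arbitrary illustrative value in seconds
mean(SE <= cutoff_fixed) # proportion of pilot items already at or below the target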
Minimum Sample Size
To estimate the minimum sample size, we need to determine how many participants it would take for 80% of the item SEs to fall below our critical value of 0.0311396.
# sequence of sample sizes to try
nsim <- 10 # small for CRAN
samplesize_values <- seq(20, 500, 5)

# create a blank table to save the simulated values
sim_table <- matrix(NA,
                    nrow = length(samplesize_values)*nsim,
                    ncol = length(unique(HDHS$Target)))

# make it a data frame
sim_table <- as.data.frame(sim_table)

# add a place for sample size values
sim_table$sample_size <- NA

iterate <- 1
for (p in 1:nsim){
  # loop over sample sizes
  for (i in 1:length(samplesize_values)){
    # temporary data frame that samples with replacement and summarizes
    temp <- HDHScorrect %>%
      group_by(Target) %>%
      sample_n(samplesize_values[i], replace = TRUE) %>%
      summarize(se = sd(RT)/sqrt(length(RT)))

    colnames(sim_table)[1:length(unique(HDHScorrect$Target))] <- temp$Target
    sim_table[iterate, 1:length(unique(HDHScorrect$Target))] <- temp$se
    sim_table[iterate, "sample_size"] <- samplesize_values[i]
    sim_table[iterate, "nsim"] <- p

    iterate <- iterate + 1
  }
}
final_sample <-
  sim_table %>%
  pivot_longer(cols = -c(sample_size, nsim)) %>%
  group_by(sample_size, nsim) %>%
  summarize(percent_below = sum(value <= cutoff)/length(unique(HDHScorrect$Target))) %>%
  ungroup() %>%
  # then average the percents across simulations
  dplyr::group_by(sample_size) %>%
  summarize(percent_below = mean(percent_below)) %>%
  dplyr::arrange(percent_below) %>%
  ungroup()
#> `summarise()` has grouped output by 'sample_size'. You can override using the
#> `.groups` argument.
flextable(final_sample %>% head()) %>% autofit()
sample_size | percent_below
---|---
20 | 0.3958333
25 | 0.4416667
30 | 0.4527778
35 | 0.4972222
40 | 0.5430556
45 | 0.5708333
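The minimum sample size can also be read directly off these results; a small sketch using the final_sample object created above:

# smallest simulated sample size with at least 80% of item SEs below the cutoff
final_sample %>%
  filter(percent_below >= .80) %>%
  arrange(sample_size) %>%
  slice(1)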
# use semanticprimeR's cutoff function, which also returns the proportion of variability
cutoff <- calculate_cutoff(population = HDHScorrect,
                           grouping_items = "Target",
                           score = "RT",
                           minimum = as.numeric(min(HDHScorrect$RT)),
                           maximum = as.numeric(max(HDHScorrect$RT)))

# confirm this matches the hand-calculated cutoff above
cutoff$cutoff
#> 40%
#> 0.03113963
final_table <- calculate_correction(
  proportion_summary = final_sample,
  pilot_sample_size = HDHScorrect %>%
    group_by(Target) %>%
    summarize(sample_size = n()) %>%
    ungroup() %>%
    summarize(avg_sample = mean(sample_size)) %>%
    pull(avg_sample),
  proportion_variability = cutoff$prop_var
)

flextable(final_table) %>%
  autofit()
percent_below | sample_size | corrected_sample_size
---|---|---
80.55556 | 95 | 84.96635
85.41667 | 120 | 103.14554
90.13889 | 180 | 141.33833
Based on these simulations, the minimum sample size is likely close to 85 participants, or about 89 (88.984375) once we account for the expected data loss.
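One way to approximate that data-loss adjustment is to inflate the corrected minimum by the average proportion of usable trials in the pilot data (about 95%, calculated earlier); a rough sketch:

# inflate the corrected minimum for expected data loss
retained <- mean(summary_stats$samplesize / summary_stats$n) # ~.95 usable trials
84.96635 / retained # approximately the 88.98 reported above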
Maximum Sample Size
In this example, we could set our maximum sample size at 90% power (defined as 90% of items below our criterion), which equates to 141 participants, or about 148 (147.609375) with the expected data loss. The final table never reaches 95% of items below our criterion, even at the largest simulated sample of 500 participants; inspection of the full table shows that the percent of items below the criterion levels off at 93-94%.
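A quick check of that plateau, again assuming the final_sample object from the simulation (and retained from the sketch above):

# best-achieved proportion of items below the cutoff across all simulated sizes
max(final_sample$percent_below) # levels off around .93-.94
# the same data-loss inflation applied to the maximum
141 / retained # roughly the 147.61 reported above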
Final Sample Size
In any estimate of sample size, you should also consider the potential for missing or otherwise unusable data due to the exclusion criteria in your study (e.g., attention checks, speeding, accuracy requirements). In this study, all participants see all items, so the final sample size could be the minimum sample size, the point at which all items reach our SE criterion, or the maximum sample size. Note that maximum sample sizes can also be set by time, money, or other constraints.