Power and Sample Size Simulation: Typicality, goodness, imageability, and familiarity of stimuli across 16 categories
Necdet Gürkan & Yangyang Yu
2024-11-10
Source: vanpaemel_vignette.Rmd
Project/Data Title:
Exemplar by feature applicability matrices and other Dutch normative data for semantic concepts
Data provided by: Wolf Vanpaemel
Project/Data Description:
This dataset provides extensive exemplar by feature applicability matrices covering 15 or 16 different categories (birds, fish, insects, mammals, amphibians/reptiles, clothing, kitchen utensils, musical instruments, tools, vehicles, weapons, fruit, vegetables, professions, and sports), as well as two large semantic domains (animals and artifacts). For all exemplars of the semantic categories, typicality ratings, goodness ratings, goodness rank order, generation frequency, exemplar associative strength, category associative strength, estimated age of acquisition, word frequency, familiarity ratings, imageability ratings, and pairwise similarity ratings are provided as well. The structure of the dataset is not easily machine-readable. Here, we only consider typicality.
Methods Description:
The typicality data were collected as part of a larger data collection. Here we describe the typicality data collection only. The data collection took place in a large classroom where all the participants were present at the same time. The participants received a booklet with instructions on the first page, followed by four sheets with a semantic category label printed in bold on top. Each of the category labels was followed by a list of 5–33 items belonging to that category, referring to exemplars. The participants were asked to indicate, for every item in the list, how typical it was for the category printed on top of the page. They used a Likert-type rating scale, ranging from 1 for very atypical items to 20 for very typical items. If they encountered an exemplar they did not know, they were asked to circle it. Every participant completed typicality ratings for four different categories. The assignment of categories to participants was randomized. For every category, four different random permutations of the exemplars were used, and each of these permutations was distributed with an equal frequency among the participants. All the exemplars of a category were rated by 28 different participants.
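As a sketch of the design described above (lists of 5–33 exemplars, a 1–20 rating scale, 28 raters per category), here is what one category's rating data might look like in long format; the exemplar names and scores are simulated placeholders, not the real data:

```r
set.seed(42)  # for reproducibility

# Hypothetical booklet page: 5 exemplars rated on a 1-20 typicality
# scale by 28 participants (names and scores are made up)
exemplars <- paste0("exemplar_", 1:5)
n_raters <- 28

ratings <- expand.grid(item = exemplars,
                       participant = paste0("X.", 1:n_raters))
ratings$score <- sample(1:20, nrow(ratings), replace = TRUE)

# each exemplar gets exactly 28 ratings, as in the real design
table(ratings$item)
```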
Data Location:
https://static-content.springer.com/esm/art%3A10.3758%2FBRM.40.4.1030/MediaObjects/DeDeyne-BRM-2008b.zip and included here.
### for typicality data -- cleaning and processing
library(dplyr)
library(tidyr)
library(flextable)
library(semanticprimeR) # for calculate_cutoff() / calculate_correction()

# read in all 16 category files
typicality_fnames <- list.files(path = "data/vanpaemel_data",
                                full.names = TRUE)
typicality_dfs <- lapply(typicality_fnames, read.csv)

# tag each data frame with its category ID before combining
ID <- 1:16
typicality_dfs <- mapply(cbind, typicality_dfs, "SampleID" = ID,
                         SIMPLIFY = FALSE)
typicality_all_df <- bind_rows(typicality_dfs)

typicality_all_df_v2 <- typicality_all_df %>%
  # combine the Dutch and English item labels into one column
  unite("comp_group", X:X.1, remove = TRUE) %>%
  # drop empty trailing columns and rows with missing ratings
  select(-c(30:34)) %>%
  drop_na(2:29) %>%
  filter_all(any_vars(!is.na(.))) %>%
  dplyr::rename(compType = SampleID)
# typicality_all_df_v2
typicality_all_df_v3 <- typicality_all_df_v2 %>%
  select(starts_with("X"), compType, comp_group) %>%
  pivot_longer(cols = starts_with("X"),
               names_to = "participant",
               values_to = "score")
head(typicality_all_df_v3)
#> # A tibble: 6 × 4
#> compType comp_group participant score
#> <int> <chr> <chr> <int>
#> 1 1 kikker_frog X.2 18
#> 2 1 kikker_frog X.3 20
#> 3 1 kikker_frog X.4 19
#> 4 1 kikker_frog X.5 12
#> 5 1 kikker_frog X.6 20
#> 6 1 kikker_frog X.7 15
Dataset Citation:
De Deyne, S., Verheyen, S., Ameel, E. et al. Exemplar by feature applicability matrices and other Dutch normative data for semantic concepts. Behavior Research Methods 40, 1030–1048 (2008). https://doi.org/10.3758/BRM.40.4.1030
Column Metadata:
| Variable Name | Variable Description | Type (numeric, character, logical, etc.) |
|---|---|---|
| compType | Comparison type for typicality rating | Character |
| comp_group | Individual items within compType | Character |
| participant | Participant number | Character |
| score | Typicality: how typical is the item for the category? | Numeric |
AIPE (Accuracy in Parameter Estimation) Analysis:
Stopping Rule
In this example, we pick one comparison type and use the items within it to estimate the required sample size. This choice is arbitrary!
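The stopping rule rests on the fact that the standard error of a mean shrinks with sample size; a minimal sketch with simulated ratings (the data here are made up):

```r
# The stopping-rule logic in miniature: the standard error of a mean,
# SE = sd(x) / sqrt(n), shrinks as n grows, so we can ask how large n
# must be before SE drops below a chosen accuracy cutoff.
se_of_mean <- function(x) sd(x) / sqrt(length(x))

# made-up ratings on the 1-20 typicality scale
set.seed(1)
scores <- sample(1:20, 1000, replace = TRUE)

# SE at a few candidate sample sizes
sapply(c(10, 50, 200), function(n) se_of_mean(scores[1:n]))
```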
# individual SEs among the different comparison types
SE <- tapply(typicality_all_df_v3$score, typicality_all_df_v3$compType,
             function(x) sd(x) / sqrt(length(x)))
SE
#> 1 2 3 4 5 6 7 8
#> 0.4847915 0.1868793 0.1894860 0.2326625 0.1862387 0.2310363 0.1433243 0.1751163
#> 9 10 11 12 14 16
#> 0.1888044 0.1563060 0.2512611 0.1945454 0.2042343 0.2520606
min(SE)
#> [1] 0.1433243
max(SE)
#> [1] 0.4847915
# comparison type 1: amphibians
typicality_data_gp1_sub <- subset(typicality_all_df_v3, compType == 1)
# individual SEs for comparison type 1
SE1 <- tapply(typicality_data_gp1_sub$score, typicality_data_gp1_sub$comp_group,
              function(x) sd(x) / sqrt(length(x)))
SE1
#> kikker_frog krokodil_crocodile pad_toad
#> 0.4836714 1.1085074 0.7368140
#> salamander_salamander schildpad_tortoise
#> 0.7531742 1.6330366
# sequence of sample sizes to try
nsim <- 10 # small for CRAN
samplesize_values <- seq(5, 200, 5)

# create a blank table to save the simulated SEs in
sim_table <- matrix(NA,
                    nrow = length(samplesize_values) * nsim,
                    ncol = length(unique(typicality_data_gp1_sub$comp_group)))

# make it a data frame
sim_table <- as.data.frame(sim_table)

# add a place for sample size values
sim_table$sample_size <- NA
sim_table$var <- "score"

iterate <- 1
for (p in 1:nsim){
  # loop over sample sizes for comparison type
  for (i in seq_along(samplesize_values)){
    # temp data frame for comparison type 1 that samples and summarizes
    temp1 <- typicality_data_gp1_sub %>%
      dplyr::group_by(comp_group) %>%
      dplyr::sample_n(samplesize_values[i], replace = TRUE) %>%
      dplyr::summarize(se2 = sd(score) / sqrt(length(score)))

    # add to table
    colnames(sim_table)[1:length(unique(typicality_data_gp1_sub$comp_group))] <- temp1$comp_group
    sim_table[iterate, 1:length(unique(typicality_data_gp1_sub$comp_group))] <- temp1$se2
    sim_table[iterate, "sample_size"] <- samplesize_values[i]
    sim_table[iterate, "nsim"] <- p
    iterate <- iterate + 1
  }
}
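For readers who prefer a functional style, the nested loop above can be restated compactly; this is an equivalent sketch of the same resampling logic, not the vignette's canonical code (`simulate_se`, `pilot`, and the item names are placeholders):

```r
# A compact base-R restatement of the simulation loop above: for each
# candidate sample size n, resample every item's scores with
# replacement and compute the SE of the resampled mean. `pilot` is a
# placeholder data frame standing in for typicality_data_gp1_sub.
simulate_se <- function(pilot, n) {
  ses <- tapply(pilot$score, pilot$comp_group, function(x) {
    resampled <- sample(x, n, replace = TRUE)
    sd(resampled) / sqrt(n)
  })
  data.frame(comp_group = names(ses),
             se = as.numeric(ses),
             sample_size = n)
}

# example run with made-up data
set.seed(7)
pilot <- data.frame(comp_group = rep(c("item_a", "item_b"), each = 30),
                    score = sample(1:20, 60, replace = TRUE))
simulate_se(pilot, 25)
```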
Calculate the cutoff score with information necessary for correction.
cutoff <- calculate_cutoff(population = typicality_data_gp1_sub,
                           grouping_items = "comp_group",
                           score = "score",
                           minimum = min(typicality_data_gp1_sub$score),
                           maximum = max(typicality_data_gp1_sub$score))
cutoff$cutoff
#> 40%
#> 0.7466301
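The reported value appears to be the 40% quantile of the pilot item-level SEs; assuming that is what calculate_cutoff computes internally (an assumption about its internals, not documented behavior here), it can be reproduced with base R:

```r
# Pilot item-level SEs for comparison type 1, copied from SE1 above
se1 <- c(kikker_frog = 0.4836714, krokodil_crocodile = 1.1085074,
         pad_toad = 0.7368140, salamander_salamander = 0.7531742,
         schildpad_tortoise = 1.6330366)

# The reported cutoff matches the 40% quantile of these SEs
# (assumed to mirror calculate_cutoff's internal computation)
quantile(se1, probs = 0.40)
#>       40%
#> 0.7466301
```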
### for response outputs
# figure out cutoff
final_sample <- sim_table %>%
  pivot_longer(cols = -c(sample_size, var, nsim)) %>%
  dplyr::rename(item = name, se = value) %>%
  dplyr::group_by(sample_size, var, nsim) %>%
  dplyr::summarize(percent_below =
                     sum(se <= cutoff$cutoff) / length(unique(typicality_data_gp1_sub$comp_group))) %>%
  ungroup() %>%
  # then summarize down, averaging the percents across simulations
  dplyr::group_by(sample_size, var) %>%
  dplyr::summarize(percent_below = mean(percent_below)) %>%
  dplyr::arrange(percent_below) %>%
  ungroup()
#> `summarise()` has grouped output by 'sample_size', 'var'. You can override
#> using the `.groups` argument.
#> `summarise()` has grouped output by 'sample_size'. You can override using the
#> `.groups` argument.
flextable(final_sample) %>% autofit()
| sample_size | var | percent_below |
|---|---|---|
| 10 | score | 0.12 |
| 5 | score | 0.16 |
| 15 | score | 0.18 |
| 20 | score | 0.30 |
| 25 | score | 0.38 |
| 30 | score | 0.52 |
| 35 | score | 0.56 |
| 40 | score | 0.56 |
| 45 | score | 0.62 |
| 50 | score | 0.62 |
| 55 | score | 0.66 |
| 60 | score | 0.68 |
| 65 | score | 0.78 |
| 70 | score | 0.80 |
| 75 | score | 0.82 |
| 80 | score | 0.82 |
| 85 | score | 0.82 |
| 95 | score | 0.82 |
| 110 | score | 0.84 |
| 115 | score | 0.84 |
| 100 | score | 0.86 |
| 90 | score | 0.88 |
| 105 | score | 0.88 |
| 125 | score | 0.88 |
| 135 | score | 0.88 |
| 130 | score | 0.90 |
| 150 | score | 0.92 |
| 120 | score | 0.94 |
| 140 | score | 0.94 |
| 155 | score | 0.94 |
| 195 | score | 0.96 |
| 160 | score | 0.98 |
| 170 | score | 0.98 |
| 175 | score | 0.98 |
| 145 | score | 1.00 |
| 165 | score | 1.00 |
| 180 | score | 1.00 |
| 185 | score | 1.00 |
| 190 | score | 1.00 |
| 200 | score | 1.00 |
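One way to read this table off programmatically is to take the smallest sample size at which a target share of items (say, 80%) falls below the cutoff; with the values above that is 70. The data frame below copies the relevant rows from the table:

```r
# percent_below values copied from the rows of the table above
final_sample <- data.frame(
  sample_size = c(10, 5, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70),
  percent_below = c(0.12, 0.16, 0.18, 0.30, 0.38, 0.52, 0.56, 0.56,
                    0.62, 0.62, 0.66, 0.68, 0.78, 0.80)
)

# smallest sample size where at least 80% of item SEs fall below the cutoff
min(final_sample$sample_size[final_sample$percent_below >= 0.80])
#> [1] 70
```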
Calculate the final corrected scores:
final_scores <- calculate_correction(proportion_summary = final_sample,
                                     pilot_sample_size = length(unique(typicality_data_gp1_sub$participant)),
                                     proportion_variability = cutoff$prop_var)
flextable(final_scores) %>% autofit()
| percent_below | sample_size | corrected_sample_size |
|---|---|---|
| 80 | 70 | 69.05319 |
| 88 | 90 | 86.00674 |
| 94 | 120 | 108.71054 |
| 100 | 145 | 126.05262 |