
Vignette Setup:
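
The setup chunk for this vignette is not shown above, so the library calls below are reconstructed from the functions used later; in particular, calculate_cutoff() and calculate_correction() are assumed to come from the semanticprimeR package that hosts this vignette.

library(rio)            # import() for the metadata spreadsheet
library(flextable)      # formatted tables
library(dplyr)          # data manipulation verbs
library(tidyr)          # unite(), drop_na(), pivot_longer()
library(semanticprimeR) # calculate_cutoff(), calculate_correction()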

Project/Data Title:

Exemplar by feature applicability matrices and other Dutch normative data for semantic concepts

Data provided by: Wolf Vanpaemel

Project/Data Description:

This dataset provides extensive exemplar by feature applicability matrices covering 15 or 16 different categories (birds, fish, insects, mammals, amphibians/reptiles, clothing, kitchen utensils, musical instruments, tools, vehicles, weapons, fruit, vegetables, professions, and sports), as well as two large semantic domains (animals and artifacts). For all exemplars of the semantic categories, typicality ratings, goodness ratings, goodness rank order, generation frequency, exemplar associative strength, category associative strength, estimated age of acquisition, word frequency, familiarity ratings, imageability ratings, and pairwise similarity ratings are also provided. The structure of the dataset is not programming-language friendly. Here, we consider only the typicality ratings.

Methods Description:

The typicality data were collected as part of a larger data collection; here we describe only the typicality portion. The data collection took place in a large classroom where all the participants were present at the same time. Each participant received a booklet with instructions on the first page, followed by four sheets with a semantic category label printed in bold on top. Each category label was followed by a list of 5–33 items belonging to that category, referring to exemplars. The participants were asked to indicate, for every item in the list, how typical it was of the category printed at the top of the page, using a Likert-type rating scale ranging from 1 for very atypical items to 20 for very typical items. If they encountered an exemplar they did not know, they were asked to circle it. Every participant completed typicality ratings for four different categories, and the assignment of categories to participants was randomized. For every category, four different random permutations of the exemplars were used, each distributed with equal frequency among the participants. Every exemplar of a category was rated by 28 different participants.
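
To make the resulting data structure concrete, here is a small illustrative simulation of the design just described (a 1-20 rating scale with 28 raters per exemplar); the item names are placeholders, not the actual stimuli.

# mock ratings: every exemplar rated by 28 participants on a 1-20 scale
set.seed(42)
n_raters <- 28
exemplars <- c("frog", "toad", "salamander")  # placeholder items
mock_ratings <- expand.grid(exemplar = exemplars,
                            rater = seq_len(n_raters))
mock_ratings$score <- sample(1:20, nrow(mock_ratings), replace = TRUE)
head(mock_ratings)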

Data Location:

https://static-content.springer.com/esm/art%3A10.3758%2FBRM.40.4.1030/MediaObjects/DeDeyne-BRM-2008b.zip and included here.

### for typicality data -- cleaning and processing
# read in all of the raw category files
typicality_fnames <- list.files(path = "data/vanpaemel_data",
                                full.names = TRUE)

typicality_dfs <- lapply(typicality_fnames, read.csv)

# tag each of the 16 category files with a sample ID
ID <- 1:16
typicality_dfs <- mapply(cbind, typicality_dfs, "SampleID" = ID,
                         SIMPLIFY = FALSE)

# stack into one data frame and clean up
typicality_all_df <- bind_rows(typicality_dfs)
typicality_all_df_v2 <- typicality_all_df %>% 
  # combine the Dutch and English item labels into one column
  unite("comp_group", X:X.1, remove = TRUE) %>% 
  # drop extra columns (30-34) not needed for the analysis
  select(-c(30:34)) %>% 
  # keep only rows with complete ratings across participant columns
  drop_na(c(2:29)) %>%
  # safety net: drop any rows that are entirely NA
  filter_all(any_vars(!is.na(.))) %>%
  dplyr::rename(compType = SampleID)
# typicality_all_df_v2

# reshape to long format: one row per item-participant rating
typicality_all_df_v3 <- typicality_all_df_v2 %>% 
  select(starts_with("X"), compType, comp_group) %>% 
  pivot_longer(cols = starts_with("X"), 
               names_to = "participant", 
               values_to = "score")
                    
head(typicality_all_df_v3)
#> # A tibble: 6 × 4
#>   compType comp_group  participant score
#>      <int> <chr>       <chr>       <int>
#> 1        1 kikker_frog X.2            18
#> 2        1 kikker_frog X.3            20
#> 3        1 kikker_frog X.4            19
#> 4        1 kikker_frog X.5            12
#> 5        1 kikker_frog X.6            20
#> 6        1 kikker_frog X.7            15
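
As a quick sanity check (an addition, not original vignette output), you can confirm how many items each comparison type contains and that the ratings stay within the 1-20 scale:

# number of unique items per comparison type
tapply(typicality_all_df_v3$comp_group,
       typicality_all_df_v3$compType,
       function(x) length(unique(x)))

# ratings should fall within the 1-20 scale
range(typicality_all_df_v3$score)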

Date Published:

2008-11-01

Dataset Citation:

De Deyne, S., Verheyen, S., Ameel, E. et al. Exemplar by feature applicability matrices and other Dutch normative data for semantic concepts. Behavior Research Methods 40, 1030–1048 (2008). https://doi.org/10.3758/BRM.40.4.1030

Keywords:

Typicality, goodness, imageability, familiarity

Use License:

Creative Commons Attribution 4.0 International (CC-BY 4.0)

Geographic Description - City/State/Country of Participants:

University of Leuven, Belgium

Column Metadata:

metadata <- import("data/vanpaemel_metadata.xlsx")

flextable(metadata) %>% autofit()

Variable Name   Variable Description                                     Type (numeric, character, logical, etc.)
compType        Comparison type for typicality rating                    Character
comp_group      Individual items within compType                         Character
participant     Participant number                                       Character
score           Typicality: how typical is the item for the category?   Numeric

AIPE Analysis:

Stopping Rule

In this example, we will pick one comparison type and use the items within it to estimate the sample size. This choice is arbitrary!

# individual SEs among the different comparison types
SE <- tapply(typicality_all_df_v3$score, typicality_all_df_v3$compType,
             function(x) { sd(x)/sqrt(length(x)) })
SE
#>         1         2         3         4         5         6         7         8 
#> 0.4847915 0.1868793 0.1894860 0.2326625 0.1862387 0.2310363 0.1433243 0.1751163 
#>         9        10        11        12        14        16 
#> 0.1888044 0.1563060 0.2512611 0.1945454 0.2042343 0.2520606

min(SE)
#> [1] 0.1433243
max(SE)
#> [1] 0.4847915
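
For a visual sense of how item-level precision varies across comparison types, a quick histogram works (an optional addition, not part of the original output):

# distribution of standard errors across comparison types
hist(SE, main = "SEs by comparison type", xlab = "Standard error")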

# comparison type 1: amphibians
typicality_data_gp1_sub <- subset(typicality_all_df_v3, compType == 1)

# individual SEs for comparison type 1
SE1 <- tapply(typicality_data_gp1_sub$score, typicality_data_gp1_sub$comp_group,
              function(x) { sd(x)/sqrt(length(x)) })

SE1
#>           kikker_frog    krokodil_crocodile              pad_toad 
#>             0.4836714             1.1085074             0.7368140 
#> salamander_salamander    schildpad_tortoise 
#>             0.7531742             1.6330366
# number of simulation runs (kept small for CRAN)
nsim <- 10
# sequence of sample sizes to try
samplesize_values <- seq(5, 200, 5)

# create a blank table for us to save the values in 
sim_table <- matrix(NA, 
                    nrow = length(samplesize_values)*nsim, 
                    ncol = length(unique(typicality_data_gp1_sub$comp_group)))
# make it a data frame
sim_table <- as.data.frame(sim_table)

# add a place for sample size values 
sim_table$sample_size <- NA
sim_table$var <- "score"

iterate <- 1
for (p in 1:nsim){
  
  # loop over the candidate sample sizes
  for (i in 1:length(samplesize_values)){
      
    # resample comparison type 1 with replacement at this sample size
    # and compute the SE for each item
    temp1 <- typicality_data_gp1_sub %>% 
      dplyr::group_by(comp_group) %>% 
      dplyr::sample_n(samplesize_values[i], replace = TRUE) %>% 
      dplyr::summarize(se2 = sd(score)/sqrt(length(score))) 
    
    # save the item SEs, the sample size, and the simulation number
    colnames(sim_table)[1:length(unique(typicality_data_gp1_sub$comp_group))] <- temp1$comp_group
    sim_table[iterate, 1:length(unique(typicality_data_gp1_sub$comp_group))] <- temp1$se2
    sim_table[iterate, "sample_size"] <- samplesize_values[i]
    sim_table[iterate, "nsim"] <- p
    
    iterate <- 1 + iterate 
  }
  
}
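
To see how precision improves as the simulated sample size grows, you can average the item SEs within each sample size. This summary step is an added illustration, not part of the original vignette, and assumes the item columns named in the loop above:

# average simulated SE at each sample size, collapsed over items and runs
item_cols <- unique(typicality_data_gp1_sub$comp_group)
avg_se <- aggregate(list(mean_se = rowMeans(sim_table[, item_cols])),
                    by = list(sample_size = sim_table$sample_size),
                    mean)
head(avg_se)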

Calculate the cutoff score with information necessary for correction.

cutoff <- calculate_cutoff(population = typicality_data_gp1_sub, 
                 grouping_items = "comp_group",
                 score = "score", 
                 minimum = min(typicality_data_gp1_sub$score),
                 maximum = max(typicality_data_gp1_sub$score))

cutoff$cutoff
#>       40% 
#> 0.7466301
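
The printed label hints at how the cutoff is found: 0.7466301 is the 40th percentile of the item-level SEs computed above, so you can check it by hand (a sketch of the idea only; the package function also returns the proportion variability used in the correction step later):

# the cutoff corresponds to the 40% quantile of the item SEs
quantile(SE1, probs = .40)
#>       40% 
#> 0.7466301
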
### for response outputs 
# figure out cut off
final_sample <- 
  sim_table %>%
  pivot_longer(cols = -c(sample_size, var, nsim)) %>% 
  dplyr::rename(item = name, se = value) %>% 
  dplyr::group_by(sample_size, var, nsim) %>% 
  dplyr::summarize(percent_below = sum(se <= cutoff$cutoff)/length(unique(typicality_data_gp1_sub$comp_group))) %>% 
  ungroup() %>% 
  # then summarize all down averaging percents
  dplyr::group_by(sample_size, var) %>% 
  summarize(percent_below = mean(percent_below)) %>% 
  dplyr::arrange(percent_below) %>% 
  ungroup()
#> `summarise()` has grouped output by 'sample_size', 'var'. You can override
#> using the `.groups` argument.
#> `summarise()` has grouped output by 'sample_size'. You can override using the
#> `.groups` argument.
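
The summarise() messages above are informational only. If you prefer to silence them, the same pipeline can set dplyr's .groups argument explicitly; this variant produces the same final_sample:

final_sample <- 
  sim_table %>%
  pivot_longer(cols = -c(sample_size, var, nsim)) %>% 
  dplyr::rename(item = name, se = value) %>% 
  dplyr::group_by(sample_size, var, nsim) %>% 
  dplyr::summarize(percent_below = sum(se <= cutoff$cutoff) /
                     length(unique(typicality_data_gp1_sub$comp_group)),
                   .groups = "drop") %>% 
  dplyr::group_by(sample_size, var) %>% 
  dplyr::summarize(percent_below = mean(percent_below), .groups = "drop") %>% 
  dplyr::arrange(percent_below)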

flextable(final_sample) %>% autofit()

sample_size   var     percent_below
10            score   0.12
5             score   0.16
15            score   0.18
20            score   0.30
25            score   0.38
30            score   0.52
35            score   0.56
40            score   0.56
45            score   0.62
50            score   0.62
55            score   0.66
60            score   0.68
65            score   0.78
70            score   0.80
75            score   0.82
80            score   0.82
85            score   0.82
95            score   0.82
110           score   0.84
115           score   0.84
100           score   0.86
90            score   0.88
105           score   0.88
125           score   0.88
135           score   0.88
130           score   0.90
150           score   0.92
120           score   0.94
140           score   0.94
155           score   0.94
195           score   0.96
160           score   0.98
170           score   0.98
175           score   0.98
145           score   1.00
165           score   1.00
180           score   1.00
185           score   1.00
190           score   1.00
200           score   1.00

Calculate the final corrected scores:

final_scores <- calculate_correction(proportion_summary = final_sample,
                     pilot_sample_size = length(unique(typicality_data_gp1_sub$participant)),
                     proportion_variability = cutoff$prop_var)

flextable(final_scores) %>% autofit()

percent_below   sample_size   corrected_sample_size
80              70            69.05319
88              90            86.00674
94              120           108.71054
100             145           126.05262

Minimum Sample Size

Based on these simulations, we can decide that our minimum sample size is likely close to 69, the corrected sample size at which 80% of items fall at or below the cutoff.

Maximum Sample Size

In this example, we could set our maximum sample size at the point where roughly 90% of items fall at or below the cutoff, which would equate to 109 participants (the corrected sample size for the closest row at or above 90%, here 94%).
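
Those two decisions can also be read programmatically from final_scores, assuming percent_below is stored as the whole-number percentages shown in the table above:

# corrected sample sizes at the minimum (80%) and maximum (~90%) criteria
min_n <- final_scores$corrected_sample_size[final_scores$percent_below == 80]
max_n <- final_scores$corrected_sample_size[final_scores$percent_below == 94]
round(c(minimum = min_n, maximum = max_n))
#> minimum maximum 
#>      69     109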