
Vignette Setup:

Project/Data Title:

The survival processing effect

Data provided by: Röer, Bell & Buchner (2013)

Project/Data Description:

The data come from a conceptual replication study on the survival processing effect. The survival processing effect refers to the finding that rating words according to their relevance in a survival-related scenario leads to better retention than processing words in a number of other fictional scenarios. Participants were randomly assigned to one of the rating scenarios (survival, afterlife, moving). The to-be-rated words were presented individually in a random order on the computer screen. Each word remained on the screen for five seconds. Participants rated the words by clicking on a 5-point scale that ranged from completely useless (1) to very useful (5), which was displayed right below the word.

Methods Description:

Participants were students at Heinrich Heine University Düsseldorf, Germany, who were paid for participating or received course credit. Their ages ranged from 18 to 55 years. The to-be-rated words consisted of 30 typical members of 30 categories drawn from the updated Battig and Montague norms (Van Overschelde, Rawson, & Dunlosky, 2004).

Data Location:

Data included within this vignette. We drop the Scenario column because the mean and standard deviation of the item ratings were identical across scenarios. We also add a participant column to keep this script consistent with the other vignettes.

# load packages used below
library(rio); library(dplyr); library(tidyr); library(flextable); library(semanticprimeR)

DF <- import("data/roer_data.xlsx")
drops <- c("Scenario")
DF <- DF[ , !(names(DF) %in% drops)]
DF <- cbind(Participant_Number = 1:nrow(DF), DF)

str(DF)
#> 'data.frame':    218 obs. of  31 variables:
#>  $ Participant_Number: int  1 2 3 4 5 6 7 8 9 10 ...
#>  $ Item_1            : num  1 3 3 2 3 3 3 1 4 1 ...
#>  $ Item_2            : num  2 5 4 1 4 5 4 3 5 3 ...
#>  $ Item_3            : num  5 3 1 5 2 2 5 2 3 4 ...
#>  $ Item_4            : num  1 4 2 0 3 2 2 1 4 2 ...
#>  $ Item_5            : num  3 5 1 5 1 3 5 2 0 4 ...
#>  $ Item_6            : num  1 3 2 1 4 1 5 2 2 1 ...
#>  $ Item_7            : num  4 2 4 1 4 4 4 1 5 3 ...
#>  $ Item_8            : num  3 5 5 4 4 2 5 5 2 3 ...
#>  $ Item_9            : num  1 5 4 5 4 2 4 1 3 3 ...
#>  $ Item_10           : num  0 5 5 5 0 5 5 4 0 4 ...
#>  $ Item_11           : num  3 5 4 5 3 1 5 5 2 1 ...
#>  $ Item_12           : num  3 5 5 5 5 1 3 5 1 1 ...
#>  $ Item_13           : num  1 1 3 1 2 1 1 4 1 1 ...
#>  $ Item_14           : num  1 2 1 4 1 2 2 1 2 5 ...
#>  $ Item_15           : num  1 4 2 1 5 1 3 2 2 1 ...
#>  $ Item_16           : num  4 5 4 2 3 3 4 3 3 4 ...
#>  $ Item_17           : num  3 5 0 1 3 1 4 3 3 3 ...
#>  $ Item_18           : num  2 5 4 1 5 5 5 2 5 4 ...
#>  $ Item_19           : num  2 1 1 4 1 3 4 2 4 3 ...
#>  $ Item_20           : num  5 3 4 5 2 4 5 3 3 2 ...
#>  $ Item_21           : num  3 0 4 5 3 4 1 1 4 2 ...
#>  $ Item_22           : num  3 5 1 1 2 3 5 4 3 4 ...
#>  $ Item_23           : num  1 1 2 1 1 4 3 1 5 2 ...
#>  $ Item_24           : num  1 4 4 2 3 2 4 5 2 0 ...
#>  $ Item_25           : num  2 5 5 3 5 1 5 4 3 4 ...
#>  $ Item_26           : num  1 4 3 1 3 1 2 1 2 3 ...
#>  $ Item_27           : num  1 5 1 1 5 2 4 4 3 4 ...
#>  $ Item_28           : num  1 5 1 1 4 2 3 5 2 3 ...
#>  $ Item_29           : num  1 5 2 3 2 1 4 3 1 3 ...
#>  $ Item_30           : num  3 5 4 1 1 2 5 4 4 5 ...

Date Published:

No official publication, see citation below.

Dataset Citation:

Röer, J. P., Bell, R., & Buchner, A. (2013). Is the survival-processing memory advantage due to richness of encoding? Journal of Experimental Psychology: Learning, Memory, and Cognition, 39, 1294-1302.

Keywords:

usefulness; survival processing; adaptive memory; richness of encoding

Use License:

CC BY-NC

Geographic Description - City/State/Country of Participants:

Düsseldorf, Germany

Column Metadata:

metadata <- import("data/roer_metadata.xlsx")

flextable(metadata) %>% autofit()

Variable Name | Variable Description | Type (numeric, character, logical, etc.)
------------- | -------------------- | -----------------------------------------
Items | Item ratings for Item_1 to Item_30 | Numeric
Scenario | Categorical scenario codes (1, 2, 3) | Numeric

AIPE Analysis:

# reshape to long format: one row per participant-item rating
DF_long <- pivot_longer(DF, cols = -c(Participant_Number)) %>% 
  dplyr::rename(item = name, score = value)

flextable(head(DF_long)) %>% autofit()

Participant_Number | item | score
------------------ | ------ | -----
1 | Item_1 | 1
1 | Item_2 | 2
1 | Item_3 | 5
1 | Item_4 | 1
1 | Item_5 | 3
1 | Item_6 | 1

Stopping Rule

What is the typical standard error for these data that could be used for our stopping rule, based on the 40th percentile of the item standard errors?

# individual SEs
SE <- tapply(DF_long$score, DF_long$item, function (x) { sd(x)/sqrt(length(x)) })
SE
#>     Item_1    Item_10    Item_11    Item_12    Item_13    Item_14    Item_15 
#> 0.08860091 0.09743922 0.10419876 0.10725788 0.07167329 0.11502337 0.09746261 
#>    Item_16    Item_17    Item_18    Item_19     Item_2    Item_20    Item_21 
#> 0.09425595 0.08981453 0.08968434 0.09817041 0.10421504 0.10768420 0.10195369 
#>    Item_22    Item_23    Item_24    Item_25    Item_26    Item_27    Item_28 
#> 0.09891678 0.11403914 0.08751293 0.10279479 0.08015419 0.09612067 0.09697587 
#>    Item_29     Item_3    Item_30     Item_4     Item_5     Item_6     Item_7 
#> 0.08758437 0.10963101 0.11319421 0.07944272 0.11514133 0.08852481 0.10163361 
#>     Item_8     Item_9 
#> 0.09615698 0.09118122

cutoff <- quantile(SE, probs = .40)
cutoff
#>        40% 
#> 0.09614246

# we could also use the cutoff score function in semanticprimeR
cutoff_score <- calculate_cutoff(population = DF_long,
                                 grouping_items = "item",
                                 score = "score",
                                 minimum = min(DF_long$score),
                                 maximum = max(DF_long$score))

cutoff_score$cutoff
#>        40% 
#> 0.09614246

Using the 40th percentile as a guide, we find that 0.096 is our target standard error for an accurately measured item.
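
As a quick check on the pilot data, we can count how many items already meet this target. The snippet below is a small sketch that reuses the SE and cutoff objects computed above.

# how many pilot items already fall at or below the target standard error?
sum(SE <= cutoff) # 12 of the 30 items, given the pilot SEs shown above

# and which items those are
names(SE)[SE <= cutoff]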

Minimum Sample Size

To estimate the minimum sample size, we determine how many participants would be required for 80%, 85%, 90%, and 95% of the item standard errors to fall below our critical value of 0.096.

# sequence of sample sizes to try
nsim <- 10 # number of simulation iterations (kept small for CRAN)
samplesize_values <- seq(20, 500, 5)

# create a blank table for us to save the values in 
sim_table <- matrix(NA, 
                    nrow = length(samplesize_values)*nsim, 
                    ncol = length(unique(DF_long$item)))

# make it a data frame
sim_table <- as.data.frame(sim_table)

# add a place for sample size values 
sim_table$sample_size <- NA

iterate <- 1
for (p in 1:nsim){
  # loop over sample sizes
  for (i in 1:length(samplesize_values)){
      
    # temp dataframe that samples and summarizes
    temp <- DF_long %>% 
      group_by(item) %>% 
      sample_n(samplesize_values[i], replace = TRUE) %>% 
      summarize(se = sd(score)/sqrt(length(score))) 
    
    colnames(sim_table)[1:length(unique(DF_long$item))] <- temp$item
    sim_table[iterate, 1:length(unique(DF_long$item))] <- temp$se
    sim_table[iterate, "sample_size"] <- samplesize_values[i]
    sim_table[iterate, "nsim"] <- p
    
    iterate <- iterate + 1
  }
}

final_sample <- 
  sim_table %>% 
  pivot_longer(cols = -c(sample_size, nsim)) %>% 
  dplyr::rename(item = name, se = value) %>% 
  group_by(sample_size, nsim) %>% 
  summarize(percent_below = sum(se <= cutoff)/length(unique(DF_long$item))) %>% 
  ungroup() %>% 
  # then average the percent below the cutoff across simulations
  dplyr::group_by(sample_size) %>% 
  summarize(percent_below = mean(percent_below)) %>% 
  dplyr::arrange(percent_below) %>% 
  ungroup()
#> `summarise()` has grouped output by 'sample_size'. You can override using the
#> `.groups` argument.

flextable(final_sample %>% head()) %>% autofit()               

sample_size | percent_below
----------- | -------------
20 | 0
25 | 0
30 | 0
35 | 0
40 | 0
45 | 0

final_table <- calculate_correction(
  proportion_summary = final_sample,
  pilot_sample_size = DF_long %>% group_by(item) %>% 
    summarize(sample_size = n()) %>% ungroup() %>% 
    summarize(avg_sample = mean(sample_size)) %>% pull(avg_sample),
  proportion_variability = cutoff_score$prop_var
  )

flextable(final_table) %>% 
  autofit()

percent_below | sample_size | corrected_sample_size
------------- | ----------- | ---------------------
80.33333 | 270 | 60.18641
85.00000 | 285 | 67.54308
92.33333 | 300 | 75.18211
95.33333 | 315 | 82.85322

Based on these simulations, we can decide that our minimum sample size is likely close to 60 participants: the corrected sample size at which roughly 80% of item standard errors fall below the cutoff.
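
If we want to pull that value programmatically rather than reading it off the table, a minimal sketch using the final_table object above is:

# corrected sample size at the first level where at least 80% of items meet the cutoff
min_n <- final_table %>% 
  dplyr::filter(percent_below >= 80) %>% 
  dplyr::slice(1) %>% 
  dplyr::pull(corrected_sample_size)
min_n # approximately 60, matching the first row of the table above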

Maximum Sample Size

In this example, we could set our maximum sample size at the point where roughly 90% of the items are estimated accurately, which equates to approximately 75 participants.
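
The same idea extends to the maximum; this sketch selects the first corrected sample size at which at least 90% of item standard errors fall below the cutoff.

# corrected sample size at the first level where at least 90% of items meet the cutoff
max_n <- final_table %>% 
  dplyr::filter(percent_below >= 90) %>% 
  dplyr::slice(1) %>% 
  dplyr::pull(corrected_sample_size)
round(max_n) # approximately 75, given the table above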

Final Sample Size

We should also account for any expected data loss or exclusions related to the survival processing task when deciding on the final sample size requirements.
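
As a concrete illustration of that adjustment, the sketch below inflates the target sample size for an assumed exclusion rate; the 10% figure is purely hypothetical and should be replaced with a rate appropriate for the study.

# hypothetical planning adjustment: inflate recruitment for expected data loss
planned_n <- 75    # maximum sample size chosen above
loss_rate <- 0.10  # assumed proportion of excluded or incomplete participants
ceiling(planned_n / (1 - loss_rate)) # recruit about 84 to retain roughly 75 usable participants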