Skip to contents

Vignette Setup:

knitr::opts_chunk$set(echo = TRUE)

# Set a random seed
set.seed(5989320)

# Libraries necessary for this vignette
library(rio)
library(flextable)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)
library(psych)
library(semanticprimeR)
#> 
#> Attaching package: 'semanticprimeR'
#> The following object is masked from 'package:dplyr':
#> 
#>     top_n

# Function for simulation
item_power <- function(data, # name of data frame
                       dv_col, # name of DV column as a character
                       item_col, # number of items column as a character
                       nsim = 10, # small for cran 
                       sample_start = 20, 
                       sample_stop = 200, 
                       sample_increase = 5,
                       decile = .5){
  
  DF <- cbind.data.frame(
    "dv" = data[ , dv_col],
    "items" = data[ , item_col]
  )
  
  # just in case
  colnames(DF) <- c("dv", "items")
  
  # figure out the "sufficiently narrow" ci value
  SE <- tapply(DF$dv, DF$items, function (x) { sd(x)/sqrt(length(x)) })
  cutoff <- quantile(SE, probs = decile)
  
  # sequence of sample sizes to try
  samplesize_values <- seq(sample_start, sample_stop, sample_increase)

  # create a blank table for us to save the values in 
  sim_table <- matrix(NA, 
                      nrow = length(samplesize_values)*nsim, 
                      ncol = length(unique(DF$items)))

  # make it a data frame
  sim_table <- as.data.frame(sim_table)

  # add a place for sample size values 
  sim_table$sample_size <- NA

  iterate <- 1
  
  for (p in 1:nsim){
    # loop over sample sizes
    for (i in 1:length(samplesize_values)){
        
      # temp that samples and summarizes
      temp <- DF %>% 
        group_by(items) %>% 
        sample_n(samplesize_values[i], replace = T) %>% 
        summarize(se = sd(dv)/sqrt(length(dv)))
      
      # dv on items
      colnames(sim_table)[1:length(unique(DF$items))] <- temp$items
      sim_table[iterate, 1:length(unique(DF$items))] <- temp$se
      sim_table[iterate, "sample_size"] <- samplesize_values[i]
      sim_table[iterate, "nsim"] <- p
      
      iterate <- iterate + 1
    }
  }

  # figure out cut off
  final_sample <- sim_table %>% 
    pivot_longer(cols = -c(sample_size, nsim)) %>% 
    dplyr::rename(item = name, se = value) %>% 
    group_by(sample_size, nsim) %>% 
    summarize(percent_below = sum(se <= cutoff)/length(unique(DF$items))) %>% 
    ungroup() %>% 
    # then summarize all down averaging percents
    dplyr::group_by(sample_size) %>% 
    summarize(percent_below = mean(percent_below)) %>% 
    dplyr::arrange(percent_below) %>% 
    ungroup()
  
  return(list(
    SE = SE, 
    cutoff = cutoff, 
    DF = DF, 
    sim_table = sim_table, 
    final_sample = final_sample
  ))

}

Project/Data Title:

Attractiveness Ratings

Data provided by: Carlota Batres

Project/Data Description:

This dataset contains 200 participants rating 20 faces on attractiveness. Ethical approval was received from the Franklin and Marshall Institutional Review Board and each participant provided informed consent. All participants were located in the United States. Participants were instructed that they would be viewing several faces which were photographed facing forward, under constant camera and lighting conditions, with neutral expressions, and closed mouths. Each participant would have to rate the attractiveness of the presented faces. More specifically, participants were asked “How attractive is this face?”, where 1 = “Not at all attractive” and 7 = “Very attractive”. Participants rated each face individually, in random order, and with no time limit. Upon completion, participants were paid for participation in the study.

Methods Description:

The data was collected online using Amazon’s Mechanical Turk platform.

Data Location:

Included with the vignette.

DF <- import("data/batres_data.sav")

str(DF)
#> 'data.frame':    200 obs. of  21 variables:
#>  $ Participant_Number: num  1 2 3 4 5 6 7 8 9 10 ...
#>   ..- attr(*, "label")= chr "Unique number assigned to each participant"
#>   ..- attr(*, "format.spss")= chr "F3.0"
#>   ..- attr(*, "display_width")= int 12
#>  $ Face_1            : num  1 2 5 2 3 1 2 2 1 2 ...
#>   ..- attr(*, "label")= chr "Attractiveness rating for face #1"
#>   ..- attr(*, "format.spss")= chr "F1.0"
#>   ..- attr(*, "display_width")= int 12
#>  $ Face_2            : num  1 6 5 2 3 1 3 2 2 2 ...
#>   ..- attr(*, "label")= chr "Attractiveness rating for face #2"
#>   ..- attr(*, "format.spss")= chr "F1.0"
#>   ..- attr(*, "display_width")= int 12
#>  $ Face_3            : num  3 6 7 7 4 3 5 4 4 4 ...
#>   ..- attr(*, "label")= chr "Attractiveness rating for face #3"
#>   ..- attr(*, "format.spss")= chr "F1.0"
#>   ..- attr(*, "display_width")= int 12
#>  $ Face_4            : num  3 7 5 3 4 3 3 3 4 3 ...
#>   ..- attr(*, "label")= chr "Attractiveness rating for face #4"
#>   ..- attr(*, "format.spss")= chr "F1.0"
#>   ..- attr(*, "display_width")= int 12
#>  $ Face_5            : num  5 7 7 5 5 6 3 3 3 3 ...
#>   ..- attr(*, "label")= chr "Attractiveness rating for face #5"
#>   ..- attr(*, "format.spss")= chr "F1.0"
#>   ..- attr(*, "display_width")= int 12
#>  $ Face_6            : num  5 5 4 5 6 5 4 4 5 3 ...
#>   ..- attr(*, "label")= chr "Attractiveness rating for face #6"
#>   ..- attr(*, "format.spss")= chr "F1.0"
#>   ..- attr(*, "display_width")= int 12
#>  $ Face_7            : num  5 7 7 7 4 5 4 4 5 4 ...
#>   ..- attr(*, "label")= chr "Attractiveness rating for face #7"
#>   ..- attr(*, "format.spss")= chr "F1.0"
#>   ..- attr(*, "display_width")= int 12
#>  $ Face_8            : num  4 1 5 3 4 4 4 4 2 4 ...
#>   ..- attr(*, "label")= chr "Attractiveness rating for face #8"
#>   ..- attr(*, "format.spss")= chr "F1.0"
#>   ..- attr(*, "display_width")= int 12
#>  $ Face_9            : num  3 5 4 4 3 1 2 2 2 2 ...
#>   ..- attr(*, "label")= chr "Attractiveness rating for face #9"
#>   ..- attr(*, "format.spss")= chr "F1.0"
#>   ..- attr(*, "display_width")= int 12
#>  $ Face_10           : num  4 4 7 2 3 3 3 3 5 4 ...
#>   ..- attr(*, "label")= chr "Attractiveness rating for face #10"
#>   ..- attr(*, "format.spss")= chr "F1.0"
#>   ..- attr(*, "display_width")= int 12
#>  $ Face_11           : num  2 3 5 4 3 2 3 3 4 2 ...
#>   ..- attr(*, "label")= chr "Attractiveness rating for face #11"
#>   ..- attr(*, "format.spss")= chr "F1.0"
#>   ..- attr(*, "display_width")= int 12
#>  $ Face_12           : num  4 7 5 4 4 4 3 3 6 1 ...
#>   ..- attr(*, "label")= chr "Attractiveness rating for face #12"
#>   ..- attr(*, "format.spss")= chr "F1.0"
#>   ..- attr(*, "display_width")= int 12
#>  $ Face_13           : num  3 3 4 5 4 3 3 3 3 2 ...
#>   ..- attr(*, "label")= chr "Attractiveness rating for face #13"
#>   ..- attr(*, "format.spss")= chr "F1.0"
#>   ..- attr(*, "display_width")= int 12
#>  $ Face_14           : num  5 7 5 5 3 5 5 5 4 2 ...
#>   ..- attr(*, "label")= chr "Attractiveness rating for face #14"
#>   ..- attr(*, "format.spss")= chr "F1.0"
#>   ..- attr(*, "display_width")= int 12
#>  $ Face_15           : num  3 7 6 3 4 6 3 3 4 4 ...
#>   ..- attr(*, "label")= chr "Attractiveness rating for face #15"
#>   ..- attr(*, "format.spss")= chr "F1.0"
#>   ..- attr(*, "display_width")= int 12
#>  $ Face_16           : num  4 7 5 5 5 4 4 3 5 3 ...
#>   ..- attr(*, "label")= chr "Attractiveness rating for face #16"
#>   ..- attr(*, "format.spss")= chr "F1.0"
#>   ..- attr(*, "display_width")= int 12
#>  $ Face_17           : num  4 4 5 3 5 4 3 2 4 2 ...
#>   ..- attr(*, "label")= chr "Attractiveness rating for face #17"
#>   ..- attr(*, "format.spss")= chr "F1.0"
#>   ..- attr(*, "display_width")= int 12
#>  $ Face_18           : num  3 5 4 6 4 5 4 5 4 2 ...
#>   ..- attr(*, "label")= chr "Attractiveness rating for face #18"
#>   ..- attr(*, "format.spss")= chr "F1.0"
#>   ..- attr(*, "display_width")= int 12
#>  $ Face_19           : num  3 4 5 6 4 4 4 3 3 4 ...
#>   ..- attr(*, "label")= chr "Attractiveness rating for face #19"
#>   ..- attr(*, "format.spss")= chr "F1.0"
#>   ..- attr(*, "display_width")= int 12
#>  $ Face_20           : num  4 6 6 3 6 4 3 3 4 3 ...
#>   ..- attr(*, "label")= chr "Attractiveness rating for face #20"
#>   ..- attr(*, "format.spss")= chr "F1.0"
#>   ..- attr(*, "display_width")= int 12

Date Published:

No official publication date.

Dataset Citation:

Batres, C. (2022). Attractiveness Ratings. [Data set].

Keywords:

faces, ratings

Use License:

Attribution-NonCommercial-ShareAlike CC BY-NC-SA

Geographic Description - City/State/Country of Participants:

United States

Column Metadata:

metadata <- import("data/batres_metadata.xlsx")

flextable(metadata) %>% autofit()

Variable Name

Variable Description

Type (numeric, character, logical, etc.)

Participant_Number

Unique number assigned to each participant

Numeric

Face_1

Attractiveness rating for face #1

Numeric

Face_2

Attractiveness rating for face #2

Numeric

Face_3

Attractiveness rating for face #3

Numeric

Face_4

Attractiveness rating for face #4

Numeric

Face_5

Attractiveness rating for face #5

Numeric

Face_6

Attractiveness rating for face #6

Numeric

Face_7

Attractiveness rating for face #7

Numeric

Face_8

Attractiveness rating for face #8

Numeric

Face_9

Attractiveness rating for face #9

Numeric

Face_10

Attractiveness rating for face #10

Numeric

Face_11

Attractiveness rating for face #11

Numeric

Face_12

Attractiveness rating for face #12

Numeric

Face_13

Attractiveness rating for face #13

Numeric

Face_14

Attractiveness rating for face #14

Numeric

Face_15

Attractiveness rating for face #15

Numeric

Face_16

Attractiveness rating for face #16

Numeric

Face_17

Attractiveness rating for face #17

Numeric

Face_18

Attractiveness rating for face #18

Numeric

Face_19

Attractiveness rating for face #19

Numeric

Face_20

Attractiveness rating for face #20

Numeric

AIPE Analysis:

The data should be in long format with each rating on one row of data.

# Reformat the data
DF_long <- pivot_longer(DF, cols = -c(Participant_Number)) %>% 
  dplyr::rename(item = name, score = value)

flextable(head(DF_long)) %>% autofit()

Participant_Number

item

score

1

Face_1

1

1

Face_2

1

1

Face_3

3

1

Face_4

3

1

Face_5

5

1

Face_6

5

Stopping Rule

# Function for simulation
var1 <- item_power(data = DF_long, # name of data frame
            dv_col = "score", # name of DV column as a character
            item_col = "item", # number of items column as a character
            nsim = 10, 
            sample_start = 20, 
            sample_stop = 300, 
            sample_increase = 5,
            decile = .4)
#> `summarise()` has grouped output by 'sample_size'. You can override using the
#> `.groups` argument.

What the usual standard error for the data that could be considered for our stopping rule using the 40%% decile?

# individual SEs
var1$SE
#>     Face_1    Face_10    Face_11    Face_12    Face_13    Face_14    Face_15 
#> 0.09117808 0.09064190 0.10007472 0.09739767 0.08562437 0.08767230 0.09331351 
#>    Face_16    Face_17    Face_18    Face_19     Face_2    Face_20     Face_3 
#> 0.10262632 0.09082536 0.09530433 0.09123386 0.08818665 0.09799754 0.08644573 
#>     Face_4     Face_5     Face_6     Face_7     Face_8     Face_9 
#> 0.08915009 0.09127172 0.09078109 0.09968796 0.08977638 0.09481468

var1$cutoff
#>        40% 
#> 0.09080765

Using our 40%% decile as a guide, we find that 0.091 is our target standard error for an accurately measured item.

Minimum Sample Size

To estimate minimum sample size, we should figure out what number of participants it would take to achieve 80%, 85%, 90%, and 95% of the SEs for items below our critical score of 0.091?

cutoff <- calculate_cutoff(population = DF_long, 
                           grouping_items = "item",
                           score = "score",
                           minimum = 1,
                           maximum = 7)
# showing how this is the same as the person calculated version versus semanticprimeR's function
cutoff$cutoff
#>        40% 
#> 0.09080765

Please note that you will always need to simulate larger than the pilot data sample size to get the starting numbers. We will correct them below. As shown in our manuscript, we need to correct for the overestimation of sample sizes based on the original pilot data size. Given that the pilot data is large: 200, this correction is especially useful. This correction is built into our function.

final_table <- calculate_correction(
  proportion_summary = var1$final_sample,
  pilot_sample_size = nrow(DF),
  proportion_variability = cutoff$prop_var
  )

flextable(final_table) %>% 
  autofit()

percent_below

sample_size

corrected_sample_size

80.0

230

54.60714

90.5

245

62.29201

90.5

245

62.29201

97.0

255

68.01402

Our minimum suggested sample size does not exist at exactly 80% of the items, but instead we can use the first available over 80% (n = 55 as the minimum).

Maximum Sample Size

While there are many considerations for maximum sample size (time, effort, resources), the simulation suggests that 68 people would ensure nearly all items achieve cutoff criterions.

Final Sample Size

In any estimate for sample size, you should also consider the potential for missing data and/or unusable data due to any other exclusion criteria in your study (i.e., attention checks, speeding, getting the answer right, etc.). In this study, we likely expect all participants to see all items and therefore, we could expect to use the minimum sample size as our final sample size, the point at which all items reach our SE criterion, or the maximum sample size.