Power and Sample Size Simulation: Seeing is Believing

Vignette Setup:

knitr::opts_chunk$set(echo = TRUE)

# Libraries necessary for this vignette
library(rio)
library(flextable)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)
library(semanticprimeR)
#> 
#> Attaching package: 'semanticprimeR'
#> The following object is masked from 'package:dplyr':
#> 
#>     top_n
set.seed(92747)

# Function for simulation
item_power <- function(data, # name of data frame
                       dv_col, # name of DV column as a character
                       item_col, # number of items column as a character
                       nsim = 10, # small for cran
                       sample_start = 20, 
                       sample_stop = 200, 
                       sample_increase = 5,
                       decile = .5){
  
  DF <- cbind.data.frame(
    dv = data[ , dv_col],
    items = data[ , item_col]
  )
  
  # just in case
  colnames(DF) <- c("dv", "items")
  
  # figure out the "sufficiently narrow" ci value
  SE <- tapply(DF$dv, DF$items, function (x) { sd(x)/sqrt(length(x)) })
  cutoff <- quantile(SE, probs = decile)
  
  # sequence of sample sizes to try
  samplesize_values <- seq(sample_start, sample_stop, sample_increase)

  # create a blank table for us to save the values in 
  sim_table <- matrix(NA, 
                      nrow = length(samplesize_values), 
                      ncol = length(unique(DF$items)))

  # make it a data frame
  sim_table <- as.data.frame(sim_table)

  # add a place for sample size values 
  sim_table$sample_size <- NA

  iterate <- 1
  for (p in 1:nsim){
    # loop over sample sizes
    for (i in 1:length(samplesize_values)){
        
      # temp that samples and summarizes
      temp <- DF %>% 
        group_by(items) %>% 
        sample_n(samplesize_values[i], replace = T) %>% 
        summarize(se = sd(dv)/sqrt(length(dv)))
      
      # dv on items
      colnames(sim_table)[1:length(unique(DF$items))] <- temp$items
      sim_table[iterate, 1:length(unique(DF$items))] <- temp$se
      sim_table[iterate, "sample_size"] <- samplesize_values[i]
      sim_table[iterate, "nsim"] <- p
      
    }
  }

  # figure out cut off
  final_sample <- sim_table %>% 
    pivot_longer(cols = -c(sample_size, nsim)) %>% 
    dplyr::rename(item = name, se = value) %>% 
    group_by(sample_size, nsim) %>% 
    summarize(percent_below = sum(se <= cutoff)/length(unique(DF$items))) %>% 
    ungroup() %>% 
    # then summarize all down averaging percents
    dplyr::group_by(sample_size) %>% 
    summarize(percent_below = mean(percent_below)) %>% 
    dplyr::arrange(percent_below) %>% 
    ungroup()
  
  return(list(
    SE = SE, 
    cutoff = cutoff, 
    DF = DF, 
    sim_table = sim_table, 
    final_sample = final_sample
  ))

}

Project/Data Title:

Seeing Is Believing: How Media Type Effects Truth Judgements

Data provided by: Gianni Ribeiro

Project/Data Description:

People have been duly concerned about how fake news influences the minds of the populous since the rise of propaganda in World War One (Lasswell, 1927). Experts are increasingly worried about the effects of false information spreading over the medium of video. Members of the deep trust alliance, a global network of scholars researching deepfakes and doctored videos, state that ‘a fundamental erosion of trust is already underway’ (Harrison, 2020). Newman et al. (2015) discovered that the media type through which information is presented does indeed affect how true the information feels. Newman speculated that this truthiness effect could be because images provide participants with more information than text alone, thus making the source feel more informationally rich.

In this experiment, our aim is to test the generalizability of Newman’s truthiness effect in two ways: first, to see if it extends to other media types in addition to images, and second, to test if it applies to other domains. In this study, we will present individuals with true and false claims presented through three different media types: (1) text, (2) text alongside a photo, and (3) text alongside a video. This is a direct replication of Newman’s experiment, just with the addition of the video condition. Similarly, participants will also be asked to make truth judgements about trivial claims and claims about COVID-19, to see if the truthiness effect extends to other domains besides trivia.

In this within-subjects design, participants will be presented with true and false claims about trivia and COVID-19 in counterbalanced order. These claims will be randomly assigned to appear either as text alone, text alongside an image, or text alongside a video. Participants will be asked to rate how true they believe each claim is.

Methods Description:

Participants were largely sourced from the first-year participant pool at The University of Queensland. Participation was completely voluntary, and participants can choose to withdraw at any time.

Thirty matched trivia claims were generated directly from Newman’s materials. These claims were selected and a true and false version of each claim was created. Newman’s original claims are available at the following link: https://data.mendeley.com/datasets/r68dcdjrpc/1

The second set of materials comprising of matched true and false claims was generated using information resources from the World Health Organisation, and various conspiracy websites. These claims were then fact-checked by Kirsty Short, an epidemiologist and senior lecturer in the School of Chemistry and Molecular Sciences at The University of Queensland.

The claims were also pilot tested to ensure that none of them performed at floor or ceiling. This pilot test consisted of 56 participants and subsequently four claims were dropped. The data from this pilot test was also used to accurately perform a power analysis. After generating the means of the pilot test, we found that to acquire a power of 0.8 or greater, there must be a mean difference of 0.4 between each media type. This mean difference is quite conservative, since we plan to measure truth ratings on a six-point scale and is easily achievable with 100 participants.

The videos were largely sourced from the stock image website Envato Elements and Screenflow’s Royalty Free Stock Media Library.

Data Location:

Data can be found here: https://osf.io/zu9pg/

DF <- import("data/moat_data.csv.zip") 
  
str(DF)
#> 'data.frame':    5040 obs. of  11 variables:
#>  $ id            : int  1 1 1 1 1 1 1 1 1 1 ...
#>  $ domain        : chr  "covid" "covid" "covid" "covid" ...
#>  $ gender        : chr  "Female" "Female" "Female" "Female" ...
#>  $ age           : int  22 22 22 22 22 22 22 22 22 22 ...
#>  $ filename      : chr  "1_kaitlin_exp_v2b_97832952686.txt" "1_kaitlin_exp_v2b_97832952686.txt" "1_kaitlin_exp_v2b_97832952686.txt" "1_kaitlin_exp_v2b_97832952686.txt" ...
#>  $ medium        : chr  "claim" "claim" "claim" "claim" ...
#>  $ question_type : chr  "drinking" "herd" "hydroxychloroquine" "steam" ...
#>  $ truth         : logi  TRUE TRUE TRUE TRUE TRUE FALSE ...
#>  $ claim         : chr  "COVID-19 cannot be contracted through drinking water." "Herd immunity against COVID-19 cannot be achieved by letting the virus spread through the population." "Studies show hydroxychloroquine does not have clinical benefits in treating COVID-19." "Steam inhalation cannot help cure COVID-19." ...
#>  $ filename_other: chr  NA NA NA NA ...
#>  $ rating        : int  1 5 6 6 6 1 1 1 5 1 ...

Date Published:

No official publication, see citation below.

Dataset Citation:

Moat, K., Tangen, J., & Newman, E. (2021). Seeing Is Believing: How Media Type Effects Truth Judgements.

Keywords:

covid, trivia games, claims, interpretation

Use License:

Open access with reference to original paper (Attribution-NonCommercial-ShareAlike CC BY-NC-SA)

Geographic Description - City/State/Country of Participants:

Brisbane, Queensland, Australia

Column Metadata:

metadata <- tibble::tribble(
             ~Variable.Name,                                                                                                       ~Variable.Description, ~`Type.(numeric,.character,.logical,.etc.)`,
                      "Id",                                                                                                            "Participant ID",                                   "numeric",
                   "Domain",                                                    "Whether the trial is a claim about COVID ('covid') or TRIVIA ('trivia)",                                 "character",
                   "Medium", "Whether the trial appears as text alone ('claim'), text alongside an image ('photo'), or text alongside a video ('video')",                                 "character",
               "Trial_type",                                        "Whether the trial presents a claim that is TRUE ('target') or FALSE ('distractor')",                                 "character",
                   "Rating",                           "Paritcipant’s truth rating of the claim ranging from 1 (definitely false) to 6 (definitely tue)",                                   "numeric"
             )

flextable(metadata) %>% autofit()

Variable.Name	Variable.Description	Type.(numeric,.character,.logical,.etc.)
Id	Participant ID	numeric
Domain	Whether the trial is a claim about COVID ('covid') or TRIVIA ('trivia)	character
Medium	Whether the trial appears as text alone ('claim'), text alongside an image ('photo'), or text alongside a video ('video')	character
Trial_type	Whether the trial presents a claim that is TRUE ('target') or FALSE ('distractor')	character
Rating	Paritcipant’s truth rating of the claim ranging from 1 (definitely false) to 6 (definitely tue)	numeric

AIPE Analysis:

# Function for simulation
var1 <- item_power(data = DF, # name of data frame
            dv_col = "rating", # name of DV column as a character
            item_col = "question_type", # number of items column as a character
            nsim = 10, 
            sample_start = 20, 
            sample_stop = 300, 
            sample_increase = 5,
            decile = .4)
#> `summarise()` has grouped output by 'sample_size'. You can override using the
#> `.groups` argument.

Stopping Rule

What is the usual standard error for the data that could be considered for our stopping rule?

var1$SE
#>             afghan        antibiotics           bacteria          blindness 
#>          0.1733704          0.2192951          0.2550577          0.1787244 
#>      breastfeeding           chemical           children            cocaine 
#>          0.1726203          0.2000046          0.2224147          0.2427085 
#>        colorvision               corn           couscous            curling 
#>          0.1284245          0.1064393          0.1337200          0.1500109 
#>         dartboards            denmark                dna           drinking 
#>          0.1322563          0.1802086          0.2273651          0.1732374 
#>            elderly            fishing             forest         foxhunting 
#>          0.2576951          0.1464690          0.1349403          0.1446861 
#>             grapes               herd                hiv         houseflies 
#>          0.1573876          0.1877676          0.2135034          0.1907986 
#> hydroxychloroquine        infertility          lawnbowls               lime 
#>          0.1495606          0.2049964          0.1147161          0.1465564 
#>            longbow           marathon          microwave         mintonette 
#>          0.1667281          0.1392743          0.1959902          0.1066397 
#>         mosquitoes          mountains          mouthwash               nile 
#>          0.1894154          0.0912085          0.2245578          0.1532541 
#>              otter             oxygen            oysters         penicillin 
#>          0.1280050          0.2093482          0.1173648          0.1584095 
#>             poland          prisoners               rate              rigor 
#>          0.1326624          0.1970849          0.2147593          0.2238266 
#>             saline              smell              snake       snowboarding 
#>          0.2132273          0.2432145          0.1899915          0.1359677 
#>              steam        temperature         triathalon              twice 
#>          0.2243410          0.2215570          0.1851070          0.2292498 
#>             urchin                 uv           vesuvius          vitamin-c 
#>          0.1399591          0.2161739          0.1433344          0.2127463 
#>          vitamin-d              water          waterfall               zulu 
#>          0.2251918          0.1817230          0.1411920          0.1264954
var1$cutoff
#>       40% 
#> 0.1580007

cutoff <- var1$cutoff

# we can also use semanticprimer's function
cutoff_score <- calculate_cutoff(population = DF,
                                 grouping_items = "question_type",
                                 score = "rating",
                                 minimum = min(DF$rating),
                                 maximum = max(DF$rating))
cutoff_score$cutoff
#>       40% 
#> 0.1580007

Using our 40% decile as a guide, we find that 0.158 is our target standard error for an accurately measured item.

Minimum Sample Size

To estimate minimum sample size, we should figure out what number of participants it would take to achieve 80%, 85%, 90%, and 95% of the SEs for items below our critical score of 0.158?

How large does the sample have to be for 80% of the items to be below our stopping SE rule?

flextable(var1$final_sample %>% head()) %>% 
  autofit()

sample_size	percent_below
300	1


final_table <- calculate_correction(
  proportion_summary = var1$final_sample,
  pilot_sample_size = DF %>% group_by(question_type) %>% 
    summarize(sample_size = n()) %>% ungroup() %>% 
    summarize(avg_sample = mean(sample_size)) %>% pull(avg_sample),
  proportion_variability = cutoff_score$prop_var
  )

flextable(final_table) %>% 
  autofit()

percent_below	sample_size	corrected_sample_size
100	300	173.0834
100	300	173.0834
100	300	173.0834
100	300	173.0834

Based on these simulations, we can decide our minimum sample size is likely close to 173.

Maximum Sample Size

In this example, we could set our maximum sample size for 90% of items, which would equate to 173 participants.

Final Sample Size

You should also consider any potential for missing data and/or unusable data given the requirements for your study. Given that participants are likely to see all items in this study, we could use the minimum, stopping rule, and maximum defined above.

Jason Geller

2024-11-10