class: title-slide, middle

<div style = "position:fixed; visibility: hidden">
`$$\require{color}\definecolor{red}{rgb}{0.698039215686274, 0.133333333333333, 0.133333333333333}$$`
`$$\require{color}\definecolor{green}{rgb}{0.125490196078431, 0.698039215686274, 0.666666666666667}$$`
`$$\require{color}\definecolor{blue}{rgb}{0.274509803921569, 0.509803921568627, 0.705882352941177}$$`
`$$\require{color}\definecolor{yellow}{rgb}{0.823529411764706, 0.411764705882353, 0.117647058823529}$$`
`$$\require{color}\definecolor{purple}{rgb}{0.866666666666667, 0.627450980392157, 0.866666666666667}$$`
</div>

<script type="text/x-mathjax-config">
MathJax.Hub.Config({
  TeX: {
    Macros: {
      red: ["{\\color{red}{#1}}", 1],
      green: ["{\\color{green}{#1}}", 1],
      blue: ["{\\color{blue}{#1}}", 1],
      yellow: ["{\\color{yellow}{#1}}", 1],
      purple: ["{\\color{purple}{#1}}", 1]
    },
    loader: {load: ['[tex]/color']},
    tex: {packages: {'[+]': ['color']}}
  }
});
</script>

<style>
.red {color: #B22222;}
.green {color: #20B2AA;}
.blue {color: #4682B4;}
.yellow {color: #D2691E;}
.purple {color: #DDA0DD;}
</style>

### Statistical Modeling in Experimental Psychology

# W13 Signal Detection Theory

## A story of d', c, and ROC

#### Han Hao @ Tarleton State University

???

This week begins the cognitive/computational modeling block with the most accessible framework of the three. The main goal is to show students that observed responses can be decomposed into at least two psychologically meaningful components: discriminability and response bias.

---

## Why do we need SDT?
.pull-left[

- Many tasks require a **decision under .red["uncertainty"]**
- A correct/incorrect score mixes together:
  - how well someone separates signal from noise
  - how willing they are to say "yes"
- Two people can show similar accuracy for very different reasons

]

.pull-right[

<!-- -->

]

---

## A simple example

Imagine a recognition-memory task: we ask participants to study and memorize a list of words, then test them with another list in which:

- Half of the test words are **old** (studied before)
- The other half are **new** (not in the study list)
- On each trial, the participant decides whether the word is new or old

Question:

- Is a person accurate because they really discriminate old from new?
- Or also because they lean toward one response?

---

## The four possible outcomes

.pull-left[

- **Hit**: say "old" to an old item (.blue[Correctly] identify when target is present)
- **Miss**: say "new" to an old item (.red[Incorrectly] reject when target is present)
- **False alarm**: say "old" to a new item (.red[Incorrectly] identify when target is absent)
- **Correct rejection**: say "new" to a new item (.blue[Correctly] reject when target is absent)

]

.pull-right[

<!-- -->

]

---

<!-- -->

---

## The latent-evidence perspective

SDT assumes that subjects make decisions based on latent (hidden) evidence that carries uncertainty (e.g., the "familiarity" of a word). Each trial produces an internal evidence value that the subject "evaluates" to reach a decision.

- Some trials ("new words") are just "distractions" that provide only noise (let's assume it is normally distributed for simplicity; e.g., some new words seem more familiar than others)
- Other trials ("old words") have a signal added to that noise (so they can be distinguished from pure noise and detected; e.g., old words generally seem more familiar than new words)

.red[Note: For simplicity, we assume that the two distributions have the same SD.]
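
---

## The latent-evidence perspective: a simulation sketch

A minimal R sketch of the equal-variance model described on the previous slide; the values of `d_prime` and `criterion` below are arbitrary choices for illustration, not estimates from any dataset:

```r
# Equal-variance SDT: noise ~ N(0, 1), signal ~ N(d', 1)
set.seed(1)
d_prime   <- 1.5    # assumed sensitivity (distance between the two means)
criterion <- 0.75   # assumed decision threshold on the evidence scale

noise  <- rnorm(1e5, mean = 0,       sd = 1)  # evidence on new-word trials
signal <- rnorm(1e5, mean = d_prime, sd = 1)  # evidence on old-word trials

# Respond "old" whenever the evidence exceeds the criterion
hit_rate <- mean(signal > criterion)   # P(say "old" | old)
fa_rate  <- mean(noise  > criterion)   # P(say "old" | new)
round(c(H = hit_rate, FA = fa_rate), 3)
```

With these settings, the simulated rates should land near the theoretical values `pnorm(d_prime - criterion)` for H and `1 - pnorm(criterion)` for FA (roughly .77 and .23).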
---

## The latent-evidence perspective

.pull-left[

- The two evidence distributions will overlap (**why?**)
- Subjects have a certain "capability" and certain "habits" when making these decisions, based on the (latent) strength of evidence.
- Subjects make a decision based on their "capability" (sensitivity or discriminability) and their "habits" (criterion or bias).

]

.pull-right[

Figure from this JASP tutorial ([**.red[link]**](https://jasp-stats.org/2020/10/29/how-to-compute-signal-detection-theory-functions-in-jasp-a-case-study/))

]

---

## Criterion

.pull-left[

In SDT, we assume that a person sets a decision threshold, or **criterion/bias**.

- If the evidence exceeds the criterion, they respond to identify (say "Old")
- If not, they respond to reject (say "New")

A person with a more "liberal" criterion:

- is more likely to identify a word as "old"
- increases hits **and** false alarms

]

.pull-right[

]

---

## Criterion

.pull-left[

In SDT, we assume that a person sets a decision threshold, or **criterion**.

- If the evidence exceeds the criterion, they respond "old"
- If not, they respond "new"

A person with a more "conservative" criterion:

- is less likely to identify a word as "old"
- decreases hits **and** false alarms

]

.pull-right[

]

---

## Sensitivity

The more separated the two distributions (signal and noise), the less likely subjects are to make mistakes (and thus the more sensitive they are to signals).

- In the previous slides, we described this "added" signal strength as if it were a value we can manipulate
- But in reality, it is influenced by both .red[the "physical" signal strength] (how "dissimilar" the old words and the new words are) and .red[the ability of the subjects to distinguish signal from noise].
- In SDT, we estimate this combined "signal strength + subject's sensitivity" using `\(d'\)` ("d-prime")

---

.pull-left[

## `d'` and `c`

Let's look at how we quantitatively define and calculate sensitivity and criterion/bias.
- The meaning of d'
- The meaning of c

**.red[In real-world analyses, what do we know (what's in our dataset)? What remains unknown and needs to be estimated?]**

]

.pull-right[

]

---

## From binary outcomes to rates

For a yes/no task, we summarize performance with:

$$ H = \frac{\text{hits}}{\text{total signal trials}} $$

$$ FA = \frac{\text{false alarms}}{\text{total noise trials}} $$

Interpretation:

- The `\(H\)` rate tells us how often the person says "yes" when the signal is present
- The `\(FA\)` rate tells us how often the person says "yes" when the signal is absent

---

## Classical SDT indices: Sensitivity

.pull-left[

Under the equal-variance normal SDT model, d' is by definition:

$$ d' = \frac{\mu_s - \mu_n}{\sigma} $$

But we do not know these parameters in practice. We do know, however, that d' equals the distance from `\(\mu_{noise}\)` to the criterion plus the distance from the criterion to `\(\mu_{signal}\)`.

]

.pull-right[

]

---

## Classical SDT indices: Sensitivity

.pull-left[

- We can calculate these distances in z-score units from the cumulative probabilities (how?)
  - Using the (inverse) cumulative distribution function of the normal distribution
  - Let's denote the z-score corresponding to a cumulative probability `\(p\)` as `\(z(p)\)`
  - These are the same z-scores as in basic stats, just converted from "p-values"

]

.pull-right[

Under the equal-variance normal SDT model, the "z-score" approach to calculating d':

$$ d' = z(1-FA) - z(1-H) $$

Equivalently:

$$ d' = z(H) - z(FA) $$

]

---

## Classical SDT indices: Criterion

If we assume the noise distribution has a mean of 0, the criterion on its raw scale is:

$$ c_{raw} = z(1-FA) = -z(FA) $$

But in practice we prefer to center this raw value on d', so that we have:

$$ c_{centered} = -z(FA) - \frac{d'}{2} = -\frac{z(H) + z(FA)}{2} $$

In this case:

- positive `\(c_{centered}\)` = more "conservative"
- negative `\(c_{centered}\)` = more "liberal"

---

## A potential issue with extreme rates

If:

- hit rate = 1 (Meaning?)
- false-alarm rate = 0 (Meaning?)
then the `\(z\)` transform becomes problematic (Why?).

In practice:

- we often apply a small correction for extreme proportions (so instead of a 1, we may use 0.999)
- Common software can handle this automatically, so we don't need to worry about it

---

## Other parameters based on `\(d'\)` and `\(c\)`

Relative criterion: `\(c' = \frac{c}{d'}\)`

Response bias: `\(\beta = e^{c \cdot d'}\)`

Log-likelihood ratio: `\(\ln(\beta) = c \cdot d'\)`

---

# ROC thinking: The Overview

Receiver operating characteristic (ROC) curves: a visual approach to understanding the H-FA trade-off in SDT.

- x-axis: False Alarm (FA) rate
- y-axis: Hit (H) rate

**.red[Theoretical ROC]**: Under the normal-distribution assumption, each d' has exactly one corresponding ROC curve **by definition** (traced out by varying FA):

$$ z(H) = z(FA) + d' $$

Let's look at the connections among everything before moving on, using this app: [**.red[Link]**](https://dprime-calculator.vercel.app/)

---

## ROC thinking: Theoretical ROC

Some intuition based on the theoretical ROC:

- A curve that occupies more of the upper-left area reflects a "better sensor situation" (one with a higher `\(d'\)`)
  - Is it a better sensor (a high-ability subject) or a stronger signal (an obvious task)? We don't know (at least not from a `\(d'\)` value alone; it also depends on what is being investigated)
- The diagonal suggests chance-like discrimination (What would the signal-noise situation and `\(d'\)` be for that?)
- Moving along the curve reflects different criteria (`\(c\)`), so with a pair of d' and c we can map a specific "sensor case" onto a point on a specific ROC curve in the FA-H coordinate system

---

## ROC thinking: Empirical ROC

Or, for `\(d'\)`s and `\(c\)`s estimated from observed data under the same "situation":

- Map the points in the same coordinate system and trace out the empirical ROC curve(s)
- Without assuming particular distributions for the signals and noise
- Empirical curves can take all kinds of shapes (non-parametric)

Examples:

- An empirical ROC for a group of subjects on the same test
- An empirical ROC for the same subject across different conditions of a task (**.red[how we define a condition]**)
- Comparing multiple ROCs (multiple subjects, multiple tasks/conditions, etc.)

---

.pull-left[

## A quick note

#### On the use of SDT in model evaluation

The entire SDT framework can also be used for model evaluation, particularly for models that classify or distinguish two groups (such as predicting a yes/no outcome).

The **.red[area under the curve (AUC)]** is commonly used to summarize the performance illustrated by the ROC curve.

]

.pull-right[

]

---

## AUC as a companion summary

- AUC is often useful as a broad summary of ranking/discrimination
- But it is **not** the same thing as `\(d'\)`
- For our class, we'll keep `\(d'\)` and `\(c\)` as the central SDT quantities

---

## ROC for polytomous responses

Instead of requiring a definite answer ("new or old"), we allow subjects to give confidence ratings from 1 = "Surely new" to 6 = "Surely old".

We can estimate five different cutoffs for the new/old judgment within the same person's responses based on their ratings:

- 1-5 = new vs. 6 = old
- 1-4 = new vs. 5-6 = old
- ...

---

## ROC for polytomous responses

.pull-left[

For each cutoff, we can carry out one set of d' and c estimates from the hit and false-alarm rates based on that cutoff. Thus, we have 5 + 2 (why?)
ROC points with which to estimate an empirical ROC curve for that individual.

]

.pull-right[

<!-- -->

]

---

.left-column[

### ROC for individual differences

ROC curves based on the **.red[confidence ratings]** of 9 subjects

]

.right-column[

<!-- -->

]

---

<!-- -->

---

class: inverse, middle

# Demo

## A basic SDT analysis of binary responses

### How to calculate the parameters

---

## Demo dataset

We will use `hbmem::prm09`, a dataset from [**.red[Pratte et al. (2010)]**](https://pcn.psychology.msstate.edu/Publications/Pratte_etal_JEPLMC_2010.pdf).

Subjects studied 240 words and took a recognition test with 480 words (240 old and 240 new).

- each row is a trial
- `cond`: `0 = new`, `1 = studied (old)`
- `sub`: subject id
- `item`: item id
- `lag`: 0-centered lag if old (?)
- `resp`: one of six confidence responses from "sure new" to "sure studied"
- `resp01`: `0 = say new`, `1 = say studied (old)` .red[This is what I added to the dataset]

???

This is the most important logistics slide for the demo. Tell students that the dataset is not custom-made for this class; it comes built into an R package, which is useful for reproducible examples.

---

## What we should remember

- Percent correct is incomplete
- SDT separates **sensitivity** from **response bias**
- Hits and false alarms lead to `\(d'\)` and `\(c\)`
- SDT parameters support ROC thinking for different purposes

---

## Bridge to the next two weeks

This 3-week sequence builds as follows:

- **Week 1: SDT**
  - choices only
  - sensitivity and criterion
- **Week 2: DDM**
  - choices + response times
  - dynamic evidence accumulation
- **Week 3: Hierarchical Bayes** (maybe?)
  - partial pooling and uncertainty
  - multi-level estimation of cognitive parameters

???

Use this to close the week and preview the logic of the full unit. The students should see SDT as the static starting point, DDM as the dynamic extension, and hierarchical Bayes as the estimation framework that can organize everything more coherently.
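
---

## Appendix: computing `\(d'\)` and `\(c\)` in R

A minimal sketch of the yes/no calculation from the lecture, using base R's `qnorm()` as the `\(z(p)\)` transform. The counts below are made up for illustration only; they are not taken from `prm09`:

```r
# Made-up illustrative counts for one subject (240 old + 240 new trials)
hits <- 180; misses <- 60     # old trials:  say "old" vs. say "new"
fas  <- 48;  crs    <- 192    # new trials:  say "old" vs. say "new"

H  <- hits / (hits + misses)  # hit rate
FA <- fas  / (fas  + crs)     # false-alarm rate

# Equal-variance SDT indices: d' = z(H) - z(FA), c = -(z(H) + z(FA)) / 2
d_prime <- qnorm(H) - qnorm(FA)
c_bias  <- -(qnorm(H) + qnorm(FA)) / 2

round(c(H = H, FA = FA, d_prime = d_prime, c = c_bias), 3)
```

With extreme proportions (H = 1 or FA = 0), `qnorm()` returns `Inf`/`-Inf`, which is exactly why the small-sample correction mentioned earlier is needed.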