Path Models

<div style = "position:fixed; visibility: hidden">
`$$\require{color}\definecolor{red}{rgb}{0.698039215686274, 0.133333333333333, 0.133333333333333}$$`
`$$\require{color}\definecolor{green}{rgb}{0.125490196078431, 0.698039215686274, 0.666666666666667}$$`
`$$\require{color}\definecolor{blue}{rgb}{0.274509803921569, 0.509803921568627, 0.705882352941177}$$`
`$$\require{color}\definecolor{yellow}{rgb}{0.823529411764706, 0.411764705882353, 0.117647058823529}$$`
`$$\require{color}\definecolor{purple}{rgb}{0.866666666666667, 0.627450980392157, 0.866666666666667}$$`
</div>

### Statistical Modeling in Experimental Psychology
# W08 Path Models
## A Bridge from Regression to SEM

#### Han Hao @ Tarleton State University

---

## Agenda

- Reframe regression as a path model  
- Why path models are "better" (conceptually + practically)
- Mediation in a *single* model (reproduce your Unsworth mediation assignment in PSYC5316)
- A more complex path model (3 mediators)
- Preview: MIMIC as the “first step” into SEM (latent + observed causes)

---

## From regression to path models

#### Regression mindset:
- One outcome at a time: `$Y \sim X_1 + X_2 + \cdots$`
- Interpretation is conditional mean: **holding other predictors constant, the predicted value (mean) of Y when X is ...**

#### Path model mindset:
- Multiple .red[endogenous] variables are modeled **simultaneously**
- A single set of assumed relationships implies a **model-implied covariance matrix**  
- The model becomes a *testable theory that illustrates complex relationships*

---

## Overview

A **path model** is a set of regression equations and covariance assumptions among the target (manifest) variables, estimated **simultaneously**.

Terms of variables:
- **.blue[Exogenous]**: not predicted in the model (has no incoming single arrows in a diagram)
  - Similar to "Xs" in a regression
- **.red[Endogenous]**: predicted by something (has incoming single arrows in a diagram)
  - Similar to "Y" in a regression, but still different

---

## Overview

A **path model** is a set of regression equations and covariance assumptions among the target (manifest) variables, estimated **simultaneously**.

Key parameters estimated in a path model:
- **Directional paths** (regression relationships)
- **Covariances** among **.blue[exogenous]** variables
- **Residual variances** (and sometimes residual covariances) of **.red[endogenous]** variables
- **Intercepts** of variables (that are often omitted)

---

## Diagram grammar (universal)

- Rectangle = observed variable  
- Single-headed (straight) arrow `$X \rightarrow Y$` = regression effect  
- Double-headed (curved) arrow `$X \leftrightarrow Z$` = covariance (or correlation)
- Small “error” circle → endogenous variable = estimated residual

Terms again:
- **.blue[Exogenous]**: not predicted in the model (has no incoming single arrows)
- **.red[Endogenous]**: predicted by something (has incoming single arrows)

---

## Why path models?

1) **One coherent model**  
   - multiple regression equations estimated together (shared information)
   - direct vs indirect effects (mediation) is native, not an afterthought

2) **Focus on model comparison**  
   - test whether the *whole theory* reproduces relations in the full set of data  
   - compare nested models (drop or constrain paths) as competing theories

3) **Extensible to SEM**  
   - extensible to latent factors next week (measurement errors handled)

---

## Attention!

1) **We still need experiment designs and assumptions for causal claims**
  - temporal order, omitted confounds, measurement quality
  
2) **Equivalent models exist**
  - different diagrams can imply the same covariance structure
  - e.g., correlation or causation?
  
3) **“Model with good fit” is not truth**
  - it’s consistency with the covariance pattern, not proof of anything
  - "All models are wrong but some are useful" (Box, 1978)

---

## Identification and df (again)

- If a model is **just-identified** (df = 0), it fits perfectly by saturating all possible associations  
  - The estimated parameters are still informative, but fit indices are less so

- For most modeling practices, df > 0 is a goal for model specification  
  - Imposing constraints (some paths fixed to 0 or a certain number), making the model more parsimonious (using fewer parameters to describe or summarize the info) while retaining acceptable fit

- Testing **theory-driven models** is better than taking a fully saturated model that exhausts every possible relationship as default

---

## lavaan specifications grammar

- **.yellow[Y ~ X + Z]**: Y predicted by X and Z

- **.yellow[Y ~ a*X + Z]**: assigning labels to certain parameters for dedicated estimations or constraints

- **.yellow[X ~~ Z]**: covariance (or residual covariance)

- **.yellow[X ~~ X]**: variance (usually automatically specified by wrapper functions)

- **.yellow[ind := a*b]**: a specific effect defined and estimated (computed from estimated paths)

---

## Mediation as a path model (concept)

Classic mediation:
- X → M → Y
- Direct effect: c' (X → Y controlling for M, as a "covariate")
- Indirect effect: a*b (X → M times M → Y)
- Total effect: c = c' + ab

Path modeling advantages:
- estimate **all** paths together and can define indirect effects directly 
- model comparisons to test competing hypothesis (e.g., with vs. without direct effect)
- bootstrap CIs (similar to bootstrapping method in mediation analysis)

---

## Let's see some real data and results!
### Not-a-demo dataset: Unsworth et al. (2014)  
#### Reproduce the 1-mediator case (Lab 5 in PSYC5316), then 3-mediators (one-step closer to the orginal analyses in Unsworth et al.)

---

## Refresher on the data and the study

[.purple[Unsworth et al. (2014)]](https://fukudalab.org/wp-content/uploads/2016/09/unsworth_etal_cogpsy2014.pdf) Working memory and fluid intelligence: Capacity, attention control, and secondary memory retrieval

``` r
dat <- read.csv("Unsworth2014.csv")

psych::describe(dat[, c("symspan", "anti", "raven")], ranges = F)
```

```
        vars   n  mean   sd  skew kurtosis   se
symspan    1 171 30.15 8.06 -0.82     0.39 0.62
anti       2 171  0.63 0.14  0.10    -0.33 0.01
raven      3 171 10.42 3.00 -0.11    -0.04 0.23
```

---

## Refresher on the data and the study

In Lab05 Assignment of PSYC5316 (The mediation assignment), we did a mediation analysis where **.red[anti]** (as an AC measure) is assumed to mediate the effect of **.red[symspan]** (as a WM measure) on **.red[raven]** (as a Gf measure)

![](PSYC5318W08Slides_files/figure-html/unnamed-chunk-2-1.png)

---

## Regression workflow

If we run this as regressions:

``` r
model1 <- lm(anti ~ symspan, dat)
model2 <- lm(raven ~ anti + symspan, dat)
# model3 <- lm(raven ~ symspan, dat))

summary(mediation::mediate(model1, model2, treat = "symspan",
                           mediator = "anti", boot = TRUE))
```

```

Causal Mediation Analysis

Nonparametric Bootstrap Confidence Intervals with the Percentile Method

Estimate 95% CI Lower 95% CI Upper p-value    
ACME           0.0260552    0.0074376    0.0497225   0.010 ** 
ADE            0.0944255    0.0270299    0.1573086   0.006 ** 
Total Effect   0.1204807    0.0566496    0.1806681  <2e-16 ***
Prop. Mediated 0.2162605    0.0572211    0.5764432   0.010 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Sample Size Used: 171

Simulations: 1000 
```

---

## 1-mediator model in lavaan

Now let's reproduce the mediation model with "lavaan". Here, we will use the .red[sem()] function. It is another wrapper function in the "lavaan" package. Similar to .red[cfa()], it provides a set of default arguments that are designed for structural equation models (and path models).

``` r
m_med1 <- "
anti  ~ a*symspan
raven ~ c1*symspan + b*anti

ind   := a*b
total := c1 + (a*b)
"

fit_med1 <- sem(model = m_med1, data  = dat,
                se = "bootstrap", bootstrap = 100)

summary(fit_med1, standardized = TRUE,
        fit.measures = TRUE, rsquare = TRUE)
```

```
lavaan 0.6-21 ended normally after 1 iteration

Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                         5

Number of observations                           171

Model Test User Model:
                                                      
  Test statistic                                 0.000
  Degrees of freedom                                 0

Model Test Baseline Model:

Test statistic                                47.791
  Degrees of freedom                                 3
  P-value                                        0.000

User Model versus Baseline Model:

Comparative Fit Index (CFI)                    1.000
  Tucker-Lewis Index (TLI)                       1.000

Loglikelihood and Information Criteria:

Loglikelihood user model (H0)               -312.724
  Loglikelihood unrestricted model (H1)       -312.724
                                                      
  Akaike (AIC)                                 635.449
  Bayesian (BIC)                               651.157
  Sample-size adjusted Bayesian (SABIC)        635.325

Root Mean Square Error of Approximation:

RMSEA                                          0.000
  90 Percent confidence interval - lower         0.000
  90 Percent confidence interval - upper         0.000
  P-value H_0: RMSEA <= 0.050                       NA
  P-value H_0: RMSEA >= 0.080                       NA

Standardized Root Mean Square Residual:

SRMR                                           0.000

Parameter Estimates:

Standard errors                            Bootstrap
  Number of requested bootstrap draws              100
  Number of successful bootstrap draws             100

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  anti ~                                                                
    symspan    (a)    0.006    0.001    5.092    0.000    0.006    0.346
  raven ~                                                               
    symspan   (c1)    0.094    0.032    2.988    0.003    0.094    0.254
    anti       (b)    4.320    1.373    3.147    0.002    4.320    0.203

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .anti              0.017    0.002    9.566    0.000    0.017    0.880
   .raven             7.672    0.937    8.190    0.000    7.672    0.859

R-Square:
                   Estimate
    anti              0.120
    raven             0.141

Defined Parameters:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
    ind               0.026    0.010    2.651    0.008    0.026    0.070
    total             0.120    0.030    4.018    0.000    0.120    0.324
```

---

## Visualize the path diagram

``` r
semPaths(fit_med1, what = "std", layout = "spring",
         edge.color = "black", color = "lavender",
         sizeMan = 9, edge.label.cex = 2, residuals = T)
```

![](PSYC5318W08Slides_files/figure-html/unnamed-chunk-5-1.png)

---

## Model comparison: full vs partial mediation

- Partial mediation model: the model in the previous diagram
- Full mediation model: an alternative model without the direct path (X → Y)
- Fit and compare the two models

``` r
m_med1_f <- "
anti  ~ a*symspan
raven ~ b*anti  # No direct path nor the user-defined effects
"

fit_med1_f <- sem(model = m_med1_f, data = dat,
                     se = "bootstrap", bootstrap = 100)

anova(fit_med1_f, fit_med1)
```

```

Chi-Squared Difference Test

Df    AIC    BIC Chisq Chisq diff   RMSEA Df diff Pr(>Chisq)    
fit_med1    0 635.45 651.16  0.00                                          
fit_med1_f  1 644.40 656.97 10.95      10.95 0.24122       1  0.0009361 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

---

## Let's get into the next level!
### A model with 3 mediators between WM (SymSpan) → Gf(RAPM)
#### "Orient" as a "Capacity" indicator, "Anti" as an "Attention Control" indicator, "PA" as a "Secondary Memory" indicator

---

## Parallel mediation

---

## The (manifest) variables for this model

Here I’ll use:
- "orient" as a Capacity measure
- "anti" as an "Attention Control" measure(same mediator as before)
- "pa" as a "Secondary Memory" measure

We will still use:
- "symspan" as the working memory measure
- "raven" as the fluid reasoning measure

---

## 3-mediator lavaan model

``` r
m_med3 <- "
# a paths
anti ~ a1*symspan
pa ~ a2*symspan
orient ~ a3*symspan

# b paths + direct
raven ~ cprime*symspan + b1*anti + b2*pa + b3*orient

# defined effects
ind1 := a1*b1
ind2 := a2*b2
ind3 := a3*b3
ind_total := ind1 + ind2 + ind3
total := cprime + ind_total
"

fit_med3 <- sem(model = m_med3, data  = dat,
                se = "bootstrap", bootstrap = 100)

summary(fit_med3, standardized = TRUE, fit.measures = TRUE, rsquare = TRUE)
```

```
lavaan 0.6-21 ended normally after 2 iterations

Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        11

Number of observations                           171

Model Test User Model:
                                                      
  Test statistic                                23.006
  Degrees of freedom                                 3
  P-value (Chi-square)                           0.000

Model Test Baseline Model:

Test statistic                               111.650
  Degrees of freedom                                10
  P-value                                        0.000

User Model versus Baseline Model:

Comparative Fit Index (CFI)                    0.803
  Tucker-Lewis Index (TLI)                       0.344

Loglikelihood and Information Criteria:

Loglikelihood user model (H0)               -543.086
  Loglikelihood unrestricted model (H1)       -531.583
                                                      
  Akaike (AIC)                                1108.172
  Bayesian (BIC)                              1142.730
  Sample-size adjusted Bayesian (SABIC)       1107.899

Root Mean Square Error of Approximation:

RMSEA                                          0.197
  90 Percent confidence interval - lower         0.127
  90 Percent confidence interval - upper         0.276
  P-value H_0: RMSEA <= 0.050                    0.001
  P-value H_0: RMSEA >= 0.080                    0.996

Standardized Root Mean Square Residual:

SRMR                                           0.096

Parameter Estimates:

Standard errors                            Bootstrap
  Number of requested bootstrap draws              100
  Number of successful bootstrap draws             100

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  anti ~                                                                
    symspan   (a1)    0.006    0.001    5.092    0.000    0.006    0.346
  pa ~                                                                  
    symspan   (a2)    0.006    0.002    2.452    0.014    0.006    0.182
  orient ~                                                              
    symspan   (a3)    0.038    0.009    4.205    0.000    0.038    0.301
  raven ~                                                               
    symspan (cprm)    0.071    0.031    2.326    0.020    0.071    0.195
    anti      (b1)    2.069    1.460    1.417    0.156    2.069    0.099
    pa        (b2)    2.829    0.821    3.446    0.001    2.829    0.241
    orient    (b3)    0.547    0.188    2.918    0.004    0.547    0.188

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .anti              0.017    0.002    9.566    0.000    0.017    0.880
   .pa                0.061    0.004   14.432    0.000    0.061    0.967
   .orient            0.931    0.087   10.652    0.000    0.931    0.909
   .raven             6.872    0.890    7.718    0.000    6.872    0.794

R-Square:
                   Estimate
    anti              0.120
    pa                0.033
    orient            0.091
    raven             0.206

Defined Parameters:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
    ind1              0.012    0.010    1.292    0.196    0.012    0.034
    ind2              0.016    0.009    1.880    0.060    0.016    0.044
    ind3              0.021    0.009    2.188    0.029    0.021    0.057
    ind_total         0.049    0.015    3.266    0.001    0.049    0.135
    total             0.120    0.030    4.018    0.000    0.120    0.329
```

---

## Full mediation model and comparison

``` r
m_med3_f <- "
# a paths
orient ~ symspan
anti ~ symspan
pa ~ symspan

# b paths + direct
raven ~ anti + pa + orient
"

fit_med3_f <- sem(model = m_med3_f, data  = dat,
                  se = "bootstrap", bootstrap = 100)

anova(fit_med3_f, fit_med3)
```

```

Chi-Squared Difference Test

Df    AIC    BIC  Chisq Chisq diff   RMSEA Df diff Pr(>Chisq)   
fit_med3    3 1108.2 1142.7 23.006                                         
fit_med3_f  4 1112.9 1144.3 29.699     6.6927 0.18246       1   0.009681 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

---

## Going beyond basic mediation models

---
## Other path model features

- Multiple outcomes simultaneously  
  - e.g., `Y1 ~ X + M`, `Y2 ~ X + M`, or you may try `Y1 + Y2 ~ X + M`
- Correlated residuals among variables  
  - when the assumption that variables share variance beyond modeled associations is needed
  - `X1 ~~ X2`
- Equality constraints  
  - test “are two effects equal?” by constraining coefficients
  - `Y ~ a*X1 + a*X2`

---

## Other path model features

You can even reproduce moderation investigations with path models (by using a ":" to specify the interaction).

> Please try the following example, and compare the results between the lavaan path model and the regular linear regression model. Then use a robust estimator (e.g., "MLM") instead and compare again. What reflections do you have on these results?

``` r
m_mod <- "
raven ~ anti + pa + anti:pa
"
fit_mod <- sem(model = m_mod, data  = dat, estimator = "ML")

summary(fit_mod, standardized = T)
```

```
lavaan 0.6-21 ended normally after 1 iteration

Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                         4

Number of observations                           171

Model Test User Model:
                                                      
  Test statistic                                 0.000
  Degrees of freedom                                 0

Parameter Estimates:

Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  raven ~                                                               
    anti              7.064    3.487    2.026    0.043    7.064    0.331
    pa                6.349    3.804    1.669    0.095    6.349    0.533
    anti:pa          -4.777    5.850   -0.817    0.414   -4.777   -0.315

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .raven             7.510    0.812    9.247    0.000    7.510    0.841
```

``` r
summary(lm(raven ~ anti + pa + anti*pa, data = dat))
```

```

Call:
lm(formula = raven ~ anti + pa + anti * pa, data = dat)

Residuals:
    Min      1Q  Median      3Q     Max 
-7.7401 -1.6365 -0.0691  1.5617  7.3888

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)    4.328      2.180   1.985   0.0488 *
anti           7.064      3.529   2.002   0.0469 *
pa             6.349      3.849   1.650   0.1009  
anti:pa       -4.777      5.919  -0.807   0.4208  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.773 on 167 degrees of freedom
Multiple R-squared:  0.1594,	Adjusted R-squared:  0.1443 
F-statistic: 10.55 on 3 and 167 DF,  p-value: 2.156e-06
```

---

## Bridge to next week (SEM)

Today: path models (rectangles; observed variables)  
Next week: SEM (rectangles + circles)

Key upgrade:
- **Measurement error is modeled explicitly**
- Structural paths can be interpreted as relations among *latent constructs*

- Path model tests the **structural hypotheses** for theories; CFA tests the **measurement hypotheses** for theories; SEM combines them.