class: title-slide, middle <div style = "position:fixed; visibility: hidden"> `$$\require{color}\definecolor{red}{rgb}{0.698039215686274, 0.133333333333333, 0.133333333333333}$$` `$$\require{color}\definecolor{green}{rgb}{0.125490196078431, 0.698039215686274, 0.666666666666667}$$` `$$\require{color}\definecolor{blue}{rgb}{0.274509803921569, 0.509803921568627, 0.705882352941177}$$` `$$\require{color}\definecolor{yellow}{rgb}{0.823529411764706, 0.411764705882353, 0.117647058823529}$$` `$$\require{color}\definecolor{purple}{rgb}{0.866666666666667, 0.627450980392157, 0.866666666666667}$$` </div> <script type="text/x-mathjax-config"> MathJax.Hub.Config({ TeX: { Macros: { red: ["{\\color{red}{#1}}", 1], green: ["{\\color{green}{#1}}", 1], blue: ["{\\color{blue}{#1}}", 1], yellow: ["{\\color{yellow}{#1}}", 1], purple: ["{\\color{purple}{#1}}", 1] }, loader: {load: ['[tex]/color']}, tex: {packages: {'[+]': ['color']}} } }); </script> <style> .red {color: #B22222;} .green {color: #20B2AA;} .blue {color: #4682B4;} .yellow {color: #D2691E;} .purple {color: #DDA0DD;} </style> ### Statistical Modeling in Experimental Psychology # W08 Path Models ## A Bridge from Regression to SEM #### Han Hao @ Tarleton State University --- ## Roadmap (what we’ll do) - Reframe regression as a path model - Why path models are "better" (conceptually + practically) - Mediation in a *single* model (reproduce your Unsworth mediation assignment in PSYC5316) - A more complex path model (3 mediators) - Preview: MIMIC as the “first step” into SEM (latent + observed causes) --- ## From regression to path models #### Regression mindset: - One outcome at a time: `\(Y \sim X_1 + X_2 + \cdots\)` - Interpretation is conditional mean: **holding other predictors constant, the predicted value (mean) of Y when X is ...** #### Path model mindset: - Multiple .red[endogenous] variables are modeled **simultaneously** - A single set of assumed relationships implies a **model-implied covariance matrix** - The model becomes a *testable theory that illustrates complex relationships* --- ## Overview A **path model** is a set of regression equations and covariance assumptions among the target (manifest) variables, estimated **simultaneously**. Terms of variables: - **.blue[Exogenous]**: not predicted in the model (has no incoming single arrows in a diagram) - Similar to "Xs" in a regression - **.red[Endogenous]**: predicted by something (has incoming single arrows in a diagram) - Similar to "Y" in a regression, but still different --- ## Overview A **path model** is a set of regression equations and covariance assumptions among the target (manifest) variables, estimated **simultaneously**. Key parameters estimated in a path model: - **Directional paths** (regression relationships) - **Covariances** among **.blue[exogenous]** variables - **Residual variances** (and sometimes residual covariances) of **.red[endogenous]** variables - **Intercepts** of variables (that are often omitted) --- ## Diagram grammar (universal) - Rectangle = observed variable - Single-headed (straight) arrow `\(X \rightarrow Y\)` = regression effect - Double-headed (curved) arrow `\(X \leftrightarrow Z\)` = covariance (or correlation) - Small “error” circle → endogenous variable = estimated residual Terms again: - **.blue[Exogenous]**: not predicted in the model (has no incoming single arrows) - **.red[Endogenous]**: predicted by something (has incoming single arrows) --- ## Why path models? 1) **One coherent model** - multiple regression equations estimated together (shared information) - direct vs indirect effects (mediation) is native, not an afterthought 2) **Focus on model comparison** - test whether the *whole theory* reproduces relations in the full set of data - compare nested models (drop or constrain paths) as competing theories 3) **Extensible to SEM** - extensible to latent factors next week (measurement errors handled) --- ## Attention! 1) **We still need experiment designs and assumptions for causal claims** - temporal order, omitted confounds, measurement quality 2) **Equivalent models exist** - different diagrams can imply the same covariance structure - e.g., correlation or causation? 3) **“Model with good fit” is not truth** - it’s consistency with the covariance pattern, not proof of anything - "All models are wrong but some are useful" (Box, 1978) --- ## Identification and df (again) - If a model is **just-identified** (df = 0), it fits perfectly by saturating all possible associations - The estimated parameters are still informative, but fit indices are less so - For most modeling practices, df > 0 is a goal for model specification - Imposing constraints (some paths fixed to 0 or a certain number), making the model more parsimonious (using fewer parameters to describe or summarize the info) while retaining acceptable fit - Testing **theory-driven models** is better than taking a fully saturated model that exhausts every possible relationship as default --- ## lavaan specifications grammar - **.yellow[Y ~ X + Z]**: Y predicted by X and Z - **.yellow[Y ~ a*X + Z]**: assigning labels to certain parameters for dedicated estimations or constraints - **.yellow[X ~~ Z]**: covariance (or residual covariance) - **.yellow[X ~~ X]**: variance (usually automatically specified by wrapper functions) - **.yellow[ind := a*b]**: a specific effect defined and estimated (computed from estimated paths) --- ## Mediation as a path model (concept) Classic mediation: - X → M → Y - Direct effect: c' (X → Y controlling for M, as a "covariate") - Indirect effect: a*b (X → M times M → Y) - Total effect: c = c' + ab Path modeling advantages: - estimate **all** paths together and can define indirect effects directly - model comparisons to test competing hypothesis (e.g., with vs. without direct effect) - bootstrap CIs (similar to bootstrapping method in mediation analysis) --- class: inverse, middle ## Let's see some real data and results! ### Not-a-demo dataset: Unsworth et al. (2014) #### Reproduce the 1-mediator case (Lab 5 in PSYC5316), then 3-mediators (one-step closer to the orginal analyses in Unsworth et al.) --- ## Refresher on the data and the study [.purple[Unsworth et al. (2014)]](https://fukudalab.org/wp-content/uploads/2016/09/unsworth_etal_cogpsy2014.pdf) Working memory and fluid intelligence: Capacity, attention control, and secondary memory retrieval ``` r dat <- read.csv("Unsworth2014.csv") psych::describe(dat[, c("symspan", "anti", "raven")], ranges = F) ``` ``` vars n mean sd skew kurtosis se symspan 1 171 30.15 8.06 -0.82 0.39 0.62 anti 2 171 0.63 0.14 0.10 -0.33 0.01 raven 3 171 10.42 3.00 -0.11 -0.04 0.23 ``` --- ## Refresher on the data and the study In Lab05 Assignment of PSYC5316 (The mediation assignment), we did a mediation analysis where **.red[anti]** (as an AC measure) is assumed to mediate the effect of **.red[symspan]** (as a WM measure) on **.red[raven]** (as a Gf measure) <!-- --> --- ## Regression workflow If we run this as regressions: ``` r model1 <- lm(anti ~ symspan, dat) model2 <- lm(raven ~ anti + symspan, dat) # model3 <- lm(raven ~ symspan, dat)) summary(mediation::mediate(model1, model2, treat = "symspan", mediator = "anti", boot = TRUE)) ``` ``` Causal Mediation Analysis Nonparametric Bootstrap Confidence Intervals with the Percentile Method Estimate 95% CI Lower 95% CI Upper p-value ACME 0.0260552 0.0064852 0.0499393 0.006 ** ADE 0.0944255 0.0265472 0.1670591 0.006 ** Total Effect 0.1204807 0.0570433 0.1871727 <2e-16 *** Prop. Mediated 0.2162605 0.0487518 0.6028456 0.006 ** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Sample Size Used: 171 Simulations: 1000 ``` --- ## 1-mediator model in lavaan Now let's reproduce the mediation model with "lavaan". Here, we will use the .red[sem()] function. It is another wrapper function in the "lavaan" package. Similar to .red[cfa()], it provides a set of default arguments that are designed for structural equation models (and path models). ``` r m_med1 <- " anti ~ a*symspan raven ~ c1*symspan + b*anti ind := a*b total := c1 + (a*b) " fit_med1 <- sem(model = m_med1, data = dat, se = "bootstrap", bootstrap = 100) summary(fit_med1, standardized = TRUE, fit.measures = TRUE, rsquare = TRUE) ``` ``` lavaan 0.6-21 ended normally after 1 iteration Estimator ML Optimization method NLMINB Number of model parameters 5 Number of observations 171 Model Test User Model: Test statistic 0.000 Degrees of freedom 0 Model Test Baseline Model: Test statistic 47.791 Degrees of freedom 3 P-value 0.000 User Model versus Baseline Model: Comparative Fit Index (CFI) 1.000 Tucker-Lewis Index (TLI) 1.000 Loglikelihood and Information Criteria: Loglikelihood user model (H0) -312.724 Loglikelihood unrestricted model (H1) -312.724 Akaike (AIC) 635.449 Bayesian (BIC) 651.157 Sample-size adjusted Bayesian (SABIC) 635.325 Root Mean Square Error of Approximation: RMSEA 0.000 90 Percent confidence interval - lower 0.000 90 Percent confidence interval - upper 0.000 P-value H_0: RMSEA <= 0.050 NA P-value H_0: RMSEA >= 0.080 NA Standardized Root Mean Square Residual: SRMR 0.000 Parameter Estimates: Standard errors Bootstrap Number of requested bootstrap draws 100 Number of successful bootstrap draws 100 Regressions: Estimate Std.Err z-value P(>|z|) Std.lv Std.all anti ~ symspan (a) 0.006 0.001 4.701 0.000 0.006 0.346 raven ~ symspan (c1) 0.094 0.030 3.155 0.002 0.094 0.254 anti (b) 4.320 1.523 2.836 0.005 4.320 0.203 Variances: Estimate Std.Err z-value P(>|z|) Std.lv Std.all .anti 0.017 0.002 9.699 0.000 0.017 0.880 .raven 7.672 0.986 7.781 0.000 7.672 0.859 R-Square: Estimate anti 0.120 raven 0.141 Defined Parameters: Estimate Std.Err z-value P(>|z|) Std.lv Std.all ind 0.026 0.010 2.554 0.011 0.026 0.070 total 0.120 0.029 4.201 0.000 0.120 0.324 ``` --- ## Visualize the path diagram ``` r semPaths(fit_med1, what = "std", layout = "spring", edge.color = "black", color = "lavender", sizeMan = 9, edge.label.cex = 2, residuals = T) ``` <!-- --> --- ## Model comparison: full vs partial mediation - Partial mediation model: the model in the previous diagram - Full mediation model: an alternative model without the direct path (X → Y) - Fit and compare the two models ``` r m_med1_f <- " anti ~ a*symspan raven ~ b*anti # No direct path nor the user-defined effects " fit_med1_f <- sem(model = m_med1_f, data = dat, se = "bootstrap", bootstrap = 100) anova(fit_med1_f, fit_med1) ``` ``` Chi-Squared Difference Test Df AIC BIC Chisq Chisq diff RMSEA Df diff Pr(>Chisq) fit_med1 0 635.45 651.16 0.00 fit_med1_f 1 644.40 656.97 10.95 10.95 0.24122 1 0.0009361 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` --- class: inverse, middle ## Let's get into the next level! ### A model with 3 mediators between WM (SymSpan) → Gf(RAPM) #### "Orient" as a "Capacity" indicator, "Anti" as an "Attention Control" indicator, "PA" as a "Secondary Memory" indicator --- ## Parallel mediation .center[  ] --- ## The (manifest) variables for this model Here I’ll use: - "orient" as a Capacity measure - "anti" as an "Attention Control" measure(same mediator as before) - "pa" as a "Secondary Memory" measure We will still use: - "symspan" as the working memory measure - "raven" as the fluid reasoning measure --- ## 3-mediator lavaan model ``` r m_med3 <- " # a paths orient ~ a1*symspan anti ~ a2*symspan pa ~ a3*symspan # b paths + direct raven ~ cprime*symspan + b1*anti + b2*pa + b3*orient # defined effects ind1 := a1*b1 ind2 := a2*b2 ind3 := a3*b3 ind_total := ind1 + ind2 + ind3 total := cprime + ind_total " fit_med3 <- sem(model = m_med3, data = dat, se = "bootstrap", bootstrap = 100) summary(fit_med3, standardized = TRUE, fit.measures = TRUE, rsquare = TRUE) ``` ``` lavaan 0.6-21 ended normally after 1 iteration Estimator ML Optimization method NLMINB Number of model parameters 11 Number of observations 171 Model Test User Model: Test statistic 23.006 Degrees of freedom 3 P-value (Chi-square) 0.000 Model Test Baseline Model: Test statistic 111.650 Degrees of freedom 10 P-value 0.000 User Model versus Baseline Model: Comparative Fit Index (CFI) 0.803 Tucker-Lewis Index (TLI) 0.344 Loglikelihood and Information Criteria: Loglikelihood user model (H0) -543.086 Loglikelihood unrestricted model (H1) -531.583 Akaike (AIC) 1108.172 Bayesian (BIC) 1142.730 Sample-size adjusted Bayesian (SABIC) 1107.899 Root Mean Square Error of Approximation: RMSEA 0.197 90 Percent confidence interval - lower 0.127 90 Percent confidence interval - upper 0.276 P-value H_0: RMSEA <= 0.050 0.001 P-value H_0: RMSEA >= 0.080 0.996 Standardized Root Mean Square Residual: SRMR 0.096 Parameter Estimates: Standard errors Bootstrap Number of requested bootstrap draws 100 Number of successful bootstrap draws 100 Regressions: Estimate Std.Err z-value P(>|z|) Std.lv Std.all orient ~ symspan (a1) 0.038 0.009 4.026 0.000 0.038 0.301 anti ~ symspan (a2) 0.006 0.001 4.701 0.000 0.006 0.346 pa ~ symspan (a3) 0.006 0.002 2.429 0.015 0.006 0.182 raven ~ symspan (cprm) 0.071 0.030 2.385 0.017 0.071 0.195 anti (b1) 2.069 1.454 1.423 0.155 2.069 0.099 pa (b2) 2.829 0.753 3.755 0.000 2.829 0.241 orient (b3) 0.547 0.226 2.422 0.015 0.547 0.188 Variances: Estimate Std.Err z-value P(>|z|) Std.lv Std.all .orient 0.931 0.073 12.761 0.000 0.931 0.909 .anti 0.017 0.002 9.699 0.000 0.017 0.880 .pa 0.061 0.005 11.867 0.000 0.061 0.967 .raven 6.872 0.911 7.542 0.000 6.872 0.794 R-Square: Estimate orient 0.091 anti 0.120 pa 0.033 raven 0.206 Defined Parameters: Estimate Std.Err z-value P(>|z|) Std.lv Std.all ind1 0.078 0.060 1.304 0.192 0.078 0.030 ind2 0.017 0.006 2.937 0.003 0.017 0.083 ind3 0.003 0.002 1.706 0.088 0.003 0.034 ind_total 0.099 0.060 1.645 0.100 0.099 0.147 total 0.170 0.056 3.054 0.002 0.170 0.342 ``` --- ## Full mediation model and comparison ``` r m_med3_f <- " # a paths orient ~ symspan anti ~ symspan pa ~ symspan # b paths + direct raven ~ anti + pa + orient " fit_med3_f <- sem(model = m_med3_f, data = dat, se = "bootstrap", bootstrap = 100) anova(fit_med3_f, fit_med3) ``` ``` Chi-Squared Difference Test Df AIC BIC Chisq Chisq diff RMSEA Df diff Pr(>Chisq) fit_med3 3 1108.2 1142.7 23.006 fit_med3_f 4 1112.9 1144.3 29.699 6.6927 0.18246 1 0.009681 ** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` --- ## Going beyond basic mediation models .center[  ] --- ## Other path model features - Multiple outcomes simultaneously - e.g., `Y1 ~ X + M`, `Y2 ~ X + M`, or you may try `Y1 + Y2 ~ X + M` - Correlated residuals among variables - when the assumption that variables share variance beyond modeled associations is needed - `X1 ~ X2` - Equality constraints - test “are two effects equal?” by constraining coefficients - `Y ~ a*X1 + a*X2` --- ## Other path model features You can even reproduce moderation investigations with path models (by using a ":" to specify the interaction). > Please try the following example, and compare the results between the lavaan path model and the regular linear regression model. Then use a robust estimator (e.g., "MLM") instead and compare again. What reflections do you have on these results? ``` r m_mod <- " raven ~ anti + pa + anti:pa " fit_mod <- sem(model = m_mod, data = dat, estimator = "ML") summary(fit_mod, standardized = T) ``` ``` lavaan 0.6-21 ended normally after 1 iteration Estimator ML Optimization method NLMINB Number of model parameters 4 Number of observations 171 Model Test User Model: Test statistic 0.000 Degrees of freedom 0 Parameter Estimates: Standard errors Standard Information Expected Information saturated (h1) model Structured Regressions: Estimate Std.Err z-value P(>|z|) Std.lv Std.all raven ~ anti 7.064 3.487 2.026 0.043 7.064 0.331 pa 6.349 3.804 1.669 0.095 6.349 0.533 anti:pa -4.777 5.850 -0.817 0.414 -4.777 -0.315 Variances: Estimate Std.Err z-value P(>|z|) Std.lv Std.all .raven 7.510 0.812 9.247 0.000 7.510 0.841 ``` ``` r summary(lm(raven ~ anti + pa + anti*pa, data = dat)) ``` ``` Call: lm(formula = raven ~ anti + pa + anti * pa, data = dat) Residuals: Min 1Q Median 3Q Max -7.7401 -1.6365 -0.0691 1.5617 7.3888 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 4.328 2.180 1.985 0.0488 * anti 7.064 3.529 2.002 0.0469 * pa 6.349 3.849 1.650 0.1009 anti:pa -4.777 5.919 -0.807 0.4208 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 2.773 on 167 degrees of freedom Multiple R-squared: 0.1594, Adjusted R-squared: 0.1443 F-statistic: 10.55 on 3 and 167 DF, p-value: 2.156e-06 ``` --- ## Bridge to next week (SEM) Today: path models (rectangles; observed variables) Next week: SEM (rectangles + circles) Key upgrade: - **Measurement error is modeled explicitly** - Structural paths can be interpreted as relations among *latent constructs* - Path model tests the **structural hypotheses** for theories; CFA tests the **measurement hypotheses** for theories; SEM combines them.