class: title-slide, middle <div style = "position:fixed; visibility: hidden"> `$$\require{color}\definecolor{red}{rgb}{0.698039215686274, 0.133333333333333, 0.133333333333333}$$` `$$\require{color}\definecolor{green}{rgb}{0.125490196078431, 0.698039215686274, 0.666666666666667}$$` `$$\require{color}\definecolor{blue}{rgb}{0.274509803921569, 0.509803921568627, 0.705882352941177}$$` `$$\require{color}\definecolor{yellow}{rgb}{0.823529411764706, 0.411764705882353, 0.117647058823529}$$` `$$\require{color}\definecolor{purple}{rgb}{0.866666666666667, 0.627450980392157, 0.866666666666667}$$` </div> <script type="text/x-mathjax-config"> MathJax.Hub.Config({ TeX: { Macros: { red: ["{\\color{red}{#1}}", 1], green: ["{\\color{green}{#1}}", 1], blue: ["{\\color{blue}{#1}}", 1], yellow: ["{\\color{yellow}{#1}}", 1], purple: ["{\\color{purple}{#1}}", 1] }, loader: {load: ['[tex]/color']}, tex: {packages: {'[+]': ['color']}} } }); </script> <style> .red {color: #B22222;} .green {color: #20B2AA;} .blue {color: #4682B4;} .yellow {color: #D2691E;} .purple {color: #DDA0DD;} </style> ### Statistical Modeling in Experimental Psychology # W11 Network Psychometrics ## Exploratory Network Models #### Han Hao @ Tarleton State University ??? This week introduces psychometric network models as another way to think about multivariate structure. I want students to see both the conceptual logic and a practical workflow in R, with EGA as the main exploratory tool and psychonetrics as the main confirmatory tool. --- ## Network Models in lay language LVM "translates" the correlations among indicators of latent psychological constructs to a hierarchical structure - Assuming that the unobserved constructs are causing all the correlations. 
Network models give us another way (**.red[alternative rather than replacement]**) to describe the "clusters of indicators" structure without that assumption:

- The structure is a network system of **conditional relations** among observed variables
- The "clusters" are **communities** in a network rather than only reflective factors

---

## What can network modeling do for us?

.pull-left[
 
]

.pull-right[
In this course, network models are useful because they help us think about:

- dimensionality
- redundancy / local dependence
- group/time-series differences in structure
- whether a factor interpretation is the only plausible story
]

Figure from: https://doi.org/10.1038/s43586-021-00055-w

---

.pull-left[
### Latent variable view

Observed variables covary because they share one or more **common causes**.

Typical claim:

`$$R \approx \Lambda \Phi \Lambda' + \Psi$$`

After accounting for the latent variables, residual associations are treated as random fluctuations.
]

.pull-right[
### Network view

Observed variables may be related **directly or conditionally** to one another.

- Direct associations between manifest variables after controlling for all other manifest variables in the data
- Clusters in the network may imply dimensions / smaller communities / factors
- No common-cause assumption is required at the starting point
]

---

## Alternative rather than replacement

A good latent model does **not** automatically invalidate a network model.

A good network model does **not** automatically invalidate a latent model.

These models can sometimes describe similar data patterns while emphasizing **different stories** about structure.
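To make the "different stories" point concrete, here is a small base-R sketch (with made-up loadings, not any course dataset): a one-factor model implies a particular correlation matrix, and that same matrix can be re-read as a network in which every pair of indicators stays related after conditioning on the rest.

``` r
# Hypothetical one-factor model: R = Lambda Phi Lambda' + Psi (made-up loadings)
Lambda <- matrix(c(0.8, 0.7, 0.6), ncol = 1)     # loadings of three indicators
Phi    <- matrix(1)                              # factor variance fixed at 1
Psi    <- diag(as.numeric(1 - Lambda^2))         # uniquenesses on the diagonal

R <- Lambda %*% Phi %*% t(Lambda) + Psi          # model-implied correlation matrix

# The network re-description of the same R: partial correlation of
# indicators 1 and 2, controlling for indicator 3, via the precision matrix
Omega   <- solve(R)
pcor_12 <- -Omega[1, 2] / sqrt(Omega[1, 1] * Omega[2, 2])
round(pcor_12, 2)   # nonzero: no missing edges are implied by the factor model
```

Under a single common cause, no partial correlation among the indicators is exactly zero, which is why a one-factor solution and a fully connected three-node network can describe the same matrix.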
So the practical question is often: **What kind of structure are we trying to describe, and for what inferential purpose?**

---

## A closer look at PNMs

A psychometric network model (PNM) usually represents variables as:

- **Nodes** = observed variables (items, symptoms, test scores, subscale scores)
- **Edges** = relations among variables
  - Edges can be single-directional, double-directional, or non-directional
- **Edge weight** = strength and sign of a relation
  - Edges can carry numerical weights (weighted) or not (unweighted)

.center[
 
]

---

## A closer look at PNMs

.pull-left[
 
]

.pull-right[
In psychometric investigations, we are (.red[maybe I am?]) especially interested in **partial correlation networks**.

An edge between `\(X_i\)` and `\(X_j\)` means they are still meaningfully related **after controlling for all the other nodes in the network**.
]

Figure from: https://doi.org/10.1080/10705511.2022.2056039

---

## Edges as partial correlations

For psychological interpretations, the **partial correlation** idea is usually more central.

- Usually the default target in Gaussian graphical models (network models for continuous observed variables)
- Often sparser than networks of zero-order Pearson correlations
- Adjusts each pair for the remaining nodes (as "covariates") and is thus closer to the conditional dependence idea

---

## Why sparse networks are preferred

In real data, a fully connected network is usually hard to interpret and not very informative, so a **sparse** network is generally preferred, in which:

- small edges are shrunk toward zero
- some edges are set exactly to zero during estimation
- the final graph is easier to interpret and compare

This is one reason regularized estimation methods, such as the **graphical lasso**, became popular.

---

## Gaussian Graphical Model (GGM)

For continuous variables, a common network model is the **Gaussian Graphical Model**.
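As a minimal numeric sketch (with an invented correlation matrix, not course data), the GGM logic can be run in a few lines of base R: invert the covariance matrix and rescale it into partial correlations.

``` r
# Toy correlation matrix for a "chain" X1 - X2 - X3 (invented values):
# r13 = r12 * r23, so X1 and X3 should be conditionally independent given X2
Sigma <- matrix(c(1.00, 0.50, 0.25,
                  0.50, 1.00, 0.50,
                  0.25, 0.50, 1.00), nrow = 3)

Omega <- solve(Sigma)                 # precision matrix: Omega = Sigma^{-1}

# Partial correlations: pcor_ij = -omega_ij / sqrt(omega_ii * omega_jj)
D    <- diag(1 / sqrt(diag(Omega)))
pcor <- -D %*% Omega %*% D
diag(pcor) <- 1
round(pcor, 2)   # the (1, 3) entry is 0: no edge between X1 and X3
```

Here the marginal correlation between X1 and X3 is 0.25, yet the GGM drops that edge: conditioning on X2 explains it away, which is exactly what `\(\omega_{ij} = 0\)` means.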
- Start from the covariance matrix `\(\Sigma\)` and invert it to obtain the precision matrix `\(\Omega = \Sigma^{-1}\)`
- Use the precision matrix to derive partial relations and further tests of parameters, network structure consistency, and model fit

Key interpretation:

- if `\(\omega_{ij} = 0\)`, nodes `\(i\)` and `\(j\)` are conditionally **.red[independent]** given the other nodes
- nonzero entries correspond to possible edges in the network

We do not need to calculate this by hand, but we do need to know what the edges are conceptually supposed to mean.

---

## Regularization: the intuitive version

Regularization prevents over-fitting by penalizing complex models and tolerating some misfit/noise.

As one type of regularization for the GGM, the graphical lasso (.red[glasso]) makes a trade-off by "punishing the model for having too many connections":

- Avoid an overly dense, noisy network
- Fit the data **well enough**

The final network depends partly on the estimation rule (the method and the level of regularization): **larger, stable** edges tend to survive while **weaker** edges are shrunk toward zero.

---

## Community (dimension) detection

A very important psychometric move is this:

If a network contains **clusters / communities** of nodes that are more strongly connected to each other, those communities may reflect **sub-dimensions**.

This is the key intuition behind approaches such as **Exploratory Graph Analysis (EGA)**.

So EGA takes network estimation and asks a question similar to the one in EFA: **How many dimensions seem to be present, and which variables belong to each one?**

---

class: inverse

### EGA in one sentence

## Estimate a network, detect communities, and treat those communities as candidate dimensions.

---

## How EGA connects to earlier weeks

.pull-left[
### EFA asked:

- how many factors?
- which variables load together?
- what loading pattern best summarizes the covariance matrix?
]

.pull-right[
### EGA asks:

- how many clusters?
- which variables cluster together in the network?
- how stable is this cluster structure?
]

Both are exploratory structure-finding tools, but they start from different modeling assumptions.

---

## Package to try today: `EGAnet`

There are multiple packages and approaches; here we will only try the `EGAnet` package.

Useful things it can do:

- Estimate EGA structures
- Plot and evaluate communities (`EGA()`)
- Assess stability, e.g., bootstrap dimensionality (`bootEGA()`)
- Inspect redundancy / local dependence (`UVA()`)

For more information, please check the package website: [.purple[EGAnet website]](https://r-ega.net/)

---

## In-class demo

We will use `EFASimData.csv` as the main in-class demo dataset.

.pull-left[

``` r
library(EGAnet)
simDat <- read.csv("EFASimData.csv")[,-1]
head(simDat)
```
]

.pull-right[

```
   F1  F2  F3  V1  V2  V3  S1 S2 S3
1  82 102  80 102  73  83  74 94 56
2 106  95  69 152  99 105  84 91 76
3  67  82  81 136  69 110  83 83 84
4  90  67  53 113 129 108 117 85 70
5 108 100  72 129  97 119  83 74 83
6 137 131 134 138 134 111 109 85 92
```
]

---

## Estimating the EGA solution

``` r
ega_sim <- EGA(data = simDat, plot.EGA = FALSE)
print(ega_sim)
```

.pull-left[
**Expected reading goals**

- number of estimated dimensions
- which items are grouped together
- whether the solution makes sense
]

.pull-right[
**Key terms**

- Gamma (`\(\gamma\)`): tuning parameter for EBICglasso (common default 0.50)
- Lambda (`\(\lambda\)`): sparsity parameter (larger values put more penalty on denser models)
]

---

## Estimating the EGA solution

```
Model: GLASSO (EBIC with gamma = 0.5)
Correlations: auto
Lambda: 0.0706596685755666 (n = 100, ratio = 0.1)

Number of nodes: 9
Number of edges: 28
Edge density: 0.778

Non-zero edge weights:
     M    SD   Min   Max
 0.127 0.120 0.001 0.406

----

Algorithm: Walktrap
Number of communities: 3

F1 F2 F3 V1 V2 V3 S1 S2 S3
 1  1  1  2  2  2  3  3  3

----

Unidimensional Method: Louvain
Unidimensional: No

----

TEFI: -4.221
```

???
I do not want to overcomplicate the first estimation with too many arguments.
The first run should be as simple as possible so students can see the basic workflow.

---

### Visualizing the EGA structure

``` r
plot(ega_sim)
```

<!-- -->

---

## Bootstrap EGA for stability

**What do we consider here?**

- Does the same number of dimensions keep reappearing?
- Are some items unstable across replications?
- Is the network structure robust or not?

``` r
boot_sim <- bootEGA(data = simDat, iter = 100, type = "resampling", ncores = 4)
```

<!-- -->

``` r
print(boot_sim)
```

```
## Model: GLASSO (EBIC)
## Correlations: auto
## Algorithm: Walktrap
## Unidimensional Method: Louvain
## 
## ----
## 
## EGA Type: EGA 
## Bootstrap Samples: 100 (Resampling)
## 
##  3 
##  Frequency: 1 
## 
## Median dimensions: 3 [3, 3] 95% CI
```

``` r
dimensionStability(boot_sim, IS.plot = FALSE)
```

```
## EGA Type: EGA 
## Bootstrap Samples: 100 (Resampling)
## 
## Proportion Replicated in Dimensions:
## 
## F1 F2 F3 V1 V2 V3 S1 S2 S3 
##  1  1  1  1  1  1  1  1  1 
## 
## ----
## 
## Structural Consistency:
## 
## 1 2 3 
## 1 1 1
```

``` r
itemStability(boot_sim, IS.plot = FALSE)
```

```
## EGA Type: EGA 
## Bootstrap Samples: 100 (Resampling)
## 
## Proportion Replicated in Dimensions:
## 
## F1 F2 F3 V1 V2 V3 S1 S2 S3 
##  1  1  1  1  1  1  1  1  1
```

---

## How to talk about an EGA result

A good classroom interpretation sounds something like this:

- "The network suggests **k** dimensions."
- "Items A, B, C mainly belong to one community (what?); items D, E, F to another (what?)."
- "This is an **exploratory** structure rather than a confirmatory theory-driven measurement model."
- "Here is what we know about the stability, redundancy, and substantive meaning of the solution."

---

## Correlations still matter here

Even though the end product is a network plot, the analysis is still built from **associations among variables**.

So your usual psychometric questions still matter:

- Are the variables continuous or ordinal?
- Is there a lot of missing data?
- Do some items look redundant?
- Is there a strong wording-effect pattern?
- Is the sample size large enough for a stable structure?

Network models do not remove the need for thoughtful data screening.

---

## Centrality measures

``` r
library(qgraph)
centralityPlot(ega_sim$network, scale = "raw0",
               include = c("Strength", "Closeness", "Betweenness"))
```

**Strength**: the sum of the weights of all edges connected to a node

**Closeness**: the inverse of the sum of the shortest-path distances between a node and all other nodes

- For weighted non-directional networks, edge lengths are usually taken as the inverse of the (absolute) edge weights, so stronger connections mean shorter distances

**Betweenness**: the sum of the fractions of all shortest paths that pass through a node

---

## Centrality measures

.pull-left[
<!-- -->
]

.pull-right[
<!-- -->
]

---

## Other useful functions (?)

Some potentially useful functions and features in the `EGAnet` package:

``` r
# UVA()
# polychoric.matrix()
# network.fit()
# network.compare()
# itemDiagnostics()
# ...
```

---

class: inverse

### Beyond exploratory work

## There are also approaches that support more "confirmatory" network models, including those in the "psychonetrics" and "bootnet" packages

---

## Confirmatory testing

Sometimes we want more than an exploratory community solution. We may also want to ask:

- Which edges should be freely estimated?
- Which edges can be constrained to zero?
- Does a sparse network fit adequately?
- How do competing network structures compare?

---

## `psychonetrics` as an example

This package, .red[psychonetrics], unifies SEM and psychometric network analysis. It can:

- Estimate different types of models, such as GGMs and Ising models
- Support more confirmatory network thinking
- Offer customized pruning / model-search workflows
- Extend beyond simple cross-sectional networks

---

#### A schematic representation of a network workflow for multivariate data

From Borsboom, D., Deserno, M. K., Rhemtulla, M. et al. Network analysis of multivariate data in psychological science. Nat Rev Methods Primers 1, 58 (2021).
https://doi.org/10.1038/s43586-021-00055-w

 

---

## Lab Assignment 05: Network Modeling

Music genre data from Lab Assignment 01: "PSYC5318Lab01Data.csv"

- 11 items about people's liking of 11 types of music (special codes for missing data)
- Apply EGA to this set of survey data
- Estimate a weighted non-directional GGM network structure
- Bootstrap and check the stability of items and communities
- Calculate and visualize the centrality measures (Strength, Closeness, and Betweenness)
- Interpret the EGA results and compare them with the EFA results you obtained in Lab 01
- Write a report

---

### Extended due date for Lab 05

### Paper Critique Seminar 3 next week

### Class meeting for next week (trying out AI)

### Plans for the rest of the semester