Intro to Generalized Mixed Effects Model in R

An Tutorial with Repeated Measures Data

Overview

The goal of this document is to introduce applications of R for Generalized Mixed Effects Modeling.

Package and Data Preparation

library(psych) # For descriptives and ANOVA, and else
library(ez) # For ANOVA
library(tidyverse) # This is a collection of packages for data wrangling and visualizing
library(Rmisc)
library(reshape2) # For reorganizing data
#library(lsr)
library(lme4) # For generalized linear mixed effect model
library(lmerTest) # For p values in generalized linear mixed effect model
#library(emmeans)
#library(dplyr)
#library(forcats)
library(DescTools)
#library(SuppDists)
library(effsize)
library(ggpubr)
#library(MVN)
library(r2glmm)

dat <- read.csv("Demo Data Das.csv") # Read in the csv data

Data forging

# A quick look at the first several rows of the data. This is the wide format in which each row contains all information from one individual.
head(dat)
##   X D1S D2S D3S D1D D2D D3D     Speed         WMC
## 1 1   6   6  10   3   4   2  52.57522  0.86260286
## 2 2  10   6   8   4   5   1  87.14825  0.41562595
## 3 3   8  10   9   5   2   3  65.23954 -1.74653932
## 4 4  10   9   9   3   2   3 106.95063 -0.03135097
## 5 5   7   9   9   3   3   4  89.83073  1.50946304
## 6 6   7   8   2   5   6   4 161.63888  0.75637916
# This function melt a wide-format data into long-formant in which each row contains information from one trial.
dat2 <- melt(dat, measure.vars = c("D1S","D2S","D3S", "D1D","D2D","D3D"), variable.name = "Condition", value.name = "Score")

# Spliting the string variable "condition" into two seperate (repeated measure) variables
dat3 <- separate(dat2, Condition, sep = 2, into = c("Factor1","Factor2"), remove = TRUE)

# I recoded the ID variable to a factor (for the ANOVA analyses, otherwise R will treat it as a DV)
dat3$X <- as.factor(dat3$X)
# Factor 1 has 3 levels and I took out the 3rd level (otherwise I will have to have 2 dummy-coded variables for Factor1 in regression).
dat3Final <-subset(dat3, Factor1 != "D3")

# Now the current data are formatted as a 'perfect' long format
head(dat3)
##   X     Speed         WMC Factor1 Factor2 Score
## 1 1  52.57522  0.86260286      D1       S     6
## 2 2  87.14825  0.41562595      D1       S    10
## 3 3  65.23954 -1.74653932      D1       S     8
## 4 4 106.95063 -0.03135097      D1       S    10
## 5 5  89.83073  1.50946304      D1       S     7
## 6 6 161.63888  0.75637916      D1       S     7

ANOVA with “ezANOVA” Package

This package gives 3 options for us to calculate Sums of Squares, and the following note is copied directly from their documentation:

Numeric value (either 1, 2 or 3) specifying the Sums of Squares “type” to employ when data are unbalanced (eg. when group sizes differ). type = 2 is the default because this will yield identical ANOVA results as type = 1 when data are balanced but type = 2 will additionally yield various assumption tests where appropriate. When data are unbalanced, users are warned that they should give special consideration to the value of type. type=3 will emulate the approach taken by popular commercial statistics packages like SAS and SPSS, but users are warned that this approach is not without criticism.

# Type 1
ezANOVA(dat3Final, dv = .(Score), wid = .(X), within = .(Factor1, Factor2), type = 1, return_aov = TRUE, detailed = TRUE)
## $ANOVA
##            Effect DFn DFd       SSn      SSd           F            p p<.05
## 1         Factor1   1  89   0.40000  94.1000   0.3783209 5.400727e-01      
## 2         Factor2   1  89 846.40000 530.1000 142.1045086 3.864254e-20     *
## 3 Factor1:Factor2   1  89  20.54444 133.9556  13.6497180 3.798565e-04     *
##           ges
## 1 0.000527318
## 2 0.527498096
## 3 0.026383003
## 
## $aov
## 
## Call:
## aov(formula = formula(aov_formula), data = data)
## 
## Grand Mean: 6.038889
## 
## Stratum 1: X
## 
## Terms:
##                 Residuals
## Sum of Squares   217.9556
## Deg. of Freedom        89
## 
## Residual standard error: 1.564909
## 
## Stratum 2: X:Factor1
## 
## Terms:
##                 Factor1 Residuals
## Sum of Squares      0.4      94.1
## Deg. of Freedom       1        89
## 
## Residual standard error: 1.028253
## 1 out of 2 effects not estimable
## Estimated effects are balanced
## 
## Stratum 3: X:Factor2
## 
## Terms:
##                 Factor2 Residuals
## Sum of Squares    846.4     530.1
## Deg. of Freedom       1        89
## 
## Residual standard error: 2.440529
## 1 out of 2 effects not estimable
## Estimated effects are balanced
## 
## Stratum 4: X:Factor1:Factor2
## 
## Terms:
##                 Factor1:Factor2 Residuals
## Sum of Squares         20.54444 133.95556
## Deg. of Freedom               1        89
## 
## Residual standard error: 1.226833
## Estimated effects are balanced
# Type 3
ezANOVA(dat3Final, dv = .(Score), wid = .(X), within = .(Factor1, Factor2), type = 3, return_aov = TRUE, detailed = TRUE)
## $ANOVA
##            Effect DFn DFd         SSn      SSd            F            p p<.05
## 1     (Intercept)   1  89 13128.54444 217.9556 5360.9115518 2.558857e-81     *
## 2         Factor1   1  89     0.40000  94.1000    0.3783209 5.400727e-01      
## 3         Factor2   1  89   846.40000 530.1000  142.1045086 3.864254e-20     *
## 4 Factor1:Factor2   1  89    20.54444 133.9556   13.6497180 3.798565e-04     *
##            ges
## 1 0.9307951118
## 2 0.0004096216
## 3 0.4644141782
## 4 0.0206133848
## 
## $aov
## 
## Call:
## aov(formula = formula(aov_formula), data = data)
## 
## Grand Mean: 6.038889
## 
## Stratum 1: X
## 
## Terms:
##                 Residuals
## Sum of Squares   217.9556
## Deg. of Freedom        89
## 
## Residual standard error: 1.564909
## 
## Stratum 2: X:Factor1
## 
## Terms:
##                 Factor1 Residuals
## Sum of Squares      0.4      94.1
## Deg. of Freedom       1        89
## 
## Residual standard error: 1.028253
## 1 out of 2 effects not estimable
## Estimated effects are balanced
## 
## Stratum 3: X:Factor2
## 
## Terms:
##                 Factor2 Residuals
## Sum of Squares    846.4     530.1
## Deg. of Freedom       1        89
## 
## Residual standard error: 2.440529
## 1 out of 2 effects not estimable
## Estimated effects are balanced
## 
## Stratum 4: X:Factor1:Factor2
## 
## Terms:
##                 Factor1:Factor2 Residuals
## Sum of Squares         20.54444 133.95556
## Deg. of Freedom               1        89
## 
## Residual standard error: 1.226833
## Estimated effects are balanced

ANOVA with “aov” Function

AnovaModel <- aov(Score ~ Factor1*Factor2 + Error(X/(Factor1*Factor2)), data = dat3)
summary(AnovaModel)
## 
## Error: X
##           Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 89  255.7   2.873               
## 
## Error: X:Factor1
##            Df Sum Sq Mean Sq F value  Pr(>F)    
## Factor1     2  47.69  23.846   17.19 1.5e-07 ***
## Residuals 178 246.97   1.387                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Error: X:Factor2
##           Df Sum Sq Mean Sq F value Pr(>F)    
## Factor2    1 1822.3    1822   226.6 <2e-16 ***
## Residuals 89  715.7       8                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Error: X:Factor1:Factor2
##                  Df Sum Sq Mean Sq F value  Pr(>F)    
## Factor1:Factor2   2  120.2   60.08   36.64 4.7e-14 ***
## Residuals       178  291.8    1.64                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Effect sizes
EtaSq(AnovaModel, type = 1)
##                     eta.sq eta.sq.part eta.sq.gen
## Factor1         0.01362519   0.1618527 0.03061484
## Factor2         0.52062030   0.7180224 0.54684319
## Factor1:Factor2 0.03432802   0.2916487 0.07370411

ANOVA Plot

This is just a quick visualization of the condition differences.

DescribeSummary <- summarySE(dat3Final, measurevar = "Score", groupvars = c("Factor1","Factor2"))

pd = position_dodge(0.9)

ggplot(DescribeSummary, aes(x=Factor1, y=Score, fill=Factor2)) + 
  geom_errorbar(aes(ymin=Score-se, 
                    ymax=Score+se), 
                width=.2, size=1, position=pd) +
  geom_bar(position = "dodge", stat = "identity", alpha = 0.7) +
  coord_cartesian(ylim=c(2,9))+
  theme_classic() +
  scale_fill_grey(start = .1, end = .8) +
  theme(
    axis.title.y = element_text(vjust= 1.8),
    axis.title.x = element_text(vjust= -0.5),
    axis.title = element_text(face = "bold"))

Generalized Linear Mixed Effect Model

I just attached the scripts that I used in my own project, but I also just started to use GLMM so there is still a huge lot I am not 100% clear about the analysis.
Here I dummy-coded the two factors, and specify the random intercept without considering any individual level effect of the fectors (nothing except (1|X) in the “random term” in the formula).

This is some data I fictioned so here it seems the fitted model “lmmodel1” is singular: there might be too few variance in at least one effect, or it could also be a miss specification of the model.

# Dummy coding
dat3Final$F1Dummy <- dummy(dat3Final$Factor1,"D2")
dat3Final$F2Dummy <- dummy(dat3Final$Factor2,"S")

# Model specification and estimation
lmmodel1 <- lmer(Score ~ WMC*F1Dummy*F2Dummy + (1|X), data = dat3Final, REML = FALSE)
summary(lmmodel1)
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: Score ~ WMC * F1Dummy * F2Dummy + (1 | X)
##    Data: dat3Final
## 
##      AIC      BIC   logLik deviance df.resid 
##   1395.6   1434.5   -687.8   1375.6      350 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.3869 -0.7239  0.1256  0.7391  1.8144 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev. 
##  X        (Intercept) 9.868e-19 9.934e-10
##  Residual             2.673e+00 1.635e+00
## Number of obs: 360, groups:  X, 90
## 
## Fixed effects:
##                      Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)           4.77778    0.17234 360.00000  27.722  < 2e-16 ***
## WMC                  -0.06142    0.21428 360.00000  -0.287  0.77458    
## F1Dummy              -0.54444    0.24373 360.00000  -2.234  0.02611 *  
## F2Dummy               2.58889    0.24373 360.00000  10.622  < 2e-16 ***
## WMC:F1Dummy           0.13748    0.30304 360.00000   0.454  0.65035    
## WMC:F2Dummy           0.35115    0.30304 360.00000   1.159  0.24733    
## F1Dummy:F2Dummy       0.95556    0.34469 360.00000   2.772  0.00586 ** 
## WMC:F1Dummy:F2Dummy  -0.04944    0.42857 360.00000  -0.115  0.90823    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) WMC    F1Dmmy F2Dmmy WMC:F1Dm WMC:F2 F1D:F2
## WMC          0.000                                            
## F1Dummy     -0.707  0.000                                     
## F2Dummy     -0.707  0.000  0.500                              
## WMC:F1Dummy  0.000 -0.707  0.000  0.000                       
## WMC:F2Dummy  0.000 -0.707  0.000  0.000  0.500                
## F1Dmmy:F2Dm  0.500  0.000 -0.707 -0.707  0.000    0.000       
## WMC:F1D:F2D  0.000  0.500  0.000  0.000 -0.707   -0.707  0.000
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
# Looking at the group effects
anova(lmmodel1, type = 3)
## Type III Analysis of Variance Table with Satterthwaite's method
##                      Sum Sq Mean Sq NumDF DenDF  F value    Pr(>F)    
## WMC                   0.220   0.220     1   360   0.0821  0.774576    
## F1Dummy              13.339  13.339     1   360   4.9898  0.026111 *  
## F2Dummy             301.606 301.606     1   360 112.8248 < 2.2e-16 ***
## WMC:F1Dummy           0.550   0.550     1   360   0.2058  0.650347    
## WMC:F2Dummy           3.589   3.589     1   360   1.3427  0.247334    
## F1Dummy:F2Dummy      20.544  20.544     1   360   7.6853  0.005857 ** 
## WMC:F1Dummy:F2Dummy   0.036   0.036     1   360   0.0133  0.908226    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Effect sizes. I googled and found this R2beta people report but I am still trying to understand what it really means.
r2beta(model = lmmodel1, partial = T)
##                Effect   Rsq upper.CL lower.CL
## 1               Model 0.487    0.555    0.426
## 4             F2Dummy 0.245    0.321    0.175
## 7     F1Dummy:F2Dummy 0.022    0.061    0.002
## 3             F1Dummy 0.014    0.049    0.000
## 6         WMC:F2Dummy 0.004    0.028    0.000
## 5         WMC:F1Dummy 0.001    0.017    0.000
## 2                 WMC 0.000    0.016    0.000
## 8 WMC:F1Dummy:F2Dummy 0.000    0.015    0.000

GLMM Visualization

I use these codes to visualize my GLMM data. It should be of help to break the conditions down and visualize the correlations beween the continous predictor and the outcome (performance) by unique conditions.

split_plot <- ggplot(aes(WMC, Score), data = dat3Final) + 
  geom_point() + 
  stat_smooth(method = "lm", col = "red", size = 2, alpha = 0.3) +
  facet_wrap(~ Factor1*Factor2) + 
  theme_classic2() + 
  xlab("WMC") + 
  ylab("Test score")
split_plot

Han Hao
Han Hao
Assistant Professor

Working memory, attention, intelligence, psychometrics, and R programming.

Related