I am trying to run a multilevel logistic regression model in R.
Here is my model:
modA <- glmer(Past_01 ~ Scene_01 + Speech_01 + (1| Subject), data = df2,
family = binomial, control = glmerControl(optimizer = "bobyqa"),
nAGQ = 10)
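When I run it, I get the following message:
Error: Invalid grouping factor specification, Subject
Can anyone tell me (a) what this error means and/or (b) how to resolve it?
Although I have seen the same question asked before, I have not found an answer that fully resolves the problem.
These are the steps I have taken so far, based on suggestions from previous related threads.
Any guidance would be greatly appreciated!
Here is a sample of my data, as requested by @NelsonGon.
Using dput: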
dput(head(df2,50))
I get the following output:
structure(list(Subject = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), Trial = c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L), Spont =
c(NA, NA, 3L, NA, NA, NA, NA, NA, NA, NA, 3L, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 4L, NA, NA, NA, NA, NA, NA,
NA, 5L, NA, NA, NA, NA, NA, NA, NA, 3L, NA, NA, NA, NA, NA, NA, NA),
Speech_01 = c(NA, NA, NA, NA, 0L, NA, NA, NA, NA, NA, NA, NA, 0L, NA,
NA, NA, NA, NA, NA, NA, 0L, NA, NA, NA, NA, NA, NA, NA, 0L, NA, NA,
NA, NA, NA, NA, NA, 0L, NA, NA, NA, NA, NA, NA, NA, 0L, NA, NA, NA,
NA, NA), Scene_01 = c(NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, 1L,
NA, NA, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, 1L, NA,
NA, NA, NA, NA, NA, NA, 0L, NA, NA, NA, NA, NA, NA, NA, 0L, NA, NA,
NA, NA, NA, NA), VisElem_01 = c(NA, NA, NA, 0L, NA, NA, NA, NA, NA,
NA, NA, 0L, NA, NA, NA, NA, NA, NA, NA, 0L, NA, NA, NA, NA, NA, NA,
NA, 0L, NA, NA, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA,
1L, NA, NA, NA, NA, NA, NA), AudElem_01 = c(NA, NA, NA, NA, 0L, NA,
NA, NA, NA, NA, NA, NA, 0L, NA, NA, NA, NA, NA, NA, NA, 0L, NA, NA,
NA, NA, NA, NA, NA, 0L, NA, NA, NA, NA, NA, NA, NA, 0L, NA, NA, NA,
NA, NA, NA, NA, 0L, NA, NA, NA, NA, NA), Past_01 = c(NA, NA, NA, NA,
NA, 0L, NA, NA, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA,
0L, NA, NA, NA, NA, NA, NA, NA, 0L, NA, NA, NA, NA, NA, NA, NA, 0L,
NA, NA, NA, NA, NA, NA, NA, 0L, NA, NA, NA, NA), Pres_01 = c(NA, NA,
NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA,
NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA,
NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, 1L, NA, NA, NA), Fut_01 =
c(NA, NA, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, 0L, NA,
NA, NA, NA, NA, NA, NA, 0L, NA, NA, NA, NA, NA, NA, NA, 1L, NA, NA,
NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, 1L, NA, NA)),
.Names = c("Subject", "Trial", "Spont", "Speech_01", "Scene_01",
"VisElem_01", "AudElem_01", "Past_01", "Pres_01", "Fut_01"),
row.names = c(NA, 50L), class = "data.frame")
Here is the structure of the data, from str(df2):
'data.frame': 28800 obs. of 10 variables:
$ Subject : int 1 1 1 1 1 1 1 1 1 1 ...
$ Trial : int 1 1 1 1 1 1 1 1 2 2 ...
$ Spont : int NA NA 3 NA NA NA NA NA NA NA ...
$ Speech_01 : int NA NA NA NA 0 NA NA NA NA NA ...
$ Scene_01 : int NA NA NA 1 NA NA NA NA NA NA ...
$ VisElem_01: int NA NA NA 0 NA NA NA NA NA NA ...
$ AudElem_01: int NA NA NA NA 0 NA NA NA NA NA ...
$ Past_01 : int NA NA NA NA NA 0 NA NA NA NA ...
$ Pres_01 : int NA NA NA NA NA NA 1 NA NA NA ...
$ Fut_01 : int NA NA NA NA NA NA NA 1 NA NA ...
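Data types: Subject is the participant ID. Spont is a continuous variable. The other variables (Scene_01, Speech_01, VisElem_01, AudElem_01, Past_01, Pres_01 and Fut_01) are dummy coded, with 1 meaning "yes" and 0 meaning "no". In the model above, I have edited the variables to match the actual variable names in the data frame.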
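Note: The NAs were produced during the dummy-coding step, in which I used the tidyr::spread function to create new columns from the unique values of an existing column (from the original dataset). If you are familiar with spread, you may have noticed that it spreads the data out and produces a lot of NAs.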
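Given how sparse the spread data are, one thing that may be worth checking is how many rows are complete across the variables in the model, since model-fitting functions typically drop rows containing NA (na.omit is the usual default na.action). The sketch below is only an illustration under assumptions: `toy` is a hypothetical data frame that copies the layout of the first two trials from the dput output above, and `first_non_na` and `collapsed` are names I made up for this example. It counts complete rows, then collapses the data to one row per Subject and Trial and counts again. I have not confirmed that this is the cause of the error.

```r
# Hypothetical toy data mimicking the layout of df2: one Subject, two trials,
# eight rows per trial, each row carrying a value for a single variable only.
toy <- data.frame(
  Subject   = rep(1L, 16),
  Trial     = rep(1:2, each = 8),
  Speech_01 = c(NA, NA, NA, NA, 0L, NA, NA, NA,  NA, NA, NA, NA, 0L, NA, NA, NA),
  Scene_01  = c(NA, NA, NA, 1L, NA, NA, NA, NA,  NA, NA, NA, 1L, NA, NA, NA, NA),
  Past_01   = c(NA, NA, NA, NA, NA, 0L, NA, NA,  NA, NA, NA, NA, NA, 1L, NA, NA)
)

# Variables used in the model formula, plus the grouping variable.
model_vars <- c("Past_01", "Scene_01", "Speech_01", "Subject")

# Number of rows with no NA in any model variable; these are the only rows
# that would survive an na.omit-style filter.
n_complete <- sum(complete.cases(toy[, model_vars]))

# Collapse to one row per Subject x Trial by taking the first non-NA value
# of each column within the group (returns NA if the group has none).
first_non_na <- function(x) x[!is.na(x)][1]
collapsed <- aggregate(. ~ Subject + Trial, data = toy,
                       FUN = first_non_na, na.action = na.pass)

n_complete_collapsed <- sum(complete.cases(collapsed[, model_vars]))

print(n_complete)
print(n_complete_collapsed)
```

On the toy data the first count is 0, because no single row has values for Past_01, Scene_01 and Speech_01 at the same time, and the second count is 2, one complete row per trial. Whether collapsing by Subject and Trial is appropriate for the real df2 depends on each trial contributing at most one value per variable, which is an assumption here.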