Error: Invalid grouping factor specification, Subject

Time: 2019-01-26 11:27:24

Tags: r lme4

I am trying to fit a multilevel logistic regression model in R.

Here is my model:

modA <- glmer(Past_01 ~ Scene_01 + Speech_01 + (1| Subject), data = df2, 
        family = binomial, control = glmerControl(optimizer = "bobyqa"),
        nAGQ = 10)

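When I run it, I get the following message:

Error: Invalid grouping factor specification, Subject

Can someone tell me (a) what this error means, and/or (b) what the solution is?

Although I have seen the same question asked before, I have not found an answer that adequately resolves the problem.

These are the steps I have taken so far, based on suggestions in earlier related threads.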

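  1. I removed the NAs with na.rm before running the model.
  2. I removed the NAs using na.action = na.omit.
  3. I double-checked the structure of Subject (the grouping variable in question), which is a factor. Past, Scene and Speech are all integers.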
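
The checks in the steps above can be reproduced on a small example. What follows is only a minimal sketch in base R, assuming a hypothetical data frame `toy` shaped like the dput() sample (each row holds a value in only one of the spread columns); `toy` and `model_vars` are illustrative names, not part of the original data. It counts the rows that survive listwise deletion over the model variables, which is what na.action = na.omit applies before fitting. If that count is zero, no Subject values remain, and as far as I can tell this is one situation in which lme4 reports an invalid grouping factor.

```r
# Hypothetical toy data: 2 subjects x 2 trials x 8 rows per trial,
# with one non-missing value per row, as in the dput() sample.
toy <- data.frame(
  Subject   = rep(1:2, each = 16),
  Trial     = rep(rep(1:2, each = 8), times = 2),
  Scene_01  = NA_integer_,
  Speech_01 = NA_integer_,
  Past_01   = NA_integer_
)
toy$Scene_01[seq(4, 32, by = 8)]  <- c(1L, 1L, 0L, 0L)
toy$Speech_01[seq(5, 32, by = 8)] <- c(0L, 0L, 1L, 0L)
toy$Past_01[seq(6, 32, by = 8)]   <- c(0L, 1L, 0L, 1L)

# Check 1: is the grouping variable really a factor? Convert it if not.
is.factor(toy$Subject)
toy$Subject <- factor(toy$Subject)

# Check 2: how many rows have all model variables present at once?
model_vars <- c("Past_01", "Scene_01", "Speech_01", "Subject")
n_complete <- sum(complete.cases(toy[, model_vars]))

# Check 3: how many distinct subjects are left after listwise deletion?
kept <- na.omit(toy[, model_vars])
n_subjects_left <- length(unique(kept$Subject))

print(c(complete_rows = n_complete, subjects_left = n_subjects_left))
```

Running the same three checks on df2 with the variables from the model formula should show whether any usable rows remain once the NAs are dropped.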

Any guidance would be greatly appreciated!

Here is the sample of my data that @NelsonGon asked for.

Using dput:

dput(head(df2,50))

I get the following output:

structure(list(Subject = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), Trial = c(1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L), Spont = 
c(NA, NA, 3L, NA, NA, NA, NA, NA, NA, NA, 3L, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 4L, NA, NA, NA, NA, NA, NA, 
NA, 5L, NA, NA, NA, NA, NA, NA, NA, 3L, NA, NA, NA, NA, NA, NA, NA), 
Speech_01 = c(NA, NA, NA, NA, 0L, NA, NA, NA, NA, NA, NA, NA, 0L, NA, 
NA, NA, NA, NA, NA, NA, 0L, NA, NA, NA, NA, NA, NA, NA, 0L, NA, NA, 
NA, NA, NA, NA, NA, 0L, NA, NA, NA, NA, NA, NA, NA, 0L, NA, NA, NA, 
NA, NA), Scene_01 = c(NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, 1L, 
NA, NA, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, 1L, NA, 
NA, NA, NA, NA, NA, NA, 0L, NA, NA, NA, NA, NA, NA, NA, 0L, NA, NA, 
NA, NA, NA, NA), VisElem_01 = c(NA, NA, NA, 0L, NA, NA, NA, NA, NA, 
NA, NA, 0L, NA, NA, NA, NA, NA, NA, NA, 0L, NA, NA, NA, NA, NA, NA, 
NA, 0L, NA, NA, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, 
1L, NA, NA, NA, NA, NA, NA), AudElem_01 = c(NA, NA, NA, NA, 0L, NA, 
NA, NA, NA, NA, NA, NA, 0L, NA, NA, NA, NA, NA, NA, NA, 0L, NA, NA, 
NA, NA, NA, NA, NA, 0L, NA, NA, NA, NA, NA, NA, NA, 0L, NA, NA, NA, 
NA, NA, NA, NA, 0L, NA, NA, NA, NA, NA), Past_01 = c(NA, NA, NA, NA, 
NA, 0L, NA, NA, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, 
0L, NA, NA, NA, NA, NA, NA, NA, 0L, NA, NA, NA, NA, NA, NA, NA, 0L, 
NA, NA, NA, NA, NA, NA, NA, 0L, NA, NA, NA, NA), Pres_01 = c(NA, NA, 
NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, 
NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, 
NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, 1L, NA, NA, NA), Fut_01 = 
c(NA, NA, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, 0L, NA, 
NA, NA, NA, NA, NA, NA, 0L, NA, NA, NA, NA, NA, NA, NA, 1L, NA, NA, 
NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, 1L, NA, NA)), 
.Names = c("Subject", "Trial", "Spont", "Speech_01", "Scene_01", 
"VisElem_01", "AudElem_01", "Past_01", "Pres_01", "Fut_01"), 
row.names = c(NA, 50L), class = "data.frame")

Here is the structure of the data, from str(df2).

'data.frame':   28800 obs. of  10 variables:
 $ Subject   : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Trial     : int  1 1 1 1 1 1 1 1 2 2 ...
 $ Spont     : int  NA NA 3 NA NA NA NA NA NA NA ...
 $ Speech_01 : int  NA NA NA NA 0 NA NA NA NA NA ...
 $ Scene_01  : int  NA NA NA 1 NA NA NA NA NA NA ...
 $ VisElem_01: int  NA NA NA 0 NA NA NA NA NA NA ...
 $ AudElem_01: int  NA NA NA NA 0 NA NA NA NA NA ...
 $ Past_01   : int  NA NA NA NA NA 0 NA NA NA NA ...
 $ Pres_01   : int  NA NA NA NA NA NA 1 NA NA NA ...
 $ Fut_01    : int  NA NA NA NA NA NA NA 1 NA NA ...

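Data types: Subject is the participant ID. Spont is a continuous variable. The other variables (Scene_01, Speech_01, VisElem_01, AudElem_01, Past_01, Pres_01 and Fut_01) are dummy coded, with values 1 and 0 ("yes" and "no", respectively). I have edited the variables in the model to match the actual variable names in the data frame.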

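Note: The NAs were produced during the dummy-coding step, in which I used the spread function (tidyr::spread) to create new columns from the unique values of an existing column in the original dataset. If you are familiar with spread, you may have noticed that it spreads out the data and produces a lot of NAs.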
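
One possible way to undo the sparsity left by the spreading step is to collapse the rows back to one row per Subject and Trial, keeping the single non-missing value in each column. This is a sketch in base R under stated assumptions, not a confirmed fix: `toy`, `first_non_na` and `wide` are hypothetical names, and it assumes each Subject/Trial combination holds at most one non-missing value per column, which appears to be the case in the dput() sample but has not been verified for the full df2.

```r
# Hypothetical toy data mimicking the sparse layout produced by spreading.
toy <- data.frame(
  Subject   = rep(1:2, each = 16),
  Trial     = rep(rep(1:2, each = 8), times = 2),
  Scene_01  = NA_integer_,
  Speech_01 = NA_integer_,
  Past_01   = NA_integer_
)
toy$Scene_01[seq(4, 32, by = 8)]  <- c(1L, 1L, 0L, 0L)
toy$Speech_01[seq(5, 32, by = 8)] <- c(0L, 0L, 1L, 0L)
toy$Past_01[seq(6, 32, by = 8)]   <- c(0L, 1L, 0L, 1L)

# Return the first non-missing value of a vector, or NA if there is none.
first_non_na <- function(x) {
  v <- x[!is.na(x)]
  if (length(v) > 0) v[1] else NA_integer_
}

# Collapse to one row per Subject x Trial.
wide <- aggregate(
  toy[c("Scene_01", "Speech_01", "Past_01")],
  by  = toy[c("Subject", "Trial")],
  FUN = first_non_na
)
wide$Subject <- factor(wide$Subject)

print(wide)
print(sum(complete.cases(wide)))
```

With complete rows available, a call such as glmer(Past_01 ~ Scene_01 + Speech_01 + (1 | Subject), data = wide, family = binomial) would at least have observations and Subject levels to work with; whether the model then converges on the real data is a separate question.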

0 answers:

No answers yet.