Why doesn't predict() accept 'newdata' for my lmer model?

Time: 2017-03-07 14:46:12

Tags: r prediction lme4 mixed-models

When I call predict(), predictInterval(), or bootMer() and pass newdata (even when it is the same data I used to fit the model), I get the following error:

Error in `[.data.frame`(fr, vars) : undefined columns selected

I have been using the tidyverse packages, so I thought it might be related to tibbles, but converting to a data.frame (with as.data.frame) does not solve the problem.

I also tried changing the column names so that they contain no spaces, and removing the NAs from the data.
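A minimal sketch of that kind of cleanup, for reference. The data frame `d` and its values are made up for illustration and are not the study data; `make.names()` and `na.omit()` are base R functions.

```r
# Hypothetical stand-in data with a non-syntactic column name and one NA.
d <- data.frame(
  `leg length (cm)` = c(80, NA, 90),
  spm = c(100, 110, 120),
  check.names = FALSE  # keep the original name, spaces and parentheses included
)

# Replace characters that are not valid in R names with dots.
names(d) <- make.names(names(d))

# Drop every row that contains at least one NA.
d <- na.omit(d)

names(d)
nrow(d)
```

After this step the columns can be used in a model formula without backticks.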

Here is an example:

library(tidyverse)
library(lme4)
library(lattice)
library(merTools)

df = read_csv(file = "~/Documents/Experiments/Myles Study/mylesdata.csv")

names(df)[names(df) == "h3"] = "hr3"

spms = df %>%
  dplyr::select(
    -c(
      mets1, mets2, mets3, mets4, mets5, mets6
      , rpe1, rpe2, rpe3, rpe4, rpe5, rpe6
      , hr1, hr2, hr3, hr4, hr5, hr6
      , sl1, sl2, sl3, sl4, sl5, sl6
    )
  ) %>%
  gather(speed_lev, spm, spm1, spm2, spm3, spm4, spm5, spm6) 
spms$speed_lev = factor(spms$speed_lev)
levels(spms$speed_lev) = c("1.5", "2", "2.5", "3.5", "4.0", "4.5")


mets = df %>% 
  dplyr::select(
    -c(
      spm1, spm2, spm3, spm4, spm5, spm6
      , rpe1, rpe2, rpe3, rpe4, rpe5, rpe6
      , hr1, hr2, hr3, hr4, hr5, hr6
      , sl1, sl2, sl3, sl4, sl5, sl6
    )
  ) %>%
  gather(speed_lev, mets, mets1, mets2, mets3, mets4, mets5, mets6)
mets$speed_lev = factor(mets$speed_lev)
levels(mets$speed_lev) = c("1.5", "2", "2.5", "3.5", "4.0", "4.5")


rpe = df %>% 
  dplyr::select(
    -c(
      spm1, spm2, spm3, spm4, spm5, spm6
      , mets1, mets2, mets3, mets4, mets5, mets6
      , hr1, hr2, hr3, hr4, hr5, hr6
      , sl1, sl2, sl3, sl4, sl5, sl6
    )
  ) %>%
  gather(speed_lev, rpe, rpe1, rpe2, rpe3, rpe4, rpe5, rpe6)
rpe$speed_lev = factor(rpe$speed_lev)
levels(rpe$speed_lev) = c("1.5", "2", "2.5", "3.5", "4.0", "4.5")


hr = df %>% 
  dplyr::select(
    -c(
      spm1, spm2, spm3, spm4, spm5, spm6
      , mets1, mets2, mets3, mets4, mets5, mets6
      , rpe1, rpe2, rpe3, rpe4, rpe5, rpe6
      , sl1, sl2, sl3, sl4, sl5, sl6
    )
  ) %>%
  gather(speed_lev, hr, hr1, hr2, hr3, hr4, hr5, hr6)
hr$speed_lev = factor(hr$speed_lev)
levels(hr$speed_lev) = c("1.5", "2", "2.5", "3.5", "4.0", "4.5")


sl = df %>% 
  dplyr::select(
    -c(
      spm1, spm2, spm3, spm4, spm5, spm6
      , mets1, mets2, mets3, mets4, mets5, mets6
      , rpe1, rpe2, rpe3, rpe4, rpe5, rpe6
      , hr1, hr2, hr3, hr4, hr5, hr6
    )
  ) %>%
  gather(speed_lev, sl, sl1, sl2, sl3, sl4, sl5, sl6)
sl$speed_lev = factor(sl$speed_lev)
levels(sl$speed_lev) = c("1.5", "2", "2.5", "3.5", "4.0", "4.5")


dat = left_join(spms, mets)
dat = left_join(dat, rpe)
dat = left_join(dat, hr)
dat = left_join(dat, sl)
names(dat)[names(dat) == "speed_lev"] = "speed (m/h)"
dat$`speed (m/h)` = as.numeric(as.character(dat$`speed (m/h)`))

dat$mets_sq = dat$mets^2
dat$mets_cubed = dat$mets^3

m3 = lmer(spm ~  `leg length (cm)` + mets_cubed + mets_sq + mets + (1 | subject), data = dat)
summary(m3)

The summary is:

Linear mixed model fit by REML ['lmerMod']
Formula: spm ~ `leg length (cm)` + mets_cubed + mets_sq + mets + (1 |      subject)
   Data: dat

REML criterion at convergence: 1562.6

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.8094 -0.5213  0.1120  0.5638  3.0933 

Random effects:
 Groups   Name        Variance Std.Dev.
 subject  (Intercept) 49.41    7.029   
 Residual             26.90    5.186   
Number of obs: 238, groups:  subject, 43

Fixed effects:
                  Estimate Std. Error t value
(Intercept)       41.30017   18.13968   2.277
`leg length (cm)` -0.58471    0.16535  -3.536
mets_cubed         0.44777    0.09343   4.793
mets_sq           -8.67702    1.37655  -6.303
mets              60.66036    6.36756   9.526

Correlation of Fixed Effects:
            (Intr) `l(c)` mts_cb mts_sq
`lglng(cm)` -0.863                     
mets_cubed  -0.434 -0.045              
mets_sq      0.449  0.045 -0.994       
mets        -0.459 -0.045  0.974 -0.993

Then I try to make predictions:

predict(m3, dat)

predictInterval(
  m3
  , dat
  , level = 0.95
  , n.sims = 1000
  , stat = "median"
  , type="linear.prediction"
  , include.resid.var = TRUE
  )

which gives these errors:

Error in `[.data.frame`(fr, vars) : undefined columns selected

Error in `[.data.frame`(fr, vars) : undefined columns selected
In addition: Warning message:
In predictInterval(m3, dat, level = 0.95, n.sims = 1000, stat = "median",  :
  newdata is tbl_df or tbl object from dplyr package and has been
              coerced to a data.frame

Session info:

sessionInfo()

R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.3

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] merTools_0.3.0  arm_1.9-3       MASS_7.3-45     lattice_0.20-34 lme4_1.1-12     Matrix_1.2-7.1  dplyr_0.5.0    
 [8] purrr_0.2.2     readr_1.0.0     tidyr_0.6.0     tibble_1.2      ggplot2_2.1.0   tidyverse_1.0.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.6      nloptr_1.0.4     plyr_1.8.4       base64enc_0.1-3  tools_3.3.2      digest_0.6.10   
 [7] jsonlite_1.1     evaluate_0.10    blme_1.0-4       nlme_3.1-128     gtable_0.2.0     psych_1.6.9     
[13] shiny_1.0.0      DBI_0.5          yaml_2.1.14      parallel_3.3.2   mvtnorm_1.0-5    coda_0.18-1     
[19] stringr_1.1.0    knitr_1.15       htmlwidgets_0.8  grid_3.3.2       DT_0.2           R6_2.1.3        
[25] rmarkdown_1.1    foreign_0.8-67   minqa_1.2.4      reshape2_1.4.1   magrittr_1.5     scales_0.4.0    
[31] htmltools_0.3.5  splines_3.3.2    assertthat_0.1   abind_1.4-5      mnormt_1.5-5     xtable_1.8-2    
[37] mime_0.5         colorspace_1.2-6 httpuv_1.3.3     labeling_0.3     stringi_1.1.1    lazyeval_0.2.0  
[43] munsell_0.4.3    broom_0.4.1  

1 Answer:

Answer 0: (score: 1)

Roland's suspicion was correct. The tidyverse package (ecosystem) was masking the predict() function of interest.
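One way to check for this kind of masking, and to sidestep it, is sketched below. It assumes lme4 is installed and uses its bundled `sleepstudy` data and a toy model `fit` as stand-ins for `dat` and `m3`; it illustrates the general technique and does not reproduce the original session.

```r
library(lme4)

# Toy mixed model on lme4's built-in sleepstudy data (stand-in for m3 / dat).
fit <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# List every attached package or environment that defines "predict".
# More than one entry means the first one listed masks the others.
find("predict")

# Show which environment the bare name `predict` currently resolves to.
environmentName(environment(predict))

# Call the generic through its namespace so masking cannot interfere;
# S3 dispatch still reaches lme4's method for merMod objects.
p <- stats::predict(fit, newdata = sleepstudy)
head(p)
```

If `find("predict")` lists something ahead of `package:stats`, the namespace-qualified call `stats::predict()` is the safer choice.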