Can't pipe filter() of predict() output to ggplot()

Time: 2016-05-21 21:47:38

Tags: r ggplot2 dplyr

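I'm struggling to figure out why I can't use filter() on the result of predict.gam() and then ggplot() the subset of predictions. I'm not sure the prediction step is really part of the problem, but it is what it takes to trigger the error. Just filter() %>% ggplot() on an ordinary data frame works fine.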

library(dplyr)
library(ggplot2)
library(mgcv)

gam1 <- gam(Petal.Length~s(Petal.Width) + Species, data=iris)

nd <- expand.grid(Petal.Width = seq(0,5,0.05),
                 Species = levels(iris$Species),
                 stringsAsFactors = FALSE)
predicted <- predict(gam1,newdata=nd)
predicted <- cbind(predicted,nd)
filter(tbl_df(predicted), Species == "setosa") %>%
  ggplot(aes(x=Petal.Width, y = predicted)) +
  geom_point()

## Error: length(rows) == 1 is not TRUE

However, the filter() step on its own looks fine:

filter(tbl_df(predicted), Species == "setosa")

## Source: local data frame [101 x 3]
## 
##    predicted Petal.Width Species
##    (dbl[10])       (dbl)   (chr)
## 1   1.294574        0.00  setosa
## 2   1.327482        0.05  setosa
## 3   1.360390        0.10  setosa
## 4   1.393365        0.15  setosa
## 5   1.426735        0.20  setosa
## 6   1.460927        0.25  setosa
## 7   1.496477        0.30  setosa
## 8   1.533949        0.35  setosa
## 9   1.573888        0.40  setosa
## 10  1.616810        0.45  setosa
## ..       ...         ...     ...

The problem is filter(), because this works:

pick <- predicted$Species == "setosa"
ggplot(predicted[pick,],aes(x=Petal.Width, y = predicted)) +
  geom_point()

[Image: nice plot of the gam prediction]

I also tried saving the result of filter() to an object and using that directly in ggplot(), but I get the same error.

Obviously not a crisis, since there is a workaround, but my mental model of how to use filter() is clearly wrong! Any insights much appreciated.

EDIT: When I first posted this I was still on R 3.2.3 and was getting warnings from ggplot2 and dplyr, so I upgraded to 3.3.0. It's still happening.

## R version 3.3.0 (2016-05-03)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 10586)
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] mgcv_1.8-12   nlme_3.1-127  ggplot2_2.1.0 dplyr_0.4.3  
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.3      knitr_1.11       magrittr_1.5     munsell_0.4.2   
##  [5] colorspace_1.2-6 lattice_0.20-33  R6_2.1.1         stringr_1.0.0   
##  [9] plyr_1.8.3       tools_3.3.0      parallel_3.3.0   grid_3.3.0      
## [13] gtable_0.1.2     DBI_0.3.1        htmltools_0.2.6  lazyeval_0.1.10 
## [17] yaml_2.1.13      assertthat_0.1   digest_0.6.8     Matrix_1.2-6    
## [21] formatR_1.2      evaluate_0.7.2   rmarkdown_0.9.5  labeling_0.3    
## [25] stringi_1.0-1    scales_0.3.0

1 answer:

Answer (score: 4)

The problem arises because your predict() call returns a named 1-d array rather than a plain numeric vector.

class(predicted$predicted)
# [1] "array"
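A minimal base-R sketch of the difference, using a hand-built 1-d named array as a stand-in for the predict() result (the values are made up, not real gam output). It shows how to tell a 1-d array from a plain vector, and that as.numeric() strips the dim attribute:

```r
# Hypothetical stand-in for predict(gam1, newdata = nd): a 1-d named array.
# The values are made up for illustration.
pred <- array(c(1.29, 1.33, 1.36), dimnames = list(c("1", "2", "3")))
plain <- c(1.29, 1.33, 1.36)  # an ordinary numeric vector for comparison

is.array(pred)   # TRUE: pred carries a dim attribute
dim(pred)        # 3, i.e. a single dimension
is.array(plain)  # FALSE: a plain vector has no dim attribute

# as.numeric() strips attributes, including dim and dimnames
pred_vec <- as.numeric(pred)
identical(pred_vec, plain)  # TRUE
```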

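The first filter() superficially gives you the right output, but if you inspect it you'll notice that the predicted column is still a 1-d array holding all 303 values: filter() cut the other columns down to 101 rows but left this one unsubsetted.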

str(filter(tbl_df(predicted), Species == "setosa"))
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   101 obs. of  3 variables:
 $ predicted  : num [1:303(1d)] 1.29 1.33 1.36 1.39 1.43 ...  
  ..- attr(*, "dimnames")=List of 1
  .. ..$ : chr  "1" "2" "3" "4" ...
 $ Petal.Width: num  0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 ...
 $ Species    : chr  "setosa" "setosa" "setosa" "setosa" ...
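One way to catch this kind of mismatch before plotting is to check every column for a dim attribute or for a length that disagrees with nrow(). The helper below, suspicious_columns(), is a hypothetical name (not a dplyr or base R function), and the malformed data frame is hand-built to mimic the str() output above rather than produced by filter():

```r
# Hypothetical diagnostic helper (not part of dplyr or base R): report columns
# that carry a dim attribute or whose length does not match nrow(df).
suspicious_columns <- function(df) {
  n <- nrow(df)
  has_dim   <- vapply(df, function(col) !is.null(dim(col)), logical(1))
  wrong_len <- vapply(df, function(col) NROW(col) != n, logical(1))
  names(df)[has_dim | wrong_len]
}

# A deliberately malformed data frame: 3 rows, but the 'predicted' column
# still holds 6 values as a 1-d array (made-up values).
bad <- structure(
  list(predicted   = array(c(1.29, 1.33, 1.36, 4.1, 4.2, 4.3)),
       Petal.Width = c(0, 0.05, 0.10)),
  class = "data.frame", row.names = c(NA, -3L)
)

suspicious_columns(bad)                    # "predicted"
suspicious_columns(data.frame(a = 1:3))   # character(0)
```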

By contrast, good old logical subsetting with [ does the job across all columns, including the array:

str(predicted[pick,])
'data.frame':   101 obs. of  3 variables:
 $ predicted  : num [1:101(1d)] 1.29 1.33 1.36 1.39 1.43 ... # Now 101 obs here too
  ..- attr(*, "dimnames")=List of 1
  .. ..$ : chr  "1" "2" "3" "4" ...
 $ Petal.Width: num  0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 ...
 $ Species    : chr  "setosa" "setosa" "setosa" "setosa" ...
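The same behaviour can be checked in base R on a small hand-built data frame (made-up values, no model fitting): [ applies the row index to every column, so the array column ends up with as many elements as there are rows:

```r
# Small data frame whose 'predicted' column is a 1-d named array, as in the
# question (hand-built with made-up values instead of calling predict()).
dat <- structure(
  list(predicted = array(c(1.29, 1.33, 4.10, 4.20),
                         dimnames = list(c("1", "2", "3", "4"))),
       Species   = c("setosa", "setosa", "versicolor", "versicolor")),
  class = "data.frame", row.names = c(NA, -4L)
)

# `[.data.frame` subsets every column with the same row index,
# including the array column
sub <- dat[dat$Species == "setosa", ]
nrow(sub)              # 2
length(sub$predicted)  # 2, matching the number of rows
```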

So either coerce the predicted column to numeric:

library(dplyr)
library(ggplot2)

predicted %>% mutate(predicted = as.numeric(predicted)) %>% 
  filter(Species == "setosa") %>%
  ggplot(aes(x = Petal.Width, y = predicted)) +
  geom_point()

or replace filter() with subset():

predicted %>% 
  subset(Species == "setosa") %>%
  ggplot(aes(x = Petal.Width, y = predicted)) +
  geom_point()
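If more than one column might carry a dim attribute, the coercion from the mutate() approach can be applied to every 1-d array column at once before filtering. A sketch of that idea, where drop_1d_arrays() is a hypothetical helper name and the data frame is hand-built with made-up values:

```r
# Hypothetical helper: replace every 1-d array column with a plain vector,
# so that row-wise verbs such as filter() see ordinary columns.
drop_1d_arrays <- function(df) {
  for (nm in names(df)) {
    col <- df[[nm]]
    if (length(dim(col)) == 1L) {
      df[[nm]] <- as.vector(col)  # as.vector() drops dim and dimnames
    }
  }
  df
}

# Example data frame with a 1-d named array column (made-up values)
dat <- structure(
  list(predicted = array(c(1.29, 1.33, 1.36), dimnames = list(c("1", "2", "3"))),
       Species   = c("setosa", "setosa", "versicolor")),
  class = "data.frame", row.names = c(NA, -3L)
)

fixed <- drop_1d_arrays(dat)
is.array(dat$predicted)    # TRUE
is.array(fixed$predicted)  # FALSE
```

as.vector() is used rather than as.numeric() so that a non-numeric 1-d array column would keep its type; for the numeric predicted column in the question the two behave the same.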