Controlling the output format of do

Time: 2016-01-20 13:59:17

Tags: r dplyr

The following two do statements produce slightly different results:

library(dplyr)
set.seed(1)
d <- data.frame(x = rnorm(30), y = rnorm(30), w = factor(sample(3, 30, TRUE)))

(r1 <- d %>% group_by(w) %>%
   do(data.frame(s1 = sum(.$x),
                 s2 = sum(.$y),
                 s3 = {
                    z <- seq_along(.$x)
                    sum(z)
                 })))
# Source: local data frame [3 x 4]
# Groups: w [3]
# 
#        w        s1         s2    s3
#   (fctr)     (dbl)      (dbl) (int)
# 1      1 0.1292572  0.8447634    45
# 2      2 0.2092895  3.3060157    91
# 3      3 2.1351984 -0.1675416    36

(r2 <- d %>% group_by(w) %>%
   do(s1 = sum(.$x),
      s2 = sum(.$y),
      s3 = {
         z <- seq_along(.$x)
         sum(z)
      }))
# Source: local data frame [3 x 4]
# Groups: <by row>
# 
#        w       s1       s2       s3
#   (fctr)    (chr)    (chr)    (chr)
# 1      1 <dbl[1]> <dbl[1]> <int[1]>
# 2      2 <dbl[1]> <dbl[1]> <int[1]>
# 3      3 <dbl[1]> <dbl[1]> <int[1]>

If I now want to add more complex objects to the output, I have to rely on the second form:

(r3 <- d %>% group_by(w) %>%
   do(s1 = lm(y ~ x, .),
      s2 = sum(.$y),
      s3 = {
         z <- seq_along(.$x)
         sum(z)
      }))
# Source: local data frame [3 x 4]
# Groups: <by row>
# 
#        w      s1       s2       s3
#   (fctr)   (chr)    (chr)    (chr)
# 1      1 <S3:lm> <dbl[1]> <int[1]>
# 2      2 <S3:lm> <dbl[1]> <int[1]>
# 3      3 <S3:lm> <dbl[1]> <int[1]>

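So my question is whether there is an elegant way to combine the nice output of the unnamed form of do (in particular, vectors stored as vectors rather than as lists of vectors) with the ability of the named form of do to store more complex objects. The desired output would look like this, but without the extra mutate: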

r3 %>% mutate(s2 = unlist(s2), s3 = unlist(s3))
# Source: local data frame [3 x 4]
# Groups: <by row>
# 
#        w      s1         s2    s3
#   (fctr)   (chr)      (dbl) (int)
# 1      1 <S3:lm>  0.8447634    45
# 2      2 <S3:lm>  3.3060157    91
# 3      3 <S3:lm> -0.1675416    36

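Edit

The following question is apparently moot, because with my current version of dplyr I get list instead of chr:

Finally, why are s1, s2 and s3 of type (chr) in the second example?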

1 Answer:

Answer 0 (score: 4)

Wrap the model in a list, and use I() to prevent R from trying to unlist it.

r3 <- d %>% group_by(w) %>%
    do(data.frame(s1 = I(list(lm(y ~ x, .))),
                  s2 = sum(.$y),
                  s3 = {
                     z <- seq_along(.$x)
                     sum(z)
                  }))

#Source: local data frame [3 x 4]
#Groups: w [3]

#       w      s1         s2    s3
#  (fctr)   (chr)      (dbl) (int)
#1      1 <S3:lm>  0.8447634    45
#2      2 <S3:lm>  3.3060157    91
#3      3 <S3:lm> -0.1675416    36

(The printed type chr is a bug in print.tbl_df that has been fixed as of dplyr 0.5. Don't worry about it.)
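
To see why both list and I are needed, here is a minimal sketch in base R only, outside of do. It assumes the built-in cars dataset and the names fit, unprotected and protected, none of which come from the question; they are used purely for illustration.

```r
# Base R only; `cars` is a built-in dataset used purely for illustration.
fit <- lm(dist ~ speed, data = cars)

# Without I(), data.frame() tries to expand the list into columns and
# fails, because an lm object cannot be coerced to a data frame.
unprotected <- try(
  data.frame(s1 = list(fit), s2 = sum(cars$dist)),
  silent = TRUE
)
inherits(unprotected, "try-error")

# With I(), the one-element list is kept as is: one row whose s1 cell
# holds the whole model, while s2 stays an ordinary numeric column.
protected <- data.frame(s1 = I(list(fit)), s2 = sum(cars$dist))
nrow(protected)
class(protected$s1[[1]])
is.numeric(protected$s2)
```

Inside do, the same one-row data frame is built once per group, which is presumably why the result keeps s2 and s3 as plain vectors while s1 remains a list column.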