Using rep inside a dplyr %>% pipeline

Date: 2015-08-23 12:53:27

Tags: r dplyr repeat rep

I'm trying to use rep together with dplyr, but I don't fully understand why I can't get it to work.

My data looks like this. What I want is simply to repeat each dayweek n times for each id.

head(dt4)

   id  dayweek n
1  1   Friday 3
2  1   Monday 3
3  1 Saturday 3
4  1   Sunday 3
5  1 Thursday 3
6  1  Tuesday 3

What I want is to do the equivalent of this inside a dplyr pipeline:

cbind(rep(dt4$id, dt4$n), rep(as.character(dt4$dayweek), dt4$n) ) 

which gives

    [,1] [,2]    
[1,] "1"  "Friday"
[2,] "1"  "Friday"
[3,] "1"  "Friday"
[4,] "1"  "Monday"
[5,] "1"  "Monday"
[6,] "1"  "Monday"
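For comparison, here is a minimal base-R sketch of the same expansion that keeps a data frame instead of the character matrix that cbind() produces. It indexes rows with rep(), the same idea as the cbind() call. The dt4 built here is a small stand-in for the question's data (3 ids x 7 days, n = 3), not the original structure() output.

```r
# Stand-in for the question's dt4 (assumption: 3 ids, 7 days each, n = 3)
days <- c("Friday", "Monday", "Saturday", "Sunday", "Thursday", "Tuesday", "Wednesday")
dt4 <- data.frame(id = rep(1:3, each = 7L),
                  dayweek = factor(rep(days, times = 3L)),
                  n = 3)

# Row i of dt4 appears dt4$n[i] times in this index
idx <- rep(seq_len(nrow(dt4)), dt4$n)

# Row subsetting keeps column types (dayweek stays a factor),
# whereas cbind() coerces everything to a character matrix
out <- dt4[idx, c("id", "dayweek")]
rownames(out) <- NULL
head(out)
```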

I don't understand why this code doesn't work:

dt4 %>% 
  group_by(id) %>% 
  summarise(rep(dayweek, n))

Error: expecting a single value

Can anyone help?

Data:

dt4 = structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), dayweek = structure(c(1L, 
2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 
4L, 5L, 6L, 7L), .Label = c("Friday", "Monday", "Saturday", "Sunday", 
"Thursday", "Tuesday", "Wedesnday"), class = "factor"), n = c(3, 
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3)), class = "data.frame", .Names = c("id", 
"dayweek", "n"), row.names = c(NA, -21L))

2 answers

Answer 0 (score: 5)

data.table can be a useful alternative for this kind of operation; I find it easier to read:

library("data.table")
dt4 <- as.data.table(dt4)
head(dt4[, rep(dayweek, n), by=id], 10)

which gives:

    id       V1
 1:  1   Friday
 2:  1   Friday
 3:  1   Friday
 4:  1   Monday
 5:  1   Monday
 6:  1   Monday
 7:  1 Saturday
 8:  1 Saturday
 9:  1 Saturday
10:  1   Sunday
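The result column above is named V1 because the j expression is unnamed. A small variation, sketched here on a stand-in dt4 (assumption: 3 ids x 7 days, n = 3), wraps j in .() so the column keeps the name dayweek:

```r
library(data.table)

# Stand-in for the question's dt4 (assumption: 3 ids, 7 days each, n = 3)
days <- c("Friday", "Monday", "Saturday", "Sunday", "Thursday", "Tuesday", "Wednesday")
dt4 <- data.table(id = rep(1:3, each = 7L),
                  dayweek = factor(rep(days, times = 3L)),
                  n = 3)

# Naming the j expression with .(dayweek = ...) avoids the default V1 column name
out <- dt4[, .(dayweek = rep(dayweek, n)), by = id]
head(out, 10)
```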

Answer 1 (score: 3)

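To get the same result as cbind, we can use do. As @DavidArenburg mentioned, summarise collapses each group to a single value/row, whereas mutate returns output with the same number of rows as its input. Here, however, we are doing a different kind of operation (each group grows to sum(n) rows), which can be done inside do. In the code, . represents the data of the current group. Just as we would extract the 'id' column from dt4 with dt4$id or dt4[['id']], inside do we replace dt4 with ., i.e. .$id.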
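The row-count rules described above can be checked on a tiny hypothetical data frame df. The sketch also uses reframe(), which is not part of this answer: it was added in dplyr 1.1.0 (well after this thread) specifically for expressions like rep(x, n) that return any number of rows per group.

```r
library(dplyr)  # reframe() assumes dplyr >= 1.1.0

# Hypothetical toy data: two groups, with a repeat count n per row
df <- data.frame(g = c(1, 1, 2), x = 1:3, n = c(2, 1, 3))

s <- df %>% group_by(g) %>% summarise(total = sum(x))  # one row per group: 2 rows
m <- df %>% group_by(g) %>% mutate(total = sum(x))     # same rows as df: 3 rows
r <- df %>% group_by(g) %>% reframe(x = rep(x, n))     # sum(n) rows: 6 rows

print(r)
```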

library(dplyr)
dt4 %>% 
    group_by(id) %>%
    # inside do(), `.` is the data frame for the current id group
    do(data.frame(id = .$id, v1 = rep(.$dayweek, .$n)))
#Source: local data frame [63 x 2]
#Groups: id

#  id       v1
#1   1   Friday
#2   1   Friday
#3   1   Friday
#4   1   Monday
#5   1   Monday
#6   1   Monday
#7   1 Saturday
#8   1 Saturday
#9   1 Saturday
#10  1   Sunday
#.. ..      ...
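do() is now marked as superseded in dplyr. A comparable sketch with group_modify() (available since dplyr 0.8.1) is below; it is an alternative, not the answer's original code. Inside the formula, .x is the current group's rows without the grouping column, and the id key is added back to the result automatically. The dt4 here is a stand-in (assumption: 3 ids x 7 days, n = 3).

```r
library(dplyr)  # group_modify() assumes dplyr >= 0.8.1

# Stand-in for the question's dt4 (assumption: 3 ids, 7 days each, n = 3)
days <- c("Friday", "Monday", "Saturday", "Sunday", "Thursday", "Tuesday", "Wednesday")
dt4 <- data.frame(id = rep(1:3, each = 7L),
                  dayweek = factor(rep(days, times = 3L)),
                  n = 3)

out <- dt4 %>%
  group_by(id) %>%
  # .x holds this group's rows (dayweek, n); group_modify() prepends id
  group_modify(~ data.frame(v1 = rep(.x$dayweek, .x$n)))

print(out)
```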

Another option, based on @Frank's comment, is to pass the row indices generated by rep to slice and then select the columns we need to keep.

dt4 %>%
     # repeat row i n[i] times (n() is the row count, n is the column)
     slice(rep(1:n(), n)) %>%
     select(-n)
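To see what the slice() index looks like, the sketch below evaluates rep() on a small hypothetical input. It then shows tidyr::uncount() (tidyr >= 0.8.0, newer than this thread), which packages the same "repeat each row n times" step and drops the weight column by default. The dt4 is a stand-in (assumption: 3 ids x 7 days, n = 3).

```r
library(dplyr)
library(tidyr)  # uncount() assumes tidyr >= 0.8.0

# rep(1:n(), n) builds an index in which row i appears n[i] times
idx <- rep(1:3, c(2, 1, 2))
print(idx)  # 1 1 2 3 3

# Stand-in for the question's dt4 (assumption: 3 ids, 7 days each, n = 3)
days <- c("Friday", "Monday", "Saturday", "Sunday", "Thursday", "Tuesday", "Wednesday")
dt4 <- data.frame(id = rep(1:3, each = 7L),
                  dayweek = factor(rep(days, times = 3L)),
                  n = 3)

# Same expansion as slice(rep(1:n(), n)) %>% select(-n)
out <- dt4 %>% uncount(n)
head(out)
```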