dplyr - right join after group_by does not produce the desired/expected result

Time: 2015-12-20 15:05:33

Tags: r dplyr

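I am trying to make each of my id/year/month groups contain a row for all seven weekdays, with NA for the "missing" weekdays.

Here are the data frames and my attempt at this task: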

> df
  id year month weekday  amount
1  1 2015     1  Friday 3650.43
2  2 2015     1  Monday 1271.12
3  1 2015     2  Friday 1315.79
4  2 2015     2  Monday 2195.37
> wday
    weekday
1    Friday
2  Saturday
3 Wednesday
4    Sunday
5   Tuesday
6    Monday
7  Thursday
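
For anyone who wants to reproduce this, the two data frames can be rebuilt roughly as follows. The column types are an assumption read off the printed tibble header further down (id and amount as double, year and month as integer, weekday as character); the original data may have been created differently.

```r
# Rebuild the example data frames (base R only).
# Column types are assumed from the printed output: dbl, int, int, chr, dbl.
df <- data.frame(
  id      = c(1, 2, 1, 2),
  year    = c(2015L, 2015L, 2015L, 2015L),
  month   = c(1L, 1L, 2L, 2L),
  weekday = c("Friday", "Monday", "Friday", "Monday"),
  amount  = c(3650.43, 1271.12, 1315.79, 2195.37),
  stringsAsFactors = FALSE
)

# Lookup table with one row per weekday, in the order shown above.
wday <- data.frame(
  weekday = c("Friday", "Saturday", "Wednesday", "Sunday",
              "Tuesday", "Monday", "Thursday"),
  stringsAsFactors = FALSE
)
```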

I tried using group_by() followed by a right join. However, it does not produce what I expected. Is there a simple way to get the result I am after?

> df <- df %>% group_by(id, year, month) %>% right_join(wday)
Joining by: "weekday"
> df
Source: local data frame [9 x 5]
Groups: id, year, month [?]

     id  year month   weekday  amount
  (dbl) (int) (int)     (chr)   (dbl)
1     1  2015     1    Friday 3650.43
2     1  2015     2    Friday 1315.79
3    NA    NA    NA  Saturday      NA
4    NA    NA    NA Wednesday      NA
5    NA    NA    NA    Sunday      NA
6    NA    NA    NA   Tuesday      NA
7     2  2015     1    Monday 1271.12
8     2  2015     2    Monday 2195.37
9    NA    NA    NA  Thursday      NA

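I want 7 rows for each id/year/month combination, where amount for the missing weekdays would be NA (or ideally zero, but I know how to get that with mutate()).

The resulting data frame should look like this: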

> df
   id year month   weekday  amount
1   1 2015     1    Friday 3650.43
2   1 2015     1    Monday    0.00
3   1 2015     1  Saturday    0.00
4   1 2015     1    Sunday    0.00
5   1 2015     1  Thursday    0.00
6   1 2015     1   Tuesday    0.00
7   1 2015     1 Wednesday    0.00
8   1 2015     2    Friday 1315.79
9   1 2015     2    Monday    0.00
10  1 2015     2  Saturday    0.00
11  1 2015     2    Sunday    0.00
12  1 2015     2  Thursday    0.00
13  1 2015     2   Tuesday    0.00
14  1 2015     2 Wednesday    0.00
15  2 2015     1    Friday    0.00
16  2 2015     1    Monday 1271.12
17  2 2015     1  Saturday    0.00
18  2 2015     1    Sunday    0.00
19  2 2015     1  Thursday    0.00
20  2 2015     1   Tuesday    0.00
21  2 2015     1 Wednesday    0.00
22  2 2015     2    Friday    0.00
23  2 2015     2    Monday 2195.37
24  2 2015     2  Saturday    0.00
25  2 2015     2    Sunday    0.00
26  2 2015     2  Thursday    0.00
27  2 2015     2   Tuesday    0.00
28  2 2015     2 Wednesday    0.00

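2 answers:

Answer 0: (score: 8)

We can use expand.grid to build every id/year/month/weekday combination and then left_join the original data onto it: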

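library(dplyr)
# all combinations of the unique id, year, month values and the seven weekdays
expand.grid(c(lapply(df[1:3], unique), wday['weekday'])) %>% 
       left_join(., df) %>%                                    # bring in the known amounts
       mutate(amount=replace(amount, is.na(amount), 0)) %>%    # missing weekdays get 0
       arrange(id, year, month, weekday)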
#    id year month   weekday  amount
#1   1 2015     1    Friday 3650.43
#2   1 2015     1    Monday    0.00
#3   1 2015     1  Saturday    0.00
#4   1 2015     1    Sunday    0.00
#5   1 2015     1  Thursday    0.00
#6   1 2015     1   Tuesday    0.00
#7   1 2015     1 Wednesday    0.00
#8   1 2015     2    Friday 1315.79
#9   1 2015     2    Monday    0.00
#10  1 2015     2  Saturday    0.00
#11  1 2015     2    Sunday    0.00
#12  1 2015     2  Thursday    0.00
#13  1 2015     2   Tuesday    0.00
#14  1 2015     2 Wednesday    0.00
#15  2 2015     1    Friday    0.00
#16  2 2015     1    Monday 1271.12
#17  2 2015     1  Saturday    0.00
#18  2 2015     1    Sunday    0.00
#19  2 2015     1  Thursday    0.00
#20  2 2015     1   Tuesday    0.00
#21  2 2015     1 Wednesday    0.00
#22  2 2015     2    Friday    0.00
#23  2 2015     2    Monday 2195.37
#24  2 2015     2  Saturday    0.00
#25  2 2015     2    Sunday    0.00
#26  2 2015     2  Thursday    0.00
#27  2 2015     2   Tuesday    0.00
#28  2 2015     2 Wednesday    0.00
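
One thing to keep in mind: expand.grid on the unique values of each column builds every id x year x month combination, including ones that never occur together in df. With the example data that makes no difference, but if only the observed combinations are wanted, a variant along these lines might work. It is a sketch rather than part of the original answer: it takes the distinct groups first and relies on base merge() returning the Cartesian product when the two inputs share no column names. The names groups, full and result are illustrative, and the data frames are rebuilt with assumed column types so the block runs on its own.

```r
library(dplyr)

# Example data (types assumed from the question's printed output).
df <- data.frame(
  id = c(1, 2, 1, 2), year = rep(2015L, 4), month = c(1L, 1L, 2L, 2L),
  weekday = c("Friday", "Monday", "Friday", "Monday"),
  amount = c(3650.43, 1271.12, 1315.79, 2195.37),
  stringsAsFactors = FALSE
)
wday <- data.frame(
  weekday = c("Friday", "Saturday", "Wednesday", "Sunday",
              "Tuesday", "Monday", "Thursday"),
  stringsAsFactors = FALSE
)

# Only the id/year/month combinations that actually occur in df.
groups <- distinct(df, id, year, month)

# No shared column names, so merge() yields the Cartesian product:
# every observed group paired with every weekday.
full <- merge(groups, wday)

result <- full %>%
  left_join(df, by = c("id", "year", "month", "weekday")) %>%
  mutate(amount = coalesce(amount, 0)) %>%   # unmatched weekdays become 0
  arrange(id, year, month, weekday)
```

This also shows why the attempt in the question fell short: the join matches on weekday alone, and group_by() does not change which rows a join pairs up, so the grouping columns have to be part of the scaffold that is joined against.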

Answer 1: (score: 4)

Using tidyr and dplyr. complete does the heavy lifting here. If every weekday already appears somewhere in df, you would not need bind_rows or na.omit (or dplyr).

library(dplyr)
library(tidyr)
df %>% #initial data
    bind_rows(wday) %>% #adding on so we have all the weekdays
    complete(id, year, month, weekday,  #completing all levels of id:year:month:weekday
                fill = list(amount = 0)) %>% #filling amount column with 0
    na.omit() #remove the NAs we got from the bind_rows
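
As a possible alternative that avoids the bind_rows/na.omit round trip even when some weekdays are absent from df, the weekday values can be handed to complete() directly, and nesting() can restrict the expansion to id/year/month combinations that already exist. This is a sketch under the assumption of a reasonably recent tidyr; it is not part of the original answer, and the data frames are rebuilt with assumed column types so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Example data (types assumed from the question's printed output).
df <- data.frame(
  id = c(1, 2, 1, 2), year = rep(2015L, 4), month = c(1L, 1L, 2L, 2L),
  weekday = c("Friday", "Monday", "Friday", "Monday"),
  amount = c(3650.43, 1271.12, 1315.79, 2195.37),
  stringsAsFactors = FALSE
)
wday <- data.frame(
  weekday = c("Friday", "Saturday", "Wednesday", "Sunday",
              "Tuesday", "Monday", "Thursday"),
  stringsAsFactors = FALSE
)

result <- df %>%
  complete(
    nesting(id, year, month),     # keep only observed id/year/month groups
    weekday = wday$weekday,       # supply all seven weekdays explicitly
    fill = list(amount = 0)       # rows created by complete() get amount 0
  )
```

Passing weekday = wday$weekday tells complete() which values to expand over, so it does not depend on every weekday already being present in the data.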