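I'm trying to get every id/year/month group to contain a row for each of the seven weekdays, with NA for the "missing" weekdays.
Here are the data frames and my attempt at this: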
> df
id year month weekday amount
1 1 2015 1 Friday 3650.43
2 2 2015 1 Monday 1271.12
3 1 2015 2 Friday 1315.79
4 2 2015 2 Monday 2195.37
> wday
weekday
1 Friday
2 Saturday
3 Wednesday
4 Sunday
5 Tuesday
6 Monday
7 Thursday
I tried using group_by() with a right join, but it doesn't produce what I expected. Is there a simple way to get the result I'm after?
> df <- df %>% group_by(id, year, month) %>% right_join(wday)
Joining by: "weekday"
> df
Source: local data frame [9 x 5]
Groups: id, year, month [?]
id year month weekday amount
(dbl) (int) (int) (chr) (dbl)
1 1 2015 1 Friday 3650.43
2 1 2015 2 Friday 1315.79
3 NA NA NA Saturday NA
4 NA NA NA Wednesday NA
5 NA NA NA Sunday NA
6 NA NA NA Tuesday NA
7 2 2015 1 Monday 1271.12
8 2 2015 2 Monday 2195.37
9 NA NA NA Thursday NA
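I want 7 rows for each id/year/month combination, where the amount for the missing weekdays is NA (or ideally zero, but I know how to get that with mutate()).
The resulting data frame should look like this: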
> df
id year month weekday amount
1 1 2015 1 Friday 3650.43
2 1 2015 1 Monday 0.00
3 1 2015 1 Saturday 0.00
4 1 2015 1 Sunday 0.00
5 1 2015 1 Thursday 0.00
6 1 2015 1 Tuesday 0.00
7 1 2015 1 Wednesday 0.00
8 1 2015 2 Friday 1315.79
9 1 2015 2 Monday 0.00
10 1 2015 2 Saturday 0.00
11 1 2015 2 Sunday 0.00
12 1 2015 2 Thursday 0.00
13 1 2015 2 Tuesday 0.00
14 1 2015 2 Wednesday 0.00
15 2 2015 1 Friday 0.00
16 2 2015 1 Monday 1271.12
17 2 2015 1 Saturday 0.00
18 2 2015 1 Sunday 0.00
19 2 2015 1 Thursday 0.00
20 2 2015 1 Tuesday 0.00
21 2 2015 1 Wednesday 0.00
22 2 2015 2 Friday 0.00
23 2 2015 2 Monday 2195.37
24 2 2015 2 Saturday 0.00
25 2 2015 2 Sunday 0.00
26 2 2015 2 Thursday 0.00
27 2 2015 2 Tuesday 0.00
28 2 2015 2 Wednesday 0.00
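Answer 0 (score: 8)
We can use expand.grid to build every id/year/month/weekday combination, then left-join the original data onto it: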
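library(dplyr)
# Cross the unique id, year and month values with all seven weekdays
expand.grid(c(lapply(df[1:3], unique), wday['weekday'])) %>%
  left_join(., df) %>%                                     # attach the observed amounts
  mutate(amount = replace(amount, is.na(amount), 0)) %>%   # missing weekdays become 0
  arrange(id, year, month, weekday)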
# id year month weekday amount
#1 1 2015 1 Friday 3650.43
#2 1 2015 1 Monday 0.00
#3 1 2015 1 Saturday 0.00
#4 1 2015 1 Sunday 0.00
#5 1 2015 1 Thursday 0.00
#6 1 2015 1 Tuesday 0.00
#7 1 2015 1 Wednesday 0.00
#8 1 2015 2 Friday 1315.79
#9 1 2015 2 Monday 0.00
#10 1 2015 2 Saturday 0.00
#11 1 2015 2 Sunday 0.00
#12 1 2015 2 Thursday 0.00
#13 1 2015 2 Tuesday 0.00
#14 1 2015 2 Wednesday 0.00
#15 2 2015 1 Friday 0.00
#16 2 2015 1 Monday 1271.12
#17 2 2015 1 Saturday 0.00
#18 2 2015 1 Sunday 0.00
#19 2 2015 1 Thursday 0.00
#20 2 2015 1 Tuesday 0.00
#21 2 2015 1 Wednesday 0.00
#22 2 2015 2 Friday 0.00
#23 2 2015 2 Monday 2195.37
#24 2 2015 2 Saturday 0.00
#25 2 2015 2 Sunday 0.00
#26 2 2015 2 Thursday 0.00
#27 2 2015 2 Tuesday 0.00
#28 2 2015 2 Wednesday 0.00
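For anyone who wants to run the approach above end to end, here is a minimal self-contained sketch. The sample data is reconstructed from the question; the names grid and result, the explicit by argument, and stringsAsFactors = FALSE (to keep weekday as character on both sides of the join) are my own additions, not part of the original answer.

```r
library(dplyr)

# Sample data reconstructed from the question.
df <- data.frame(
  id      = c(1, 2, 1, 2),
  year    = c(2015L, 2015L, 2015L, 2015L),
  month   = c(1L, 1L, 2L, 2L),
  weekday = c("Friday", "Monday", "Friday", "Monday"),
  amount  = c(3650.43, 1271.12, 1315.79, 2195.37),
  stringsAsFactors = FALSE
)
wday <- data.frame(
  weekday = c("Friday", "Saturday", "Wednesday", "Sunday",
              "Tuesday", "Monday", "Thursday"),
  stringsAsFactors = FALSE
)

# Every combination of the unique id, year, month and weekday values.
# stringsAsFactors = FALSE is an assumption made here so that weekday
# stays character, matching df.
grid <- expand.grid(
  c(lapply(df[1:3], unique), wday["weekday"]),
  stringsAsFactors = FALSE
)

result <- grid %>%
  left_join(df, by = c("id", "year", "month", "weekday")) %>%
  mutate(amount = replace(amount, is.na(amount), 0)) %>%
  arrange(id, year, month, weekday)
```

Note that expand.grid crosses the unique values of each column independently, so an id that never occurs in a given month would still get seven rows for that month. That is fine for the sample data, but may not be what you want on a larger data set.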
Answer 1 (score: 4)
Using tidyr and dplyr. complete does the heavy lifting here; if every weekday already appears somewhere in df, you don't need bind_rows or na.omit (or dplyr).
library(dplyr)
library(tidyr)
df %>% #initial data
bind_rows(wday) %>% #adding on so we have all the weekdays
complete(id, year, month, weekday, #completing all levels of id:year:month:weekday
fill = list(amount = 0)) %>% #filling amount column with 0
na.omit() #remove the NAs we got from the bind_rows
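A possible variant, sketched here under the assumption that only id/year/month combinations actually present in the data should be expanded: tidyr's nesting() restricts complete to the observed combinations, and passing the weekdays as a named vector supplies all seven values without the bind_rows/na.omit step. The sample data is reconstructed from the question, and the names all_days and result are hypothetical.

```r
library(tidyr)

# Sample data reconstructed from the question.
df <- data.frame(
  id      = c(1, 2, 1, 2),
  year    = c(2015L, 2015L, 2015L, 2015L),
  month   = c(1L, 1L, 2L, 2L),
  weekday = c("Friday", "Monday", "Friday", "Monday"),
  amount  = c(3650.43, 1271.12, 1315.79, 2195.37),
  stringsAsFactors = FALSE
)
all_days <- c("Friday", "Saturday", "Wednesday", "Sunday",
              "Tuesday", "Monday", "Thursday")

# nesting() keeps only the id/year/month combinations found in df;
# weekday = all_days supplies the full set of weekdays to cross them with.
result <- complete(
  df,
  nesting(id, year, month),
  weekday = all_days,
  fill = list(amount = 0)   # newly created rows get amount 0
)
```

If df has been grouped earlier in a pipeline, it is probably worth calling ungroup() first, since complete works within groups on grouped data.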