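From a list of dates, I want to select the first day of each week and replicate that value across the remaining days of the same week. My current approach relies on an interim data frame. I am looking for an improved approach that stays inside a dplyr pipeline and does not create a temporary data set.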
dta <- data.frame(origDate = seq(as.Date("01/01/2012", "%d/%m/%Y"),
as.Date("30/01/2012", "%d/%m/%Y"),
by = "day"))
This is the step I would like to get rid of:
# Libraries
require(dplyr); require(lubridate); require(tidyr)
# Create interim data set
dtaIn <- dta %>%
mutate(weeknum = week(origDate)) %>%
mutate(yearnum = year(origDate)) %>%
unite(weekAndYear, yearnum, weeknum, sep = "_") %>%
arrange(origDate) %>%
group_by(weekAndYear) %>%
filter(row_number() == 1)
# Final data set
dtaFin <- dta %>%
mutate(weeknum = week(origDate)) %>%
mutate(yearnum = year(origDate)) %>%
unite(weekAndYear, yearnum, weeknum, sep = "_") %>%
left_join(y = dtaIn, by = c("weekAndYear" = "weekAndYear"))
>> dtaFin
origDate.x weekAndYear origDate.y
1 2012-01-01 2012_1 2012-01-01
2 2012-01-02 2012_1 2012-01-01
3 2012-01-03 2012_1 2012-01-01
4 2012-01-04 2012_1 2012-01-01
5 2012-01-05 2012_1 2012-01-01
6 2012-01-06 2012_1 2012-01-01
7 2012-01-07 2012_1 2012-01-01
8 2012-01-08 2012_2 2012-01-08
9 2012-01-09 2012_2 2012-01-08
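The result returns the first day of each week, replicated across every row of that week. The task is to get the same result without creating dtaIn and without leaving the ongoing dplyr pipeline. In practice, the code should look something like this: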
dtaFin <- dta %>%
# Create variable for first day of each week
# Replicate across rows for that week
# Return a data.frame of the same size + 1 column with the new day
The weekAndYear column can be dropped from the final data set; I left it in for reproducibility.
Answer 0 (score: 3)
One idea is to use strftime to create a combined week-and-year variable, i.e.
library(dplyr)
dta %>%
mutate(weekandyear = strftime(origDate+1, "%Y-%W")) %>%
group_by(weekandyear) %>%
mutate(origDate.y = head(origDate,1))
# origDate weekandyear origDate.y
# <date> <chr> <date>
#1 2012-01-01 2012-01 2012-01-01
#2 2012-01-02 2012-01 2012-01-01
#3 2012-01-03 2012-01 2012-01-01
#4 2012-01-04 2012-01 2012-01-01
#5 2012-01-05 2012-01 2012-01-01
#6 2012-01-06 2012-01 2012-01-01
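The `origDate+1` shift is easy to miss, so here is a small base R sketch of what it appears to do. `%W` numbers weeks with Monday as the first day, and 2012-01-01 is a Sunday, so adding one day before formatting makes each Sunday share a label with the six days that follow it. The sample vector `d` is only an illustration and is not part of the original answer.

```r
# A few dates around the first week boundary of 2012
d <- as.Date(c("2012-01-01", "2012-01-02", "2012-01-07", "2012-01-08"))

weekdays(d)               # day names (locale-dependent)

# Without the shift, %W puts Sunday in the previous (Monday-based) week
strftime(d, "%Y-%W")      # "2012-00" "2012-01" "2012-01" "2012-01"

# With the shift, Sunday opens the week
strftime(d + 1, "%Y-%W")  # "2012-01" "2012-01" "2012-01" "2012-02"

# Edge case: the shift also moves the year for a date on 31 December
strftime(as.Date("2011-12-31") + 1, "%Y-%W")  # "2012-00"
```

One thing to watch: because the shift is applied before formatting, 31 December is labelled with the following year, so in data that spans a year boundary it can end up in a group of its own.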
Note that the first mutate is not needed (as @akrun mentioned), so we can fold it into the group_by statement, i.e.
dta %>%
group_by(weekandyear = strftime(origDate+1, "%Y-%W")) %>%
mutate(origDate.y = head(origDate,1))
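To get all the way to the data set the question asks for, the pipeline can be finished by ungrouping and dropping the helper column. The sketch below is one possible completion, not the answerer's code: it swaps `head(origDate,1)` for `min(origDate)` so the result does not depend on row order, and it adds a `viaFloor` column (a hypothetical name) computed with `lubridate::floor_date()` as a cross-check. Note that `floor_date()` returns the calendar Sunday on or before each date, which matches the first observed day only when the data contain that Sunday.

```r
library(dplyr)
library(lubridate)

# Same sample data as in the question
dta <- data.frame(origDate = seq(as.Date("2012-01-01"),
                                 as.Date("2012-01-30"),
                                 by = "day"))

dtaFin <- dta %>%
  # Week label with Sunday as the first day (see the shift by one day)
  group_by(weekandyear = strftime(origDate + 1, "%Y-%W")) %>%
  # Earliest date observed in each week, independent of row order
  mutate(origDate.y = min(origDate)) %>%
  ungroup() %>%
  # Drop the helper column once it is no longer needed
  select(-weekandyear) %>%
  # Alternative without grouping: calendar Sunday on or before each date
  mutate(viaFloor = floor_date(origDate, "week", week_start = 7))

head(dtaFin, 9)
```

For this January 2012 sample the two columns agree, since the data start on a Sunday. On data that start mid-week or cross a year boundary they can differ, so which one is wanted depends on whether "first day of the week" means the first observed date or the calendar week start.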