A more efficient way to replace dates with the first day of the given week

Time: 2016-12-19 13:26:15

Tags: r datetime dataframe dplyr

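From a list of dates, I want to pick the first day of each week and replicate that value across the remaining days of the same week. My current approach relies on an interim data frame, and I am looking for an improved dplyr-based method that does not require creating an interim data set.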

Question

Data set

dta <- data.frame(origDate = seq(as.Date("01/01/2012", "%d/%m/%Y"),
                 as.Date("30/01/2012", "%d/%m/%Y"),
                 by = "day"))

Interim data set

I would like to get rid of this step.

# Libraries
library(dplyr); library(lubridate); library(tidyr)
# Create interim data set
dtaIn  <- dta %>%
    mutate(weeknum = week(origDate)) %>%
    mutate(yearnum = year(origDate)) %>%
    unite(weekAndYear, yearnum, weeknum, sep = "_") %>%
    arrange(origDate) %>%
    group_by(weekAndYear) %>%
    filter(row_number() == 1) 

Final data set

# Final data set
dtaFin <- dta %>%
    mutate(weeknum = week(origDate)) %>%
    mutate(yearnum = year(origDate)) %>%
    unite(weekAndYear, yearnum, weeknum, sep = "_") %>%
    left_join(y = dtaIn, by = c("weekAndYear" = "weekAndYear"))

Results

> dtaFin
   origDate.x weekAndYear origDate.y
1  2012-01-01      2012_1 2012-01-01
2  2012-01-02      2012_1 2012-01-01
3  2012-01-03      2012_1 2012-01-01
4  2012-01-04      2012_1 2012-01-01
5  2012-01-05      2012_1 2012-01-01
6  2012-01-06      2012_1 2012-01-01
7  2012-01-07      2012_1 2012-01-01
8  2012-01-08      2012_2 2012-01-08
9  2012-01-09      2012_2 2012-01-08

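The result repeats the first day of each week across all rows of that week. The task is to get the same result without creating dtaIn, that is, without leaving the ongoing dplyr pipeline. In practice, the code should look like: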

dtaFin <- dta %>%
   # Create variable for first day of each week
   # Replicate across rows for that week
   # Return data.frame of the same size + 1 column with the new day

The weekAndYear column can be dropped from the final data set; I left it in for reproducibility.

1 Answer:

Answer 0: (score: 3)

One idea is to use strftime to create a week-and-year variable, i.e.

library(dplyr)
dta %>% 
  mutate(weekandyear = strftime(origDate+1, "%Y-%W")) %>% 
  group_by(weekandyear) %>% 
  mutate(origDate.y = head(origDate,1))

#     origDate weekandyear origDate.y
#       <date>       <chr>     <date>
#1  2012-01-01     2012-01 2012-01-01
#2  2012-01-02     2012-01 2012-01-01
#3  2012-01-03     2012-01 2012-01-01
#4  2012-01-04     2012-01 2012-01-01
#5  2012-01-05     2012-01 2012-01-01
#6  2012-01-06     2012-01 2012-01-01
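
The `+1` shift in the `strftime` call is easy to miss, so here is a small check of what it does. `%W` numbers weeks starting on Monday, and 2012-01-01 is a Sunday, so without the shift that date would land in week "00" on its own. Shifting every date by one day moves it into the same bucket as the following Monday to Saturday. The variable names `d`, `w_raw` and `w_shifted` are illustrative only.

```r
# 2012-01-01 is a Sunday; %W counts weeks starting on Monday
d <- as.Date("2012-01-01")

w_raw     <- strftime(d, "%W")      # "00": falls before the first Monday of 2012
w_shifted <- strftime(d + 1, "%W")  # "01": same bucket as Monday 2012-01-02

print(c(raw = w_raw, shifted = w_shifted))
```

With the shift, the grouping behaves like Sunday-to-Saturday weeks, which happens to match the `week()` based grouping from the question for this data.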

Note that the first mutate is not needed (as @akrun mentioned), so we can move it into the group_by statement, i.e.

dta %>%
 group_by(weekandyear = strftime(origDate+1, "%Y-%W")) %>%
 mutate(origDate.y = head(origDate,1))
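
For comparison, here is a minimal sketch of two related variants. They are not part of the answer above. The first keeps the question's own week definition (`lubridate::week()` plus `year()`) inside `group_by`, takes `min(origDate)` per group, and calls `ungroup()` so later steps are not grouped by accident. The second assumes a lubridate version whose `floor_date()` accepts `week_start` and computes the calendar week start directly, with no grouping. The column names `firstDay` and `weekStart` are illustrative.

```r
library(dplyr)
library(lubridate)

# Same data as in the question
dta <- data.frame(origDate = seq(as.Date("2012-01-01"),
                                 as.Date("2012-01-30"),
                                 by = "day"))

dtaFin <- dta %>%
  # Variant 1: group by the question's year/week definition,
  # then take the earliest date present in each group
  group_by(yearnum = year(origDate), weeknum = week(origDate)) %>%
  mutate(firstDay = min(origDate)) %>%
  ungroup() %>%
  # Variant 2 (assumption: floor_date() supports week_start):
  # round each date down to the preceding Sunday, no grouping needed
  mutate(weekStart = floor_date(origDate, unit = "week", week_start = 7))

head(dtaFin, 9)
```

The two columns agree on this data only because 2012 starts on a Sunday. In general `week()` counts seven-day blocks from January 1st, whereas `floor_date()` follows calendar weeks and can return a date that does not occur in the data, so which one is appropriate depends on what "first day of the week" should mean.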