My current df looks like this:
WEEK COUNT COUNT2 PERCENTAGE
2017-53 10 15 .05
2018-00 5 10 .1
2018-01 7 9 .1
....
2018-52 10 12 .06
2019-00 6 10 .05
....
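What I want to do is merge the two weeks that straddle each year boundary into the last week of the earlier year, summing COUNT, COUNT2, and PERCENTAGE. The week pairs I currently want to merge are 2017-53 and 2018-00, 2018-52 and 2019-00, and 2019-52 and 2020-00. They should be merged into 2017-53, 2018-52, and 2019-52 respectively. My expected output is: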
WEEK COUNT COUNT2 PERCENTAGE
2017-53 15 25 .15
2018-01 7 9 .1
....
2018-52 16 22 .11
....
Answer 0 (score: 0)
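Using tidyverse, convert the 'WEEK' column to Date class and arrange by it. Then extract the 'year' and build a grouping from 'WEEK' based on the difference between adjacent 'year' values: a row whose year is greater than the previous row's year takes the previous row's 'WEEK' label. Finally, summarise to get the sum of the columns whose names match "COUNT" or "PERCENTAGE".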
library(stringr)
library(lubridate)
library(dplyr) # 1.0.0

df1 %>%
  # append a weekday field so the year-week string can be parsed as a Date
  mutate(Date = as.Date(str_c(WEEK, "-01"), format = '%Y-%U-%w')) %>%
  arrange(Date) %>%
  mutate(year = year(Date)) %>%
  # where the year increases relative to the previous row, reuse the previous WEEK label
  group_by(WEEK = case_when(lag(year, default = first(year)) - year < 0 ~
                              lag(WEEK), TRUE ~ WEEK)) %>%
  summarise(across(matches("COUNT|PERCENTAGE"), sum))
# A tibble: 3 x 4
# WEEK COUNT COUNT2 PERCENTAGE
# <chr> <int> <int> <dbl>
#1 2017-53 15 25 0.15
#2 2018-01 7 9 0.1
#3 2018-52 16 22 0.11
# data used above
df1 <- structure(list(WEEK = c("2017-53", "2018-00", "2018-01", "2018-52",
    "2019-00"), COUNT = c(10L, 5L, 7L, 10L, 6L), COUNT2 = c(15L,
    10L, 9L, 12L, 10L), PERCENTAGE = c(0.05, 0.1, 0.1, 0.06, 0.05
    )), class = "data.frame", row.names = c(NA, -5L))
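If date parsing is not needed, the same grouping idea can be sketched directly on the week labels. This is only an illustrative variant, not the method above: it assumes the rows are already in chronological order and that every "-00" row directly follows the last week of the previous year. The name `merged` is introduced here for illustration.

```r
library(dplyr)

# Same sample data as in the answer
df1 <- data.frame(
  WEEK = c("2017-53", "2018-00", "2018-01", "2018-52", "2019-00"),
  COUNT = c(10L, 5L, 7L, 10L, 6L),
  COUNT2 = c(15L, 10L, 9L, 12L, 10L),
  PERCENTAGE = c(0.05, 0.1, 0.1, 0.06, 0.05)
)

merged <- df1 %>%
  # Assumption: a week labelled "-00" belongs with the row just before it,
  # so it takes that row's WEEK label; all other rows keep their own label.
  mutate(WEEK = if_else(grepl("-00$", WEEK),
                        lag(WEEK, default = WEEK[1]),
                        WEEK)) %>%
  group_by(WEEK) %>%
  # Sum the numeric columns within each (possibly merged) week
  summarise(across(c(COUNT, COUNT2, PERCENTAGE), sum))

print(merged)
```

If a "-00" row can appear without its preceding week in the data, this relabelling would attach it to the wrong row, so the Date-based approach is the safer choice in that case.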
Answer 1 (score: 0)