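I have a table in which dates are stored as numbers, and each date has a value. I want to add another column, weekSum, containing the sum of the values over the past week (the current day and the six days before it). Some dates are missing, however, so I can't simply use the current row and the previous 6 rows. My table looks like this: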
df <- data.frame('date' = c(20160309, 20160310, 20160311, 20160312, 20160313, 20160314, 20160315, 20160317, 20160318, 20160319, 20160321), 'value' = c(1, 2, 3, 4, 5, 6, 7 ,8, 9, 10, 11))
date value
20160309 1
20160310 2
20160311 3
20160312 4
20160313 5
20160314 6
20160315 7
20160316 8
20160318 9 #17th skipped
20160319 10
20160321 11 #20th skipped
I would like to get the following output:
date value weekSum
20160309 1 NA
20160310 2 NA
20160311 3 NA
20160312 4 NA
20160313 5 NA
20160314 6 NA
20160315 7 28 # 1+2+3+4+5+6+7
20160316 8 35 # 2+3+4+5+6+7+8
20160318 9 39 # 4+5+6+7+8+9
20160319 10 45 # 5+6+7+8+9+10
20160321 11 45 # 7+8+9+10+11
How can I do this?
Answer 0: (score: 0)
In base R this can be done as follows:
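# Convert the numeric dates to Date so that seq() can step by day
df$date <- as.Date(as.character(df$date), format = "%Y%m%d")
# Merge with the full daily sequence; the missing days get value = NA
res <- merge(df, data.frame(date = seq(df$date[1], to = df$date[nrow(df)], by = "days")), all.y = TRUE)
res$weekSum <- NA
# From the 7th row on, sum the current row and the 6 rows before it
for(i in seq_along(res$value)[-seq_len(6)]){
  res$weekSum[i] <- sum(res$value[(i - 6):i], na.rm = TRUE)
}
# Drop the rows that were only added to fill the gaps
res <- res[!is.na(res$value), ]
res
# date value weekSum
#1 2016-03-09 1 NA
#2 2016-03-10 2 NA
#3 2016-03-11 3 NA
#4 2016-03-12 4 NA
#5 2016-03-13 5 NA
#6 2016-03-14 6 NA
#7 2016-03-15 7 28
#9 2016-03-17 8 33
#10 2016-03-18 9 39
#11 2016-03-19 10 45
#13 2016-03-21 11 45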
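The loop above works by first filling in the missing dates. As a rough alternative sketch (not the answerer's method), the same window can be expressed directly in terms of dates, so no merge is needed. The helper name week_sum_by_date and its window argument are made up for this illustration.

```r
# Example data from the question (dates stored as numbers, some days missing)
df <- data.frame(
  date  = c(20160309, 20160310, 20160311, 20160312, 20160313, 20160314,
            20160315, 20160317, 20160318, 20160319, 20160321),
  value = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
)

# Convert the numeric dates to Date so that date arithmetic works
df$date <- as.Date(as.character(df$date), format = "%Y%m%d")

# week_sum_by_date is a hypothetical helper, not part of the answer above
week_sum_by_date <- function(dates, values, window = 7) {
  vapply(seq_along(dates), function(i) {
    # Fewer than `window` calendar days since the first date: no full week yet
    if (as.numeric(dates[i] - min(dates)) < window - 1) return(NA_real_)
    # Sum every value whose date falls in the `window` days ending at dates[i]
    in_window <- dates > dates[i] - window & dates <= dates[i]
    sum(values[in_window], na.rm = TRUE)
  }, numeric(1))
}

df$weekSum <- week_sum_by_date(df$date, df$value)
df
```

For the example data this yields NA for the first six rows and then 28, 33, 39, 45, 45. It compares every row against every other row, so it is only a reasonable choice for small tables.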
Answer 1: (score: 0)
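Here is a way to do it with tidyverse tools. This approach uses tidyr::complete to construct the full sequence of dates, so that it is easy to take the current row and the previous 6 rows as suggested. Be careful if value already contains NA values to begin with, because those rows are currently filtered out at the end. This can be adjusted to avoid that if necessary.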
library(tidyverse)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#>
#> date
df <- data.frame('date' = c(20160309, 20160310, 20160311, 20160312, 20160313, 20160314, 20160315, 20160317, 20160318, 20160319, 20160321), 'value' = c(1, 2, 3, 4, 5, 6, 7 ,8, 9, 10, 11))
df %>%
  mutate(date = ymd(date)) %>%
  complete(date = seq.Date(min(date), max(date), by = 1)) %>%
  arrange(date) %>%
  mutate(
    newval = replace_na(value, 0),
    weekSum = newval + lag(newval) + lag(newval, 2) + lag(newval, 3) +
      lag(newval, 4) + lag(newval, 5) + lag(newval, 6)
  ) %>%
  select(-newval) %>%
  filter(!is.na(value))
#> # A tibble: 11 x 3
#> date value weekSum
#> <date> <dbl> <dbl>
#> 1 2016-03-09 1. NA
#> 2 2016-03-10 2. NA
#> 3 2016-03-11 3. NA
#> 4 2016-03-12 4. NA
#> 5 2016-03-13 5. NA
#> 6 2016-03-14 6. NA
#> 7 2016-03-15 7. 28.
#> 8 2016-03-17 8. 33.
#> 9 2016-03-18 9. 39.
#> 10 2016-03-19 10. 45.
#> 11 2016-03-21 11. 45.
Created on 2018-05-07 by the reprex package (v0.2.0).
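The caveat about pre-existing NA values could be handled along these lines. This is only a sketch: it marks the original rows with a flag before calling complete, then filters on that flag instead of on value. The column name observed is an assumption made for the example, and the NA on 2016-03-19 is inserted purely to illustrate the case.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Data from the question, with one value set to NA for illustration only
df <- data.frame(
  date  = c(20160309, 20160310, 20160311, 20160312, 20160313, 20160314,
            20160315, 20160317, 20160318, 20160319, 20160321),
  value = c(1, 2, 3, 4, 5, 6, 7, 8, 9, NA, 11)
)

res <- df %>%
  # `observed` (hypothetical name) marks rows that exist in the original data
  mutate(date = ymd(date), observed = TRUE) %>%
  # Rows added to fill date gaps get observed = FALSE
  complete(
    date = seq.Date(min(date), max(date), by = 1),
    fill = list(observed = FALSE)
  ) %>%
  arrange(date) %>%
  mutate(
    # Treat missing values as 0 inside the 7-row window
    newval = replace_na(value, 0),
    weekSum = newval + lag(newval) + lag(newval, 2) + lag(newval, 3) +
      lag(newval, 4) + lag(newval, 5) + lag(newval, 6)
  ) %>%
  # Drop only the filler rows, not original rows whose value is NA
  filter(observed) %>%
  select(-newval, -observed)

res
```

With this change the 2016-03-19 row is kept even though its value is NA, while the filler rows for 2016-03-16 and 2016-03-20 are still removed.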