I have a dataset (x) that looks like this:
DATE WEEKDAY A B C D
2011-02-04 Friday 113 67 109 72
2011-02-05 Saturday 1 0 0 1
2011-02-06 Sunday 9 5 0 0
2011-02-07 Monday 154 48 85 60
str(x):
'data.frame': 4 obs. of 6 variables:
$ DATE : Date, format: "2011-02-04" "2011-02-05" "2011-02-06" "2011-02-07"
$ WEEKDAY: Factor w/ 7 levels "Friday","Monday",..: 1 3 4 2
$ A : num 113 1 9 154
$ B : num 67 0 5 48
$ C : num 109 0 0 85
$ D : num 72 1 0 60
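The Tuesday to Saturday values should not change, but I want Sunday to be the sum of Saturday and Sunday, and Monday to be the sum of Saturday, Sunday and Monday.
I tried changing the Saturday and Sunday dates to date + 2 and date + 1 respectively and then aggregating by date, but then I lose the weekend records.
For my example, the correct result would be: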
DATE WEEKDAY A B C D
2011-02-04 Friday 113 67 109 72
2011-02-05 Saturday 1 0 0 1
2011-02-06 Sunday 10 5 0 1
2011-02-07 Monday 164 53 85 61
How can I roll the weekend values forward into the following days?
Three weeks of data:
DATE WEEKDAY A B C D
1 2011-01-02 Sunday 2 1 0 0
2 2011-01-03 Monday 153 51 7 1
3 2011-01-04 Tuesday 182 103 13 5
4 2011-01-05 Wednesday 192 102 14 12
5 2011-01-06 Thursday 160 67 50 20
6 2011-01-07 Friday 154 96 50 39
7 2011-01-09 Sunday 0 0 0 1
8 2011-01-10 Monday 195 94 48 39
9 2011-01-11 Tuesday 206 72 71 38
10 2011-01-12 Wednesday 232 94 96 52
11 2011-01-13 Thursday 178 113 93 52
12 2011-01-14 Friday 173 97 68 56
13 2011-01-15 Saturday 2 0 1 0
14 2011-01-17 Monday 170 91 66 52
15 2011-01-18 Tuesday 176 76 70 78
16 2011-01-19 Wednesday 164 159 117 37
17 2011-01-20 Thursday 198 87 95 111
18 2011-01-21 Friday 213 86 89 90
19 2011-01-24 Monday 195 73 102 52
20 2011-01-25 Tuesday 193 108 116 70
21 2011-01-26 Wednesday 193 102 118 63
Answer 0 (score: 3)
Since you provided only a small sample, I could not test this on larger data, but the idea is as follows. I'll use data.table, because I find it works very well here.
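require(data.table)                           # load the data.table package
my_days <- c("Saturday", "Sunday", "Monday")  # days whose values accumulate
dt <- data.table(df)                          # df: the question's data frame (called x there)
dt[, `:=`(DATE = as.Date(DATE))]
setkey(dt, "DATE")
# running totals over the Sat/Sun/Mon rows, grouped by the week of the previous day
dt[WEEKDAY %in% my_days, `:=`(A = cumsum(A), B = cumsum(B),
    C = cumsum(C), D = cumsum(D)), by = format(DATE-1, "%W")]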
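Line 4 converts the DATE column to an actual Date type with as.Date.
Line 5 sets DATE as the key column of dt, which ensures the rows are sorted by DATE.
The first part, WEEKDAY %in% my_days, subsets dt to only the rows where the day is Saturday, Sunday or Monday.
The last part, by = format(DATE-1, "%W"), groups those rows by the week they belong to. Since Monday would otherwise fall into the next week, just subtract 1 from the date and then take the week number. This groups the dates into weeks in which Tuesday through Monday belong to the same week.
The expression `:=`(A = ... , D = ...) computes cumsum for each column and replaces the values by reference within each group.
For the new data you posted, I get this result. If it is not what you want, let me know.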
# DATE WEEKDAY A B C D
# 1: 2011-01-02 Sunday 2 1 0 0
# 2: 2011-01-03 Monday 155 52 7 1
# 3: 2011-01-04 Tuesday 182 103 13 5
# 4: 2011-01-05 Wednesday 192 102 14 12
# 5: 2011-01-06 Thursday 160 67 50 20
# 6: 2011-01-07 Friday 154 96 50 39
# 7: 2011-01-09 Sunday 0 0 0 1
# 8: 2011-01-10 Monday 195 94 48 40
# 9: 2011-01-11 Tuesday 206 72 71 38
# 10: 2011-01-12 Wednesday 232 94 96 52
# 11: 2011-01-13 Thursday 178 113 93 52
# 12: 2011-01-14 Friday 173 97 68 56
# 13: 2011-01-15 Saturday 2 0 1 0
# 14: 2011-01-17 Monday 172 91 67 52
# 15: 2011-01-18 Tuesday 176 76 70 78
# 16: 2011-01-19 Wednesday 164 159 117 37
# 17: 2011-01-20 Thursday 198 87 95 111
# 18: 2011-01-21 Friday 213 86 89 90
# 19: 2011-01-24 Monday 195 73 102 52
# 20: 2011-01-25 Tuesday 193 108 116 70
# 21: 2011-01-26 Wednesday 193 102 118 63
# DATE WEEKDAY A B C D
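As a quick check, here is a minimal, self-contained sketch that applies the same grouping idea to the four-row example from the question and compares it with the result the question asks for. The use of lapply(.SD, cumsum) with .SDcols is just a shorter way to write the four cumsum calls. The helper week_key is hypothetical (it is not part of the answer above): it is one possible replacement for the "%W" key, assuming the data may span a New Year, where "%W" restarts at "00" and can split a Saturday to Monday run into two groups.

```r
library(data.table)

# The question's four-row example, rebuilt by hand.
x <- data.frame(
  DATE    = as.Date(c("2011-02-04", "2011-02-05", "2011-02-06", "2011-02-07")),
  WEEKDAY = c("Friday", "Saturday", "Sunday", "Monday"),
  A = c(113, 1, 9, 154),
  B = c(67, 0, 5, 48),
  C = c(109, 0, 0, 85),
  D = c(72, 1, 0, 60)
)

my_days  <- c("Saturday", "Sunday", "Monday")
val_cols <- c("A", "B", "C", "D")

dt <- as.data.table(x)
setkey(dt, DATE)  # sort by DATE so cumsum runs Sat -> Sun -> Mon

# Same key as the answer: week number (weeks starting Monday) of the previous day.
dt[WEEKDAY %in% my_days,
   (val_cols) := lapply(.SD, cumsum),
   by = .(week = format(DATE - 1, "%W")),
   .SDcols = val_cols]
print(dt)

# Hypothetical alternative key (an assumption, not from the answer):
# whole weeks since 1970-01-01, which was a Thursday. Subtracting 5 days
# makes every Tuesday-to-Monday span share one integer, across year ends too.
week_key <- function(d) (as.integer(d) - 5L) %/% 7L

# Saturday, Sunday and Monday straddling a New Year.
nye <- as.Date(c("2011-12-31", "2012-01-01", "2012-01-02"))
print(format(nye - 1, "%W"))  # the "%W" key changes inside this weekend
print(week_key(nye))          # the integer key stays constant
```

To use the alternative key, replace format(DATE - 1, "%W") with week_key(DATE) in the by argument. For data that stays within a single year, the two keys should group the rows identically.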