I have this dataset:
CASHPOINT_ID DT status QT_REC
1 N053360330 2016-01-01 end_of_day 5
2 N053360330 2016-01-01 end_of_day 2
3 N053360330 2016-01-02 before 9
4 N053360330 2016-01-02 before NA
5 N053360330 2016-01-03 end_of_day 16
6 N053360330 2016-01-03 end_of_day NA
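I want to aggregate only the rows whose status column is not "before", and leave the other rows untouched. The resulting dataset should look like this: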
CASHPOINT_ID DT status QT_REC
1 N053360330 2016-01-01 end_of_day 7
3 N053360330 2016-01-02 before 9
4 N053360330 2016-01-02 before NA
5 N053360330 2016-01-03 end_of_day 16
Thanks.
Answer 0 (score: 2)
Using data.table: assuming your original data is called dt and has already been converted with setDT(), you can do this:
df <- rbind(
  dt[status == "end_of_day", .(QT_REC = sum(QT_REC, na.rm = TRUE)),
     by = .(CASHPOINT_ID, DT, status)],
  dt[status != "end_of_day"]
)[order(DT)]
print(df)
CASHPOINT_ID DT status QT_REC
1: N053360330 2016-01-01 end_of_day 7
2: N053360330 2016-01-02 before 9
3: N053360330 2016-01-02 before NA
4: N053360330 2016-01-03 end_of_day 16
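For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample data is retyped from the question, and the column types (DT as Date, QT_REC as numeric) are assumptions. It also filters on status != "before" rather than status == "end_of_day", which is a variant of the answer above that follows the question's wording more literally and would still behave if further status values appeared.

```r
library(data.table)

# Sample data retyped from the question; column types are an assumption.
dt <- data.table(
  CASHPOINT_ID = "N053360330",
  DT = as.Date(c("2016-01-01", "2016-01-01", "2016-01-02",
                 "2016-01-02", "2016-01-03", "2016-01-03")),
  status = c("end_of_day", "end_of_day", "before",
             "before", "end_of_day", "end_of_day"),
  QT_REC = c(5, 2, 9, NA, 16, NA)
)

# Collapse every status except "before"; pass "before" rows through unchanged.
res <- rbind(
  dt[status != "before", .(QT_REC = sum(QT_REC, na.rm = TRUE)),
     by = .(CASHPOINT_ID, DT, status)],
  dt[status == "before"]
)[order(DT)]

print(res)
```

Both pieces passed to rbind() have the same column order, so they stack by position without any renaming.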
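Answer 1 (score: 0)
Here is a solution using dplyr.
library(dplyr)
library(lubridate)  # provides floor_date(); assumes DT is a Date or date-time column
# Note: this sums QT_REC within every day/status group, "before" rows included,
# and CASHPOINT_ID is not carried into the result.
df %>%
  group_by(floor_date(DT, "day"), status) %>%
  summarise(QT_REC = sum(QT_REC, na.rm = TRUE))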
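To reproduce the output requested in the question with dplyr, the "before" rows have to bypass the summary step. The following is only a sketch under stated assumptions: the sample data is retyped from the question, df is a plain data frame with DT stored as Date, and the installed dplyr is version 1.0.0 or later (needed for the .groups argument).

```r
library(dplyr)

# Sample data retyped from the question; column types are an assumption.
df <- data.frame(
  CASHPOINT_ID = "N053360330",
  DT = as.Date(c("2016-01-01", "2016-01-01", "2016-01-02",
                 "2016-01-02", "2016-01-03", "2016-01-03")),
  status = c("end_of_day", "end_of_day", "before",
             "before", "end_of_day", "end_of_day"),
  QT_REC = c(5, 2, 9, NA, 16, NA),
  stringsAsFactors = FALSE
)

res <- bind_rows(
  # Sum QT_REC per cashpoint, day and status for everything except "before".
  df %>%
    filter(status != "before") %>%
    group_by(CASHPOINT_ID, DT, status) %>%
    summarise(QT_REC = sum(QT_REC, na.rm = TRUE), .groups = "drop"),
  # Keep "before" rows exactly as they are, including the NA.
  df %>% filter(status == "before")
) %>%
  arrange(DT)

print(res)
```

Grouping on CASHPOINT_ID as well keeps that column in the result and avoids mixing cashpoints if the real data contains more than one.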
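Answer 2 (score: 0)
Another solution, based on plyr:
library(plyr)
# For each CASHPOINT_ID/DT/status group: sum QT_REC unless the group is "before".
ddply(.data = df, .variables = c('CASHPOINT_ID', 'DT', 'status'),
      function(t) {
        if (t$status[1] != 'before') {
          # mutate() repeats the sum on every row; unique() collapses them to one
          unique(mutate(t, QT_REC = sum(QT_REC, na.rm = TRUE)))
        } else {
          t  # leave "before" rows untouched
        }
      })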
# CASHPOINT_ID DT status QT_REC
#1 N053360330 2016-01-01 end_of_day 7
#2 N053360330 2016-01-02 before 9
#3 N053360330 2016-01-02 before NA
#4 N053360330 2016-01-03 end_of_day 16