I am trying to delete rows based on the type of a neighbouring row. Suppose my data.frame looks like this:
| Date       | Time     | Type            | Gross | Sender_email     | Receiver_email |
| 2018.07.12 | 12:45:13 | Website Payment | 30    | aaa@customer.com | admin@site.com |
| 2018.07.21 | 16:19:34 | Website Payment | 30    | bbb@customer.com | admin@site.com |
| 2018.07.22 | 18:21:17 | Payment Refund  | -30   | admin@site.com   | bbb@custom.com |
| 2018.07.24 | 07:10:00 | Website Payment | 30    | bbb@customer.com | admin@site.com |
| 2018.08.17 | 15:17:40 | Website Payment | 30    | ccc@custom.com   | admin@site.com |
I want to remove the transactions that were refunded, together with the refund rows themselves:
| Date       | Time     | Type            | Gross | Sender_email     | Receiver_email |
| 2018.07.12 | 12:45:13 | Website Payment | 30    | aaa@customer.com | admin@site.com |
| 2018.07.24 | 07:10:00 | Website Payment | 30    | bbb@customer.com | admin@site.com |
| 2018.08.17 | 15:17:40 | Website Payment | 30    | ccc@custom.com   | admin@site.com |
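Any help would be much appreciated!
Answer 0 (score: 0)
We can use grep to find the refund rows, then drop each refund together with the row directly above it: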
i1 <- grep('Refund', df1$Type)
i2 <- c(i1, i1-1)
df1[setdiff(seq_len(nrow(df1)), i2),]
#        Date     Time            Type Gross     Sender_email Receiver_email
#1 2018.07.12 12:45:13 Website Payment    30 aaa@customer.com admin@site.com
#4 2018.07.24 07:10:00 Website Payment    30 bbb@customer.com admin@site.com
#5 2018.08.17 15:17:40 Website Payment    30   ccc@custom.com admin@site.com
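The answer uses df1 without defining it. To make the snippet reproducible, here is a minimal sketch that rebuilds the question's table by hand (the name df1 follows the answer; storing every column as plain character or numeric is my assumption) and then applies the same grep and setdiff steps:

```r
# Rebuild the question's table; column types are an assumption
df1 <- data.frame(
  Date = c("2018.07.12", "2018.07.21", "2018.07.22", "2018.07.24", "2018.08.17"),
  Time = c("12:45:13", "16:19:34", "18:21:17", "07:10:00", "15:17:40"),
  Type = c("Website Payment", "Website Payment", "Payment Refund",
           "Website Payment", "Website Payment"),
  Gross = c(30, 30, -30, 30, 30),
  Sender_email = c("aaa@customer.com", "bbb@customer.com", "admin@site.com",
                   "bbb@customer.com", "ccc@custom.com"),
  Receiver_email = c("admin@site.com", "admin@site.com", "bbb@custom.com",
                     "admin@site.com", "admin@site.com"),
  stringsAsFactors = FALSE
)

i1 <- grep("Refund", df1$Type)   # rows that are refunds
i2 <- c(i1, i1 - 1)              # each refund plus the row just before it
kept <- df1[setdiff(seq_len(nrow(df1)), i2), ]
rownames(kept)                   # "1" "4" "5"
```

Note that this relies on the refunded payment sitting immediately above its refund, which holds for the sample data but is not guaranteed in general.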
If there may be other rows between the payment and its refund:
i1 <- grep('Refund', df1$Type)
pay <- grep('Website Payment', df1$Type)
# for each refund, the closest 'Website Payment' row above it (if any)
i2 <- unlist(lapply(i1, function(j) tail(pay[pay < j], 1)))
out <- df1[setdiff(seq_len(nrow(df1)), c(i1, i2)), ]
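One way to handle intervening rows is to pair each refund with the nearest payment row above it. The following is only a sketch on made-up data: the data frame df2 and the "Withdrawal" type are hypothetical and do not come from the question.

```r
# Hypothetical data: a non-payment row sits between the payment and its refund
df2 <- data.frame(
  Type  = c("Website Payment", "Website Payment", "Withdrawal",
            "Payment Refund", "Website Payment"),
  Gross = c(30, 30, -50, -30, 30),
  stringsAsFactors = FALSE
)

refunds  <- grep("Refund", df2$Type)
payments <- grep("Website Payment", df2$Type)

# For each refund, the last payment row that comes before it (if any)
refunded <- unlist(lapply(refunds, function(j) tail(payments[payments < j], 1)))

kept <- df2[setdiff(seq_len(nrow(df2)), c(refunds, refunded)), ]
rownames(kept)   # "1" "3" "5"
```

This matches on Type only. If payments from other customers can appear between a payment and its refund, the nearest payment may belong to someone else, and matching on the email columns as well would be safer.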
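Answer 1 (score: 0)
I have a simple solution, though it may not be elegant or fast. In your example, you can first find the rows where a refund occurs, then find who received each refund and which earlier purchase it cancels, and finally delete those rows. The code might look like this: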
delete_refund <- function(transaction_matrix) {
  # rows in which a refund happens (negative Gross)
  index_refund <- which(transaction_matrix[, "Gross"] < 0)
  # rows of the purchases that those refunds cancel
  all_refund_purchase <- integer(0)
  for (row in index_refund) {
    # earlier purchases of the same amount made by the customer who receives this refund
    earlier <- seq_len(row)
    one_purchase <- which(
      transaction_matrix[earlier, "Gross"] == abs(transaction_matrix[row, "Gross"]) &
        transaction_matrix[earlier, "Sender_email"] == transaction_matrix[row, "Receiver_email"]
    )
    # a customer may have several refunds: skip purchases already matched to an earlier refund
    one_purchase <- one_purchase[!(one_purchase %in% all_refund_purchase)]
    # a customer may buy several items at the same price and refund only some of them,
    # so one_purchase can have more than one entry: take the most recent
    all_refund_purchase <- c(all_refund_purchase, tail(one_purchase, 1))
  }
  # select the rows to keep by positive index, so that having no refunds drops nothing
  drop <- c(index_refund, all_refund_purchase)
  transaction_matrix[setdiff(seq_len(nrow(transaction_matrix)), drop), ]
}
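One thing to watch when dropping rows by negative index in R: if the index vector is empty, for example because the data contains no refunds, then `-integer(0)` is still `integer(0)` and the subset returns zero rows rather than all of them. A small illustration on a made-up data frame (the name tx is hypothetical):

```r
# Hypothetical data with no refunds at all
tx <- data.frame(id = 1:3, Gross = c(30, 30, 30))
index_refund <- which(tx$Gross < 0)   # integer(0)

# Negative indexing with an empty vector selects nothing
n_negative <- nrow(tx[-index_refund, ])                              # 0

# Keeping rows by positive index leaves the data untouched
n_setdiff <- nrow(tx[setdiff(seq_len(nrow(tx)), index_refund), ])   # 3
```

Selecting the rows to keep with `setdiff(seq_len(nrow(x)), drop)` avoids this edge case.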
Since no sample data was provided, I tested it on a simple example that I created:
df=data.frame(date=1:4, Gross=c(30,30,-30,30),
Sender_email=c('bbb@customer.com','ccc@customer.com',
'admin@site.com','bbb@customer.com'),
Receiver_email=c('admin@site.com','admin@site.com',
'bbb@customer.com','admin@site.com'),
stringsAsFactors = FALSE);
  date Gross     Sender_email   Receiver_email
1    1    30 bbb@customer.com   admin@site.com
2    2    30 ccc@customer.com   admin@site.com
3    3   -30   admin@site.com bbb@customer.com
4    4    30 bbb@customer.com   admin@site.com
The result of delete_refund(df) is
  date Gross     Sender_email Receiver_email
2    2    30 ccc@customer.com admin@site.com
4    4    30 bbb@customer.com admin@site.com
which is what the poster asked for.