Remove rows based on the type of the preceding row

Time: 2019-01-02 06:58:47

Tags: r dataframe

I am trying to remove rows based on the type of the preceding row. Suppose my data.frame looks like this:

|Date       |Time     | Type            | Gross | Sender_email      |  Receiver_email |
|2018.07.12 |12:45:13 | Website Payment | 30    | aaa@customer.com  |  admin@site.com |
|2018.07.21 |16:19:34 | Website Payment | 30    | bbb@customer.com  |  admin@site.com |
|2018.07.22 |18:21:17 | Payment Refund  | -30   | admin@site.com    |  bbb@custom.com |
|2018.07.24 |07:10:00 | Website Payment | 30    | bbb@customer.com  |  admin@site.com |
|2018.08.17 |15:17:40 | Website Payment | 30    | ccc@custom.com    |  admin@site.com |

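I want to remove the transactions that were refunded, meaning both the refund row and the payment it reverses, so that the result is: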

|Date       |Time     | Type            | Gross | Sender_email      |  Receiver_email |
|2018.07.12 |12:45:13 | Website Payment | 30    | aaa@customer.com  |  admin@site.com |
|2018.07.24 |07:10:00 | Website Payment | 30    | bbb@customer.com  |  admin@site.com |
|2018.08.17 |15:17:40 | Website Payment | 30    | ccc@custom.com    |  admin@site.com |

Any help would be greatly appreciated!

2 answers:

Answer 0: (score: 0)

We can use grep to locate the refund rows, then drop each of them together with the row directly above it:

i1 <- grep('Refund', df1$Type)
i2 <- c(i1, i1-1)
df1[setdiff(seq_len(nrow(df1)), i2),]
#        Date     Time            Type Gross      Sender_email Receiver_email
#1 2018.07.12 12:45:13 Website Payment    30  aaa@customer.com admin@site.com
#4 2018.07.24 07:10:00 Website Payment    30  bbb@customer.com admin@site.com
#5 2018.08.17 15:17:40 Website Payment    30    ccc@custom.com admin@site.com
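
To try this without the original data, the table from the question can be rebuilt by hand. This is only a sketch: the `data.frame()` call is my reconstruction of the printed table, not something the poster supplied, and the approach assumes every refund row sits directly below the payment it reverses.

```r
# Reconstruction of the question's table (assumed; the post only shows it printed)
df1 <- data.frame(
  Date = c("2018.07.12", "2018.07.21", "2018.07.22", "2018.07.24", "2018.08.17"),
  Time = c("12:45:13", "16:19:34", "18:21:17", "07:10:00", "15:17:40"),
  Type = c("Website Payment", "Website Payment", "Payment Refund",
           "Website Payment", "Website Payment"),
  Gross = c(30, 30, -30, 30, 30),
  Sender_email = c("aaa@customer.com", "bbb@customer.com", "admin@site.com",
                   "bbb@customer.com", "ccc@custom.com"),
  Receiver_email = c("admin@site.com", "admin@site.com", "bbb@custom.com",
                     "admin@site.com", "admin@site.com"),
  stringsAsFactors = FALSE
)

i1 <- grep("Refund", df1$Type)   # positions of the refund rows
i2 <- c(i1, i1 - 1)              # each refund plus the row just above it
res <- df1[setdiff(seq_len(nrow(df1)), i2), ]
rownames(res)                    # original row numbers of the rows that remain
```

Selecting the rows to keep with `setdiff()` is safer than `df1[-i2, ]`: when there are no refunds, `i2` is empty, and indexing with an empty negative vector returns zero rows instead of the whole table.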

Update

If there are other rows between the payment and the refund:

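i1 <- grep('Refund', df1$Type)
# split the table into segments that each end at a refund row (i = start, j = refund)
rm_idx <- unlist(Map(function(i, j) {
       x <- df1[i:j, ]
       i2 <- grep('Website Payment', x$Type)   # positions within the segment
       # drop the refund row and the last payment before it, as positions in df1
       c(j, i + tail(i2, 1) - 1) }, c(1, i1[-length(i1)] + 1), i1))
out <- df1[setdiff(seq_len(nrow(df1)), rm_idx), ]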
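
Here is a small worked case of what "other rows between the payment and the refund" might look like. Everything in it is an assumption made for illustration: the "Fee" row is invented, and the rule used (drop each refund together with the closest 'Website Payment' row above it) is one reading of the intent. It is a variant that searches all payment rows with `lapply()` instead of splitting the table with `Map()`.

```r
# Hypothetical data: a made-up "Fee" row sits between a payment and its refund
df2 <- data.frame(
  Type  = c("Website Payment", "Website Payment", "Fee",
            "Payment Refund", "Website Payment"),
  Gross = c(30, 30, -1, -30, 30),
  stringsAsFactors = FALSE
)

refund_rows  <- grep("Refund", df2$Type)
payment_rows <- grep("Website Payment", df2$Type)

# For each refund, the closest payment row above it (nothing if there is none)
matched <- unlist(lapply(refund_rows,
                         function(j) tail(payment_rows[payment_rows < j], 1)))

res2 <- df2[setdiff(seq_len(nrow(df2)), c(refund_rows, matched)), ]
res2$Type
```

Note that this pairs rows by position only. It does not check that the refund goes to the customer who made the payment.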

Answer 1: (score: 0)

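I have a simple solution, though it may not be elegant or fast. For your example, you can first find the rows where a refund occurs, then find who received each refund, and finally delete both the refund and the matching purchase. The code might look like this: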

delete_refund <- function(transaction_matrix) {

  # rows where a refund happens (negative Gross)
  index_refund <- which(transaction_matrix[, "Gross"] < 0)

  # for each refund, find the purchase it reverses
  all_refund_purchase <- integer(0)
  for (row in index_refund) {
    # earlier rows with the same amount, sent by the person who receives the refund
    one_purchase <- which((transaction_matrix[1:row, "Gross"] ==
      abs(transaction_matrix[row, "Gross"])) &
      (transaction_matrix[1:row, "Sender_email"] ==
      transaction_matrix[row, "Receiver_email"]))
    # a customer may have several refunds: skip purchases already matched to an earlier one
    one_purchase <- one_purchase[!(one_purchase %in% all_refund_purchase)]
    # a customer may have bought several items at the same price and refunded only
    # some of them, so more than one candidate can remain: take the most recent
    all_refund_purchase <- c(all_refund_purchase,
      one_purchase[length(one_purchase)])
  }

  # keep rows via setdiff() rather than negative indexing, so that a table
  # without any refunds is returned unchanged instead of empty
  drop_rows <- c(index_refund, all_refund_purchase)
  return(transaction_matrix[setdiff(seq_len(nrow(transaction_matrix)), drop_rows), ])
}

Since no data sample was provided, I tested it on a simple example that I created.

df=data.frame(date=1:4, Gross=c(30,30,-30,30), 
    Sender_email=c('bbb@customer.com','ccc@customer.com',
      'admin@site.com','bbb@customer.com'),
    Receiver_email=c('admin@site.com','admin@site.com',
      'bbb@customer.com','admin@site.com'), 
    stringsAsFactors = FALSE);

  date Gross     Sender_email   Receiver_email
1    1    30 bbb@customer.com   admin@site.com
2    2    30 ccc@customer.com   admin@site.com
3    3   -30   admin@site.com bbb@customer.com
4    4    30 bbb@customer.com   admin@site.com

Calling delete_refund(df) gives

  date Gross     Sender_email Receiver_email
2    2    30 ccc@customer.com admin@site.com
4    4    30 bbb@customer.com admin@site.com

which is what the poster asked for.
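
The comments in the function mention a customer who buys several items at the same price and has only some of them refunded. The sketch below illustrates that case with made-up data. `drop_refunded` is a hypothetical, condensed helper written only for this example. It follows the same matching rule (same amount, purchase sender equals refund receiver, most recent unmatched purchase first) but is not the function above verbatim.

```r
# Hypothetical helper: pair each refund with the most recent unmatched purchase
# of the same amount by the same customer, then drop both rows.
drop_refunded <- function(tx) {
  refunds <- which(tx$Gross < 0)
  matched <- integer(0)
  for (r in refunds) {
    earlier <- seq_len(r - 1)                 # rows above the refund
    cand <- earlier[tx$Gross[earlier] == -tx$Gross[r] &
                    tx$Sender_email[earlier] == tx$Receiver_email[r]]
    cand <- setdiff(cand, matched)            # skip purchases already refunded
    matched <- c(matched, tail(cand, 1))      # most recent candidate, if any
  }
  tx[setdiff(seq_len(nrow(tx)), c(refunds, matched)), ]
}

# Made-up data: the same customer buys three times at 30 and is refunded twice
tx <- data.frame(
  Gross = c(30, 30, 30, -30, -30),
  Sender_email   = c("bbb@customer.com", "bbb@customer.com", "bbb@customer.com",
                     "admin@site.com", "admin@site.com"),
  Receiver_email = c("admin@site.com", "admin@site.com", "admin@site.com",
                     "bbb@customer.com", "bbb@customer.com"),
  stringsAsFactors = FALSE
)

kept <- drop_refunded(tx)
rownames(kept)   # only the earliest purchase should survive
```

Matching the most recent purchase first is a design choice, not something the data dictates: with identical amounts there is no way to tell which purchase a refund really belongs to, so any consistent rule leaves the same number of rows.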