Data:
DB <- data.frame(orderID = c(1,2,3,4,4,5,6,6,7,8),
orderDate = c("1.1.12","1.1.12","1.1.12","13.1.12","13.1.12","12.1.12","10.1.12","10.1.12","21.1.12","24.1.12"),
itemID = c(2,3,2,5,12,4,2,3,1,5),
customerID = c(1, 2, 3, 1, 1, 3, 2, 2, 1, 1),
itemPrice = c(9.99, 14.99, 9.99, 19.99, 29.99, 4.99, 9.99, 14.99, 49.99, 19.99),
orderItemStatus = c("sold", "sold", "sold", "refunded", "sold", "refunded", "sold", "refunded", "sold", "refunded"))
Expected result:
DB <- data.frame(orderID = c(1,2,3,4,6,7),
orderDate = c("1.1.12","1.1.12","1.1.12","13.1.12","10.1.12","21.1.12"),
itemID = c(2,3,2,12,2,1),
customerID = c(1, 2, 3, 1, 2, 1),
itemPrice = c(9.99, 14.99, 9.99, 29.99, 9.99, 49.99),
orderItemStatus = c("sold", "sold", "sold", "sold", "sold", "sold"))
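Background:
orderID is sequential. Products ordered by the same customerID on the same day get the same orderID. When the same customer orders on a different day, he/she gets a new orderID.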
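I want to remove all orders with orderItemStatus = refunded. How can I do that? (I think this is simple, and I found "Removing specific rows from a dataframe", but I don't understand how it works, so please help me :()
-> The original data has about 500k rows, so please suggest a solution that performs well...
Thanks a lot for your support!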
Answer 0 (score: 0)
The following code should do the job:
DB_new <- DB[-which(DB$orderItemStatus == "refunded"), ]
which gives you the indices at which the comparison is TRUE. For example, DB[-c(1,5,10), ] removes rows 1, 5, and 10. You can also do it in two steps:
indices_to_remove <- which(DB$orderItemStatus == "refunded")
DB_new <- DB[-indices_to_remove, ]
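To make the behaviour of which() concrete, here is a small sketch on a made-up vector (the name status and its values are illustrative assumptions, not the asker's data). It also shows one caveat worth knowing about negative indexing: when nothing matches, which() returns an empty index, and removing "nothing" this way selects no elements at all.

```r
# Hypothetical toy vector, only for illustration
status <- c("sold", "refunded", "sold", "refunded")

# which() returns the positions where the comparison is TRUE
idx <- which(status == "refunded")
kept <- status[-idx]          # drops positions 2 and 4

# Caveat: no match gives integer(0), and x[-integer(0)] is empty,
# not the full vector
none <- which(status == "cancelled")
empty_result <- status[-none]
```

So the -which(...) form is probably safe only when at least one row is known to match; otherwise a guard such as if (length(idx) > 0) may be needed.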
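Another approach, suggested by @rosscova in the comments, is to find the indices of the rows you want to keep and assign those rows to the result.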
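The comment itself is not quoted here, so the following is only a sketch of what that "keep the wanted rows" approach might look like, using a logical vector on a reduced version of the data (the variable name keep is an assumption). Because the comparison is vectorised, it should stay cheap on roughly 500k rows, though that has not been measured here.

```r
# Reduced, illustrative version of the question's data
DB <- data.frame(
  orderID = c(1, 2, 3, 4, 4),
  orderItemStatus = c("sold", "sold", "sold", "refunded", "sold"),
  stringsAsFactors = FALSE
)

# TRUE for every row that should stay
keep <- DB$orderItemStatus != "refunded"

# Logical indexing keeps exactly the TRUE rows; if nothing is refunded,
# all rows are kept
DB_new <- DB[keep, ]
```

If orderItemStatus can contain NA, the comparison yields NA for those rows and the result would include NA-filled rows; DB[which(keep), ] or !(DB$orderItemStatus %in% "refunded") avoids that.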