Delete all rows with a specific entry

Time: 2016-10-21 08:35:33

Tags: r

Data:

DB <- data.frame(orderID = c(1,2,3,4,4,5,6,6,7,8),
orderDate = c("1.1.12","1.1.12","1.1.12","13.1.12","13.1.12","12.1.12","10.1.12","10.1.12","21.1.12","24.1.12"),
itemID = c(2,3,2,5,12,4,2,3,1,5),
customerID = c(1, 2, 3, 1, 1, 3, 2, 2, 1, 1),
itemPrice = c(9.99, 14.99, 9.99, 19.99, 29.99, 4.99, 9.99, 14.99, 49.99, 19.99),
orderItemStatus = c("sold", "sold", "sold", "refunded", "sold", "refunded", "sold", "refunded", "sold", "refunded"))

Expected result:

DB <- data.frame(orderID = c(1,2,3,4,6,7),
orderDate = c("1.1.12","1.1.12","1.1.12","13.1.12","10.1.12","21.1.12"),
itemID = c(2,3,2,12,2,1),
customerID = c(1, 2, 3, 1, 2, 1),
itemPrice = c(9.99, 14.99, 9.99, 29.99, 9.99, 49.99),
orderItemStatus = c("sold", "sold", "sold", "sold", "sold", "sold"))

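Background:

orderID is consecutive. Products ordered by the same customerID on the same day get the same orderID. When the same customer orders products on another day, he/she gets a new orderID.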

I want to delete all order rows where orderItemStatus = refunded. How can I do this? (I think it is simple, and I found "Removing specific rows from a dataframe", but I don't understand how it works, so please help me :( )

-> The original data has about 500k rows, so please suggest a solution that needs little computation.

Thank you very much for your support!

1 answer:

Answer 0: (score: 0)

The following code should do the job:

DB_new <- DB[-which(DB$orderItemStatus == "refunded"), ]

which gives you the indices at which the comparison is TRUE. For example, with DB[-c(1,5,10), ] you remove rows 1, 5 and 10. You can also do it in two steps:

indices_to_remove <- which(DB$orderItemStatus == "refunded")
DB_new <- DB[-indices_to_remove, ]
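
To make the mechanics concrete, here is a small sketch on a toy vector (the name `status` is hypothetical and not part of the question's data). It also illustrates one caveat of negative indexing with which: if nothing matches, which returns an empty vector, and the negative index then selects nothing instead of everything.

```r
# Toy status vector; `status` is a hypothetical name used only for illustration
status <- c("sold", "refunded", "sold", "refunded")

# which() returns the positions where the comparison is TRUE
idx <- which(status == "refunded")

# Negative indices drop those positions
kept <- status[-idx]

# Caveat: with no match, which() returns integer(0),
# and x[-integer(0)] is the same as x[integer(0)], i.e. an empty result
none <- which(status == "cancelled")
empty <- status[-none]
```

Here idx holds positions 2 and 4, kept holds the two "sold" entries, and empty has length 0. On real data where refunded rows are known to exist, as in the question, the caveat does not apply.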

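Another approach, suggested by @rosscova in the comments, is to find the indices of the rows you want to keep and assign those rows to the result.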
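
A minimal sketch of that alternative, assuming it means selecting the rows whose status is not "refunded" (the comment itself is not reproduced here, so this is an interpretation). The small DB below is a cut-down, hypothetical version of the question's data.

```r
# Cut-down, hypothetical version of the question's data frame
DB <- data.frame(
  orderID = c(1, 2, 3, 4, 4),
  itemPrice = c(9.99, 14.99, 9.99, 19.99, 29.99),
  orderItemStatus = c("sold", "sold", "sold", "refunded", "sold"),
  stringsAsFactors = FALSE
)

# Logical index: TRUE for the rows to keep
keep <- DB$orderItemStatus != "refunded"
DB_new <- DB[keep, ]

# Same selection with which(), picking the wanted positions directly
DB_new2 <- DB[which(DB$orderItemStatus != "refunded"), ]
```

Both forms are vectorized, so a data frame with about 500k rows should not be a problem. They also still return all rows when nothing is refunded. If the status column can contain NA, the two forms differ: the which() form drops those rows, while the plain logical index returns rows filled with NA.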