Question

I have a DataFrame full of orders. Some of them have negative quantities because they are actually cancellations of earlier orders.

The problem: there is no unique key that tells me which order corresponds to which cancellation.

So I wrote the following code ('cancelations' is a subset of the original data containing only the rows that correspond to cancellations):

for i, item in cancelations.iterrows():
    # Find rows similar to the cancellation we are currently studying.
    # iterrows() yields (index, row) pairs; item is the row as a Series.
    mask1 = (copy['CustomerID'] == item['CustomerID'])
    mask2 = (copy['Quantity'] == item['Quantity'])
    mask3 = (copy['Description'] == item['Description'])
    subset = copy[mask1 & mask2 & mask3]
    if subset.shape[0] > 0:  # one or several corresponding orders were found
        print('possible corresponding orders:', subset.index.tolist())
        copy = copy.drop(subset.index.tolist()[0])  # remove only the first of them from the copy of the data

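So this works, but: first, it takes forever to run. Second, I read somewhere that whenever you find yourself writing complicated code to manipulate DataFrames, there is already a method for it. So maybe one of you knows something that could help me?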
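Part of the slowness comes from each iteration building three boolean masks over the whole frame. A minimal sketch of a single-pass alternative follows, assuming the orders are in a frame with positive quantities and the cancellations carry the same quantity with a negative sign. The function name drop_first_matches and the sign flip are illustrative assumptions, not something taken from the code above.

```python
from collections import defaultdict, deque

import pandas as pd


def drop_first_matches(orders: pd.DataFrame, cancelations: pd.DataFrame) -> pd.DataFrame:
    """Drop, for every cancellation, the first order with the same key (hypothetical helper)."""
    keys = ["CustomerID", "Description", "Quantity"]

    # Bucket the index labels of the orders by key, keeping their original order.
    buckets = defaultdict(deque)
    for idx, key in zip(orders.index, orders[keys].itertuples(index=False, name=None)):
        buckets[key].append(idx)

    to_drop = []
    for cust, desc, qty in cancelations[keys].itertuples(index=False, name=None):
        # Assumption: a cancellation stores the negated quantity of its order.
        queue = buckets.get((cust, desc, -qty))
        if queue:
            # Consume only the first still-unmatched order for this key.
            to_drop.append(queue.popleft())

    return orders.drop(index=to_drop)
```

Each order and each cancellation is visited once, so the cost grows with the number of rows rather than with rows times cancellations.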

Thank you for your time!

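Edit: note that sometimes several orders could correspond to the cancellation at hand. That is why I don't just use drop_duplicates on a subset of columns: it removes all of the duplicates (or all but one), whereas I need to drop only one of them.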
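One possible way to get "drop exactly one order per cancellation" without a Python loop is to number the repeats inside each group with groupby(...).cumcount() and merge on that counter as an extra key. This is only a sketch under stated assumptions: the key columns contain no missing values, cancellations store the negated quantity of the order, and the function name drop_one_order_per_cancelation is made up for illustration.

```python
import pandas as pd


def drop_one_order_per_cancelation(orders: pd.DataFrame, cancelations: pd.DataFrame) -> pd.DataFrame:
    """Remove one matching order for each cancellation (hypothetical helper)."""
    keys = ["CustomerID", "Description", "Quantity"]

    # Assumption: flip the sign so cancellations compare equal to their orders.
    canc = cancelations[keys].copy()
    canc["Quantity"] = -canc["Quantity"]

    # Number identical rows within each key group: 0, 1, 2, ...
    canc["_rank"] = canc.groupby(keys).cumcount()
    left = orders[keys].assign(_rank=orders.groupby(keys).cumcount())

    # The n-th cancellation of a key pairs with the n-th order of that key,
    # so k cancellations remove at most k of the duplicated orders.
    merged = left.merge(canc, on=keys + ["_rank"], how="left", indicator=True)
    matched = (merged["_merge"] == "both").to_numpy()

    return orders[~matched]
```

Unlike drop_duplicates, duplicated orders that have no cancellation left to pair with stay in the result.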

Python DataFrames: finding "almost" identical rows

0 answers: