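Find rows that are not in a set of values (similar to SQL EXCEPT)

Date: 2017-10-02 14:17:56

Tags: python excel pandas dataframe

What I am trying to do is delete a number of rows from an Excel file (using pandas) and then save the file without those rows as .xlsx (using the pyexcelerate module).

I know I can remove rows from a DataFrame by dropping them (I already have that working). However, I have read in several posts that when many rows need to be removed (more than 5,000 in my case), it is much faster to collect the index labels of the rows to delete and then slice the DataFrame down to everything else (much like a SQL EXCEPT statement). Unfortunately, I cannot get this to work, even though I have tried several approaches.

Here are my source posts:

Slice Pandas dataframe by labels that are not in a list - answer by user ASGM

How to drop a list of rows from Pandas dataframe? - answer by user Dennis Golomazov

This is the part of the function that is supposed to delete the rows and save the resulting file: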

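# Collect the positions of the rows whose SKU is in delete_set
# (positions match the index labels only for a default 0..n-1 index)
set_to_delete = []
for index, cell in enumerate(wb_in[header_xlsx]):
    if str(cell) in delete_set:
        set_to_delete.append(index)
        print(str(cell) + " deleted from set: " + str(len(set_to_delete)))
wb_out = Workbook()
# Keep every row whose label is not marked for deletion
# (passed as a list, since recent pandas versions reject a set in .loc)
data_out = wb_in.loc[sorted(set(wb_in.index) - set(set_to_delete))]
ws_out = wb_out.new_sheet('Main', data=data_out)
wb_out.save(file_path + filename + "_2.xlsx")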

Here is a sample of the DataFrame:

               sku product_group  name  \
0  ABCDb00610-23.0          ABA1  Anti
1  ABCDb00610-10.0          ABA1  Anti
2   ABCDb00610-1.1          ABA1  Anti
3  ABCDb00609-23.0          ABA1  Anti
4  ABCDb00609-10.0          ABA1  Anti
5   ABCDb00609-1.1          ABA1  Anti
6  ABCDb00608-23.0          ABA1  Anti
7  ABCDb00608-10.0          ABA1  Anti
8   ABCDb00608-3.3          ABA1  Anti
9   ABCDb00608-3.0          ABA1  Anti

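delete_set is a set that contains only SKUs (for example ABCDb00608-3.3 or ABCDb00609-1.1).

By the way: I have already tried many of the suggested solutions!

Thanks in advance!

1 Answer:

Answer 0: (score: 1)

Use pd.Series.isin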

df = df[~df.sku.isin(delete_set)]

Step by step, starting from the original df:

print(df)
               sku product_group                   name
0  ABAAb00610-23.0          ABA1  Anti-Involucrin [SY5]
1  ABAAb00610-10.0          ABA1  Anti-Involucrin [SY5]
2   ABAAb00610-1.1          ABA1      Anti-EpCAM [AUA1]
3  ABAAb00609-23.0          ABA1      Anti-EpCAM [AUA1]
4  ABAAb00609-10.0          ABA1      Anti-EpCAM [AUA1]
5   ABAAb00609-1.1          ABA1      Anti-EpCAM [AUA1]
6  ABAAb00608-23.0          ABA1      Anti-EpCAM [AUA1]
7  ABAAb00608-10.0          ABA1      Anti-EpCAM [AUA1]
8   ABAAb00608-3.3          ABA1      Anti-EpCAM [AUA1]
9   ABAAb00608-3.0          ABA1      Anti-EpCAM [AUA1]

print(delete_set)
('ABAAb00608-3.3', 'ABAAb00609-1.1')
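
Note that delete_set is printed here as a tuple, while the question describes it as a set. As far as I know this makes no difference, because isin is documented to accept a set or any list-like. A small sketch to check this, reusing three SKUs from the sample (the variable names are only for illustration):

```python
import pandas as pd

# Three SKUs taken from the sample data above
skus = pd.Series(
    ["ABAAb00609-1.1", "ABAAb00608-23.0", "ABAAb00608-3.3"], name="sku"
)

# The same values to delete, held in three different container types
as_tuple = ("ABAAb00608-3.3", "ABAAb00609-1.1")
as_set = set(as_tuple)
as_list = list(as_tuple)

# isin builds the same boolean mask regardless of the container type
mask_tuple = skus.isin(as_tuple)
mask_set = skus.isin(as_set)
mask_list = skus.isin(as_list)

print(mask_tuple.tolist())
print(mask_tuple.equals(mask_set) and mask_set.equals(mask_list))
```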

m = ~df.sku.isin(delete_set)
print(m)
0     True
1     True
2     True
3     True
4     True
5    False
6     True
7     True
8    False
9     True
Name: sku, dtype: bool

print(df[m])
               sku product_group                   name
0  ABAAb00610-23.0          ABA1  Anti-Involucrin [SY5]
1  ABAAb00610-10.0          ABA1  Anti-Involucrin [SY5]
2   ABAAb00610-1.1          ABA1      Anti-EpCAM [AUA1]
3  ABAAb00609-23.0          ABA1      Anti-EpCAM [AUA1]
4  ABAAb00609-10.0          ABA1      Anti-EpCAM [AUA1]
6  ABAAb00608-23.0          ABA1      Anti-EpCAM [AUA1]
7  ABAAb00608-10.0          ABA1      Anti-EpCAM [AUA1]
9   ABAAb00608-3.0          ABA1      Anti-EpCAM [AUA1]
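
To tie this back to the question, here is a minimal end-to-end sketch, not a definitive implementation. It assumes a small made-up frame with a default integer index, and compares the isin mask with the index-based slicing and the drop approach mentioned in the question. The helper save_rows_as_xlsx is hypothetical. It follows the usage shown in the PyExcelerate README as I understand it (a list of rows passed to new_sheet), so treat that part as an assumption.

```python
import pandas as pd

# Made-up miniature of the sample data (default 0..n-1 index)
df = pd.DataFrame({
    "sku": ["ABAAb00610-23.0", "ABAAb00609-1.1",
            "ABAAb00608-3.3", "ABAAb00608-3.0"],
    "product_group": ["ABA1"] * 4,
})
delete_set = {"ABAAb00608-3.3", "ABAAb00609-1.1"}

# 1) Boolean mask: keep the rows whose sku is NOT in delete_set
kept_by_mask = df[~df["sku"].isin(delete_set)]

# 2) Index-based "EXCEPT": collect the labels to delete, then
#    select every label that is not among them
labels_to_delete = df.index[df["sku"].isin(delete_set)]
kept_by_index = df.loc[df.index.difference(labels_to_delete)]

# 3) Plain drop with the same labels
kept_by_drop = df.drop(labels_to_delete)

print(kept_by_mask["sku"].tolist())
print(kept_by_index["sku"].tolist())
print(kept_by_drop["sku"].tolist())

# Header row followed by the data rows, as plain Python lists
rows = [kept_by_mask.columns.tolist()] + kept_by_mask.values.tolist()


def save_rows_as_xlsx(rows, path):
    """Hypothetical helper: write a list of rows with pyexcelerate."""
    # Imported here so the rest of the sketch runs without pyexcelerate
    from pyexcelerate import Workbook

    wb = Workbook()
    wb.new_sheet("Main", data=rows)
    wb.save(path)
```

All three variants select the same rows here, so the choice is mostly about readability and speed on your data. The mask has the advantage that it does not depend on the index labels at all.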