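What I want to do is delete multiple rows from an Excel file (using pandas) and then save the file without those rows as .xlsx (using the pyexcelerate module).
I know I can remove rows from a data frame by dropping them (I already have that working). But I have read in several posts that when many rows have to be removed (in my case > 5000), it is much faster to collect the indexes of the rows to delete and then slice the data frame, much like an SQL EXCEPT statement. Unfortunately, I cannot get this to work, even though I have tried several approaches.
Here are my "source posts":
Slice Pandas dataframe by labels that are not in a list - answer from user ASGM
How to drop a list of rows from Pandas dataframe? - answer from user Dennis Golomazov
This is the part of the function that should delete the rows and save the resulting file: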
for index, cell in enumerate(wb_in[header_xlsx]):
    if str(cell) in delete_set:
        set_to_delete.append(index)
        print(str(cell) + " deleted from set: " + str(len(set_to_delete)))
wb_out = Workbook()
data_out = wb_in.loc[set(wb_in.index) - set(set_to_delete)]
ws_out = wb_out.new_sheet('Main', data=data_out)
wb_out.save(file_path + filename + "_2.xlsx")
Here is a sample of the data frame:
sku product_group name \
0 ABCDb00610-23.0 ABA1 Anti
1 ABCDb00610-10.0 ABA1 Anti
2 ABCDb00610-1.1 ABA1 Anti
3 ABCDb00609-23.0 ABA1 Anti
4 ABCDb00609-10.0 ABA1 Anti
5 ABCDb00609-1.1 ABA1 Anti
6 ABCDb00608-23.0 ABA1 Anti
7 ABCDb00608-10.0 ABA1 Anti
8 ABCDb00608-3.3 ABA1 Anti
9 ABCDb00608-3.0 ABA1 Anti
delete_set is a set that contains only SKUs (for example: ABCDb00608-3.3 or ABCDb00609-1.1).
By the way: I have tried many of the suggested solutions! Thanks in advance!
Answer 0 (score: 1)
Use pd.Series.isin:
df = df[~df.sku.isin(delete_set)]
print(df)
sku product_group name
0 ABAAb00610-23.0 ABA1 Anti-Involucrin [SY5]
1 ABAAb00610-10.0 ABA1 Anti-Involucrin [SY5]
2 ABAAb00610-1.1 ABA1 Anti-EpCAM [AUA1]
3 ABAAb00609-23.0 ABA1 Anti-EpCAM [AUA1]
4 ABAAb00609-10.0 ABA1 Anti-EpCAM [AUA1]
5 ABAAb00609-1.1 ABA1 Anti-EpCAM [AUA1]
6 ABAAb00608-23.0 ABA1 Anti-EpCAM [AUA1]
7 ABAAb00608-10.0 ABA1 Anti-EpCAM [AUA1]
8 ABAAb00608-3.3 ABA1 Anti-EpCAM [AUA1]
9 ABAAb00608-3.0 ABA1 Anti-EpCAM [AUA1]
print(delete_set)
('ABAAb00608-3.3', 'ABAAb00609-1.1')
m = ~df.sku.isin(delete_set)
print(m)
0 True
1 True
2 True
3 True
4 True
5 False
6 True
7 True
8 False
9 True
Name: sku, dtype: bool
print(df[m])
sku product_group name
0 ABAAb00610-23.0 ABA1 Anti-Involucrin [SY5]
1 ABAAb00610-10.0 ABA1 Anti-Involucrin [SY5]
2 ABAAb00610-1.1 ABA1 Anti-EpCAM [AUA1]
3 ABAAb00609-23.0 ABA1 Anti-EpCAM [AUA1]
4 ABAAb00609-10.0 ABA1 Anti-EpCAM [AUA1]
6 ABAAb00608-23.0 ABA1 Anti-EpCAM [AUA1]
7 ABAAb00608-10.0 ABA1 Anti-EpCAM [AUA1]
9 ABAAb00608-3.0 ABA1 Anti-EpCAM [AUA1]
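For reference, here is a minimal sketch of how the mask could be wired into the question's own variable names (wb_in, header_xlsx, delete_set). The sample frame is a made-up subset of the data above, and save_with_pyexcelerate is a hypothetical helper. The sketch also assumes that pyexcelerate wants its rows as a list of lists rather than a data frame, which is how its README builds the data argument. The index-based variant is included only to show that Index.difference gives the same rows as the mask here.

```python
import pandas as pd

# Hypothetical sample data: a subset of the frame shown in the answer.
wb_in = pd.DataFrame({
    "sku": ["ABAAb00610-23.0", "ABAAb00609-1.1",
            "ABAAb00608-3.3", "ABAAb00608-3.0"],
    "product_group": ["ABA1"] * 4,
    "name": ["Anti-Involucrin [SY5]", "Anti-EpCAM [AUA1]",
             "Anti-EpCAM [AUA1]", "Anti-EpCAM [AUA1]"],
})
header_xlsx = "sku"
delete_set = {"ABAAb00608-3.3", "ABAAb00609-1.1"}

# Approach 1: boolean mask, no explicit loop over the cells.
is_deleted = wb_in[header_xlsx].astype(str).isin(delete_set)
data_out = wb_in[~is_deleted]

# Approach 2: collect the index labels to remove and keep the complement.
labels_to_delete = wb_in.index[is_deleted]
data_out_alt = wb_in.loc[wb_in.index.difference(labels_to_delete)]

# Assumption: pyexcelerate is given plain rows (header first), not a DataFrame.
rows = [data_out.columns.tolist()] + data_out.values.tolist()


def save_with_pyexcelerate(rows, path):
    """Hypothetical helper: write the rows to a single sheet named 'Main'."""
    # Imported lazily so the filtering above runs without pyexcelerate installed.
    from pyexcelerate import Workbook

    wb_out = Workbook()
    wb_out.new_sheet("Main", data=rows)
    wb_out.save(path)
```

Calling save_with_pyexcelerate(rows, file_path + filename + "_2.xlsx") would then replace the last three lines of the question's snippet. The mask version is usually the simpler choice, because it never needs the intermediate set_to_delete list.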