Question

I have a DataFrame that looks like this:

                                    Label                   Type  
Name                                                              
ppppp                         Base brute          UnweightedBase  
pbaaa                               Base                    Base  
pb4a1                      Très à gauche                Category 
pb4a2                           A gauche   pb4a2        Category  
pb4a3                          Au centre   pb4a3        Category  
pb4a4                           A droite   pb4a4        Category

I want to drop rows from the data if the value of the "Type" column is "UnweightedBase" or "Base".

With the following code I can only handle one value at a time:

to_del = df[df['Type'] == "UnweightedBase"].index.tolist()

df= df.drop(to_del, axis)
return df

How can I modify my code so that I can drop multiple values at once?

My failed attempt:

to_del = df[df['Type'] in ["UnweightedBase","Base"]].index.tolist()

df= df.drop(to_del, axis)
return df

Answer 1

You can select the rows you want to keep and reassign the resulting DataFrame to df:

In [60]: df = df.loc[~df['Type'].isin(['UnweightedBase', 'Base'])]

In [61]: df
Out[61]: 
    Name              Label      Type
2  pb4a1      Très à gauche  Category
3  pb4a2   A gauche   pb4a2  Category
4  pb4a3  Au centre   pb4a3  Category
5  pb4a4   A droite   pb4a4  Category
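
For a self-contained version of the selection above, here is a minimal sketch. The frame is rebuilt by hand from the sample in the question (an assumption about the data, not the asker's actual loading code, with the Label values simplified), and the names `unwanted`, `mask`, and `filtered` are only illustrative. It also shows why the `in` attempt from the question fails: `in` asks Python for a single True/False, whereas `isin` returns one boolean per row.

```python
import pandas as pd

# Hand-built copy of the sample data (assumed layout, simplified labels).
df = pd.DataFrame({
    "Name": ["ppppp", "pbaaa", "pb4a1", "pb4a2", "pb4a3", "pb4a4"],
    "Label": ["Base brute", "Base", "Très à gauche",
              "A gauche", "Au centre", "A droite"],
    "Type": ["UnweightedBase", "Base", "Category",
             "Category", "Category", "Category"],
})

unwanted = ["UnweightedBase", "Base"]

# The failed attempt: `in` needs one truth value, but comparing a Series
# to a string yields a whole Series, so pandas raises ValueError.
try:
    df["Type"] in unwanted
except ValueError as exc:
    print("`in` fails:", exc)

# isin() gives an element-wise boolean mask; ~ inverts it to keep the rest.
mask = df["Type"].isin(unwanted)
filtered = df.loc[~mask]
print(filtered)
```

Adding more values to drop then only means extending the `unwanted` list.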

I think this is more direct, and safer, than using

to_del = df[df['Type'].isin(type_val)].index.tolist()
df= df.drop(to_del, axis)

because the latter performs essentially the same selection as an intermediate step:

df[df['Type'].isin(type_val)]

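Moreover, index.tolist() returns index labels, and drop removes every row that carries one of those labels. If the index has non-unique values, you may drop rows you did not intend to.

For example: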

In [85]: df = pd.read_table('data', sep=r'\s{4,}', engine='python')

In [86]: df.index = ['a','b','c','d','e','a']

In [87]: df
Out[87]: 
    Name              Label            Type
a  ppppp         Base brute  UnweightedBase
b  pbaaa               Base            Base
c  pb4a1      Très à gauche        Category
d  pb4a2   A gauche   pb4a2        Category
e  pb4a3  Au centre   pb4a3        Category
a  pb4a4   A droite   pb4a4        Category  #<-- note the repeated index

In [88]: to_del = df[df['Type'].isin(['UnweightedBase', 'Base'])].index.tolist()

In [89]: to_del
Out[89]: ['a', 'b']

In [90]: df = df.drop(to_del)

In [91]: df
Out[91]: 
    Name              Label      Type
c  pb4a1      Très à gauche  Category
d  pb4a2   A gauche   pb4a2  Category
e  pb4a3  Au centre   pb4a3  Category
#<--- OOPs, we've lost the last row, even though the Type was Category.
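
By contrast, the boolean-mask selection is positional, so it is not affected by repeated labels. The following is a minimal sketch of that comparison, assuming a hand-built frame with the same duplicated index as above (the Label column is omitted for brevity, and the names `dropped_by_label` and `kept_by_mask` are only illustrative):

```python
import pandas as pd

# Same rows as above, with the label 'a' deliberately repeated in the index.
df = pd.DataFrame(
    {
        "Name": ["ppppp", "pbaaa", "pb4a1", "pb4a2", "pb4a3", "pb4a4"],
        "Type": ["UnweightedBase", "Base", "Category",
                 "Category", "Category", "Category"],
    },
    index=["a", "b", "c", "d", "e", "a"],
)

mask = df["Type"].isin(["UnweightedBase", "Base"])

# Label-based drop: every row labelled 'a' or 'b' goes, including the
# last Category row that merely shares the label 'a'.
dropped_by_label = df.drop(df[mask].index.tolist())

# Mask-based selection: rows are kept or removed one by one.
kept_by_mask = df.loc[~mask]

print(len(dropped_by_label))  # 3
print(len(kept_by_mask))      # 4
```

If the index is known to be unique, both approaches give the same rows; the mask version simply does not depend on that.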
