Question

I have a pandas DataFrame:

pd_sequences
Out[3]: 
         0   1   2    3    4  5  occurence  unique      dist
0       58  68  58   59   -1 -1          5       3  0.030624
1       59  69  59   58   -1 -1         15       3  0.026485
2       93  94  93   33   -1 -1         10       3  0.137149
3       58  59  58   68   -1 -1          8       3  0.028127
4       92  94  92   33   -1 -1          4       3  0.155580
5       59  58  59   69   -1 -1         10       3  0.026057

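The first six columns are named 0, 1, 2, 3, 4 and 5.

I want to delete every row in which any of columns 0 through 5 contains the number 100 or 101.

For a single column I can do: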

#remove 100
pd_sequences.drop(pd_sequences[pd_sequences['0'] == 100].index, inplace=True)

and then:

#remove 101
pd_sequences.drop(pd_sequences[pd_sequences['0'] == 101].index, inplace=True)

Is there a simple way to include all of the columns without making my boolean expression too long?

Answer 1

Try combining isin with any, and negate the condition with ~:

pd_sequences[~pd_sequences[['0', '1', '2', '3', '4', '5']].isin([100, 101]).any(axis=1)]
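
To make the isin/any approach concrete, here is a minimal self-contained sketch. The sample values and the names df, cols, has_bad_value and cleaned are made up for illustration and are not the asker's data; the column labels are assumed to be strings, as in the question's own code.

```python
import pandas as pd

# Hypothetical sample data; column labels are strings, as in the question.
df = pd.DataFrame({
    "0": [58, 100, 93, 58],
    "1": [68, 69, 94, 59],
    "2": [58, 59, 101, 58],
    "occurence": [100, 15, 10, 8],  # not checked, so the 100 here is ignored
})

cols = ["0", "1", "2"]

# True for every row where at least one of the chosen columns is 100 or 101
has_bad_value = df[cols].isin([100, 101]).any(axis=1)

# Keep only the rows where the mask is False
cleaned = df[~has_bad_value]
print(cleaned.index.tolist())
```

Because the mask is built only from the columns listed in cols, a 100 or 101 in any other column does not cause a row to be dropped.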

Answer 2

You can define a function that implements the deletion condition and then apply it to select rows:

# Boolean Series: True for rows that satisfy the condition
# (x[0] and x[1] are the row's values in the columns labeled 0 and 1)
bool_column = df.apply(lambda x: x[0] == 100 or x[1] == 101, axis=1)
filtered_df = df[bool_column.values]   # Select rows where the condition is True
filtered_df = df[~bool_column.values]  # Select rows where the condition is False

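Here the function is written as a lambda, but for more complex logic it can be an ordinary Python function. If the condition involves many columns, you can also automate it by looping over df.columns. If needed, you can pass additional arguments to the function through df.apply.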
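
As a rough sketch of those last points, the example below uses an ordinary function that loops over a list of columns and receives extra arguments through the args parameter of df.apply. The function name row_has_any, the sample data and the variable names are hypothetical and chosen only for illustration.

```python
import pandas as pd

def row_has_any(row, columns, values):
    """Return True if any of the given columns in this row holds one of `values`."""
    return any(row[c] in values for c in columns)

# Hypothetical sample data with string column labels
df = pd.DataFrame({
    "0": [58, 100, 93, 59],
    "1": [68, 69, 101, 58],
    "occurence": [5, 15, 10, 8],
})

cols = ["0", "1"]

# Extra positional arguments for the function are passed through `args`
mask = df.apply(row_has_any, axis=1, args=(cols, {100, 101}))

kept = df[~mask]  # rows that do not contain 100 or 101 in the chosen columns
print(kept.index.tolist())
```

A row-wise apply calls the Python function once per row, so on large frames it is usually slower than a vectorized expression such as isin combined with any; it is mainly useful when the condition cannot be expressed that way.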

Delete rows based on a condition over multiple columns
