Question

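I have a fairly large data frame (a few hundred columns) on which I want to perform the following operation. Below I use a toy data frame with simple conditions to illustrate what I need.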

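For each row: Condition #1: check whether either of two columns (Col1 and Col4) has a value of zero (0). The condition is True if either column is zero. If it is True, keep the row and move on to the next one.

If condition #1 is False (no zero in column 1 or column 4), check all remaining columns in the row. If any of the remaining columns has a value of zero, drop the row.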

I would like the filtered data frame returned as a new, separate data frame.

My code so far:

# https://codereview.stackexchange.com/questions/185389/dropping-rows-from-a-pandas-dataframe-where-some-of-the-columns-have-value-0/185390
# https://thispointer.com/python-pandas-how-to-drop-rows-in-dataframe-by-conditions-on-column-values/
# https://stackoverflow.com/questions/29763620/how-to-select-all-columns-except-one-column-in-pandas

import pandas as pd

df = pd.DataFrame({'Col1': [7, 6, 0, 1, 8],
                   'Col2': [0.5, 0.5, 0, 0, 7],
                   'Col3': [0, 0, 3, 3, 6],
                   'Col4': [7, 0, 6, 4, 5]})

print(df)
print()

exclude = ['Col1', 'Col4']
all_but_1_and_4 = df[df.columns.difference(exclude)]        # Filter out columns 1 and 4
print(all_but_1_and_4)
print()


def delete_rows(row):
    if row['Col1'] == 0 or row['Col4'] == 0:    # Is the value in either Col1 or Col4 zero(0)
        skip = True                             # If it is, keep the row
        if not skip:                            # If not, check the second condition
            is_zero = all_but_1_and_4.apply(lambda x: 0 in x.values, axis=1).any()      # Are any values in the remaining columns zero(0)
            if is_zero:                         # If any of the remaining columns has a value of zero(0)
                pass
                # drop the row being analyzed   # Drop the row.


new_df = df.apply(delete_rows, axis=1)
print(new_df)

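I don't know how to actually drop the row when both of my conditions are met.

In my toy data frame, rows 1, 2 and 4 should be kept, and rows 0 and 3 dropped.

I don't want to check all the columns for step 2 by hand, because there are a few hundred of them. That is why I filter with .difference().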

Answer 1

Here is what I would do:

# True where Col1 or Col4 is zero
s1 = df[exclude].eq(0).any(axis=1)
# True where any of the remaining columns is zero
s2 = df[df.columns.difference(exclude)].eq(0).any(axis=1)

~(~s1&s2) #s1 | ~s2
Out[97]: 
0    False
1     True
2     True
3    False
4     True
dtype: bool
yourdf=df[s1 | ~s2].copy()
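
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same mask logic on the toy data from the question. The names keep_mask and filtered are only illustrative, and axis=1 is passed by keyword because recent pandas versions no longer accept it positionally.

```python
import pandas as pd

# Toy data frame from the question
df = pd.DataFrame({'Col1': [7, 6, 0, 1, 8],
                   'Col2': [0.5, 0.5, 0, 0, 7],
                   'Col3': [0, 0, 3, 3, 6],
                   'Col4': [7, 0, 6, 4, 5]})

exclude = ['Col1', 'Col4']

# Condition 1: a zero in Col1 or Col4
s1 = df[exclude].eq(0).any(axis=1)
# Condition 2: a zero in any of the other columns
s2 = df[df.columns.difference(exclude)].eq(0).any(axis=1)

# Keep a row if condition 1 holds, or if condition 2 does not hold
keep_mask = s1 | ~s2
filtered = df[keep_mask].copy()   # .copy() gives an independent data frame
print(filtered)
```

On the toy data this keeps rows 1, 2 and 4, which matches the result asked for in the question.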

Answer 2

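WeNYoBen's answer is excellent, so I will only point out the errors in your code:

The condition in the following if statement can never be satisfied, because skip has just been set to True: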

    skip = True                             # If it is, keep the row
    if not skip:                            # If not, check the second condition

You probably meant to unindent the second line, i.e.

    skip = True                             # If it is, keep the row
if not skip:                            # If not, check the second condition

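That does the same job as a plain else:, which needs no skip = True at all (and avoids reading skip in the case where it was never assigned):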

else:                            # If not, check the second condition

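The condition in the following if statement will always be satisfied as long as there is at least one zero anywhere in your whole table (so it does not look only at the current row, as it should in your case):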
```
    is_zero = all_but_1_and_4.apply(lambda x: 0 in x.values, axis=1).any()      # Are any values in the remaining columns zero(0)
    if is_zero:                         # If any of the remaining columns has a value of zero(0)
```
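This is because all_but_1_and_4.apply(lambda x: 0 in x.values, axis=1) is a Series of True/False values, one for each row of the all_but_1_and_4 table. Applying .any() to that Series collapses it into a single value, which is True as soon as any row contains a zero.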
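
To make the difference concrete, here is a small sketch on the toy data from the question. The names per_row and table_wide are only illustrative.

```python
import pandas as pd

# Toy data frame from the question
df = pd.DataFrame({'Col1': [7, 6, 0, 1, 8],
                   'Col2': [0.5, 0.5, 0, 0, 7],
                   'Col3': [0, 0, 3, 3, 6],
                   'Col4': [7, 0, 6, 4, 5]})

all_but_1_and_4 = df[df.columns.difference(['Col1', 'Col4'])]

# One True/False value per row: does this row contain a zero in Col2 or Col3?
per_row = all_but_1_and_4.apply(lambda x: 0 in x.values, axis=1)
print(per_row)

# .any() reduces the whole Series to a single value for the entire table
table_wide = per_row.any()
print(table_wide)
```

per_row is what a per-row decision needs; table_wide is the same single answer no matter which row is currently being processed.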

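Note:

Your approach is not bad. You could add a variable dropThisRow to your function, set it to True or False depending on the conditions, and return it.
You can then use the function to build a True/False Series and use that Series to create the target table: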

dropRows = df.apply(delete_rows, axis=1)   # True/False for dropping/keeping - for every row
new_df = df[~dropRows]                     # Select only rows with False
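
As a rough sketch of that suggestion, the function could look something like this. The names remaining and drop_this_row are only illustrative, and the if/else structure is one possible reading of the two conditions in the question. Row-wise apply is typically much slower than boolean masks on a large data frame, so treat this as a repair of the original approach rather than a recommendation.

```python
import pandas as pd

# Toy data frame from the question
df = pd.DataFrame({'Col1': [7, 6, 0, 1, 8],
                   'Col2': [0.5, 0.5, 0, 0, 7],
                   'Col3': [0, 0, 3, 3, 6],
                   'Col4': [7, 0, 6, 4, 5]})

exclude = ['Col1', 'Col4']
remaining = df.columns.difference(exclude)   # every column except Col1 and Col4


def delete_rows(row):
    """Return True if the row should be dropped, False if it should be kept."""
    if (row[exclude] == 0).any():
        # Condition 1 holds: a zero in Col1 or Col4, so keep the row
        drop_this_row = False
    else:
        # Condition 1 failed: drop the row if any remaining column is zero
        drop_this_row = bool((row[remaining] == 0).any())
    return drop_this_row


drop_rows = df.apply(delete_rows, axis=1)   # True/False for every row
new_df = df[~drop_rows]                     # select only the rows marked False
print(new_df)
```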
