Question

I have a simple dataset of the following form:

import pandas as pd

df = pd.DataFrame(
         [
             ["Norway"     , 7.537,  0.039, 11  , 31],
             ["Denmark"    , 7.522, -0.004,  9  , 12],
             ["Switzerland", 7.494,  None , 15  , 50],
             ["Finland"    , 7.469,  None , None, 29],
             ["Netherlands", 7.377,  1    , None, 77],
         ],
         columns = [
             "country",
             "score A",
             "score B",
             "score C",
             "score D"
         ]
    )

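How can I filter this dataset by placing conditions on the values in several columns? Say I want to exclude all rows (countries) where both score B and score C are null. That should exclude only the Finland row.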

When I try the following, every row with a null in either of the two columns is excluded, which leaves only the Norway and Denmark rows:

df[(df["score B"].notnull()) & (df["score C"].notnull())]

How can I do this?

Answer 1

How about using or (|) instead:

df[(df["score B"].notnull()) | (df["score C"].notnull())]

Output:

       country  score A  score B  score C  score D
0       Norway    7.537    0.039     11.0       31
1      Denmark    7.522   -0.004      9.0       12
2  Switzerland    7.494      NaN     15.0       50
4  Netherlands    7.377    1.000      NaN       77

Right? What you want is to exclude only the rows where both are null (or have I misunderstood)?
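To make the difference between the two operators concrete, here is a minimal sketch that runs both masks on the data from the question. The variable names `b_ok`, `c_ok`, `both_present` and `either_present` are only illustrative and do not come from the question.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame(
    [
        ["Norway", 7.537, 0.039, 11, 31],
        ["Denmark", 7.522, -0.004, 9, 12],
        ["Switzerland", 7.494, None, 15, 50],
        ["Finland", 7.469, None, None, 29],
        ["Netherlands", 7.377, 1, None, 77],
    ],
    columns=["country", "score A", "score B", "score C", "score D"],
)

b_ok = df["score B"].notnull()
c_ok = df["score C"].notnull()

# `&` keeps a row only if BOTH values are present,
# so a single null is enough to drop the row.
both_present = df[b_ok & c_ok]

# `|` keeps a row if AT LEAST ONE value is present,
# so a row is dropped only when both values are null.
either_present = df[b_ok | c_ok]

print(both_present["country"].tolist())
print(either_present["country"].tolist())
```

The first list contains only Norway and Denmark, matching what the question describes. The second list drops only Finland.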

Answer 2

You need

df[~(df['score B'].isnull() & df['score C'].isnull())]

       country  score A  score B  score C  score D
0       Norway    7.537    0.039     11.0       31
1      Denmark    7.522   -0.004      9.0       12
2  Switzerland    7.494      NaN     15.0       50
4  Netherlands    7.377    1.000      NaN       77
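As a possible alternative, the same filter can probably be written with `DataFrame.dropna`, using `subset` to name the columns and `how="all"` to drop a row only when every listed column is null. The sketch below assumes the DataFrame from the question. The names `cols`, `negated`, `dropped` and `any_present` are illustrative.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame(
    [
        ["Norway", 7.537, 0.039, 11, 31],
        ["Denmark", 7.522, -0.004, 9, 12],
        ["Switzerland", 7.494, None, 15, 50],
        ["Finland", 7.469, None, None, 29],
        ["Netherlands", 7.377, 1, None, 77],
    ],
    columns=["country", "score A", "score B", "score C", "score D"],
)

cols = ["score B", "score C"]

# The expression from this answer: negate "both are null".
negated = df[~(df["score B"].isnull() & df["score C"].isnull())]

# dropna with how="all" drops a row only if all columns in `subset` are null.
dropped = df.dropna(subset=cols, how="all")

# Equivalent boolean mask that scales to any number of columns.
any_present = df[df[cols].notna().any(axis=1)]

print(dropped)
```

All three expressions keep the same four rows. The `dropna` and `any(axis=1)` forms are convenient when the list of columns is long or built at runtime.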

In pandas, how do I filter a DataFrame by conditions on the values of several specified columns?
