In pandas, how do I filter a DataFrame using conditions on the values of several specified columns?

Date: 2017-05-19 15:45:27

Tags: pandas dataframe

I have a simple dataset of the following form:

import pandas as pd

df = pd.DataFrame(
         [
             ["Norway"     , 7.537,  0.039, 11  , 31],
             ["Denmark"    , 7.522, -0.004,  9  , 12],
             ["Switzerland", 7.494,  None , 15  , 50],
             ["Finland"    , 7.469,  None , None, 29],
             ["Netherlands", 7.377,  1    , None, 77],
         ],
         columns = [
             "country",
             "score A",
             "score B",
             "score C",
             "score D"
         ]
    )

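How can I filter this dataset so that the conditions apply to the values in multiple columns? For example, suppose I want to exclude all rows (countries) that have null values in both score B and score C. That should exclude only the Finland row.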

When I try the following, every row with a null value in either of those columns is excluded, so only the Norway and Denmark rows remain:

df[(df["score B"].notnull()) & (df["score C"].notnull())]
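To see why this keeps only two rows, it can help to print the intermediate boolean masks. A minimal sketch that rebuilds the DataFrame from the question; the names b_present, c_present and both_present are just illustrative:

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["Norway", 7.537, 0.039, 11, 31],
        ["Denmark", 7.522, -0.004, 9, 12],
        ["Switzerland", 7.494, None, 15, 50],
        ["Finland", 7.469, None, None, 29],
        ["Netherlands", 7.377, 1, None, 77],
    ],
    columns=["country", "score A", "score B", "score C", "score D"],
)

# Each mask is True where that column holds a value
b_present = df["score B"].notnull()
c_present = df["score C"].notnull()

# & requires BOTH columns to be non-null, so any row with a single NaN is dropped
both_present = b_present & c_present
print(both_present.tolist())                # [True, True, False, False, False]
print(df[both_present]["country"].tolist())  # ['Norway', 'Denmark']
```

Switzerland and Netherlands each have one missing score, so `&` rejects them even though they are not both null.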

How can I do this?

2 answers:

Answer 0 (score: 1)

How about using "or" (|) instead:

df[(df["score B"].notnull()) | (df["score C"].notnull())]

Output:

       country  score A  score B  score C  score D
0       Norway    7.537    0.039     11.0       31
1      Denmark    7.522   -0.004      9.0       12
2  Switzerland    7.494      NaN     15.0       50
4  Netherlands    7.377    1.000      NaN       77

Is that right? You only want to exclude rows where both values are null (unless I have misunderstood the question)?
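If more than two columns are involved, chaining `|` gets verbose. One alternative (a sketch, not part of this answer) is to select the columns of interest and reduce row-wise with `any(axis=1)`. The `cols` list is a hypothetical name; put whichever columns you need in it:

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["Norway", 7.537, 0.039, 11, 31],
        ["Denmark", 7.522, -0.004, 9, 12],
        ["Switzerland", 7.494, None, 15, 50],
        ["Finland", 7.469, None, None, 29],
        ["Netherlands", 7.377, 1, None, 77],
    ],
    columns=["country", "score A", "score B", "score C", "score D"],
)

# Hypothetical list of columns to check; extend it as needed
cols = ["score B", "score C"]

# True for rows where at least one of the listed columns has a value
keep = df[cols].notnull().any(axis=1)
filtered = df[keep]
print(filtered["country"].tolist())
```

Swapping `any` for `all` gives the stricter behaviour from the question (every listed column must be non-null).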

Answer 1 (score: 1)

You need:

df[~(df['score B'].isnull() & df['score C'].isnull())]

       country  score A  score B  score C  score D
0       Norway    7.537    0.039     11.0       31
1      Denmark    7.522   -0.004      9.0       12
2  Switzerland    7.494      NaN     15.0       50
4  Netherlands    7.377    1.000      NaN       77
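The two answers select the same rows: by De Morgan's law, "not (B is null and C is null)" equals "B is not null or C is not null". A short sketch checking that, plus one more option not mentioned in either answer, `DataFrame.dropna` with `subset` and `how="all"`, which drops a row only when every listed column is NaN:

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["Norway", 7.537, 0.039, 11, 31],
        ["Denmark", 7.522, -0.004, 9, 12],
        ["Switzerland", 7.494, None, 15, 50],
        ["Finland", 7.469, None, None, 29],
        ["Netherlands", 7.377, 1, None, 77],
    ],
    columns=["country", "score A", "score B", "score C", "score D"],
)

# Answer 1: negate "both are null"
mask_not_both_null = ~(df["score B"].isnull() & df["score C"].isnull())
# Answer 0: "at least one is present" (equivalent by De Morgan's law)
mask_any_present = df["score B"].notnull() | df["score C"].notnull()
assert mask_not_both_null.equals(mask_any_present)

# dropna(how="all") removes a row only if ALL columns in subset are NaN
result = df.dropna(subset=["score B", "score C"], how="all")
print(result["country"].tolist())
```

`dropna` keeps the original index labels (0, 1, 2, 4), just like the boolean-mask versions above.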