Pandas conditional filter

Time: 2018-11-23 06:43:23

Tags: python pandas

I have a DataFrame:

   A     B     C
0  True  True  True
1  True  False False
2  False False False

I want to add a column D with the following condition:

If A, B and C are all True, then D is True. Otherwise, D is False.

I tried:

df['D'] = df.loc[(df['A'] == True) & df['B'] == True & df['C'] == True] 

and got:

TypeError: cannot compare a dtyped [float64] array with a scalar of type [bool]

Then I tried to follow this example and wrote a function similar to the one suggested in the link:

def all_true(row):

   if row['A'] == True:
      if row['B'] == True:
         if row['C'] == True:
             val = True
   else:
      val = 0

return val

df['D'] = df.apply(all_true(df), axis=1)

In this case I get:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I would appreciate any suggestions. Thanks!

3 answers:

Answer 0: (score: 4)

Even better:

df['D'] = df.all(axis=1)

Now:

print(df)

gives:

       A      B      C      D
0   True   True   True   True
1   True  False  False  False
2  False  False  False  False
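
For a self-contained check, the sketch below rebuilds the frame from the question and applies the same call. The column names come from the question; writing the axis as a keyword is only a readability choice.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "A": [True, True, False],
    "B": [True, False, False],
    "C": [True, False, False],
})

# axis=1 reduces across the columns, giving one boolean per row.
df["D"] = df.all(axis=1)
print(df)
```

Keep in mind that `df.all(axis=1)` looks at every column of the frame, so if the frame holds other columns, select the relevant ones first.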

Answer 1: (score: 3)

There is no need to compare with True; just chain the boolean masks with &:

df['D'] = df['A'] & df['B'] & df['C']

If performance matters:

df['D'] = df['A'].values & df['B'].values & df['C'].values

Or use DataFrame.all to check whether all values in each row are True:

df['D'] = df[['A','B','C']].all(axis=1)

#numpy all 
#df['D'] = np.all(df.values,1)

print (df)
       A      B      C      D
0   True   True   True   True
1   True  False  False  False
2  False  False  False  False
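
As a side note on the two attempts in the question, here is a minimal sketch of how they could be repaired, although the one-liners above remain simpler. In Python `&` binds more tightly than `==`, so each comparison needs its own parentheses, and `df.loc[mask]` returns the filtered rows rather than a boolean column. With `apply`, the function object itself is passed (not the result of calling it on `df`), and it should return a value on every path. The column names `D_mask` and `D_apply` are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [True, True, False],
    "B": [True, False, False],
    "C": [True, False, False],
})

# Attempt 1, repaired: parenthesize every comparison and assign the mask itself.
df["D_mask"] = (df["A"] == True) & (df["B"] == True) & (df["C"] == True)

# Attempt 2, repaired: return a boolean on every path.
def all_true(row):
    return bool(row["A"] and row["B"] and row["C"])

# Pass the function object; with axis=1, apply calls it once per row.
df["D_apply"] = df[["A", "B", "C"]].apply(all_true, axis=1)
print(df)
```

The row-wise `apply` version is shown only to explain the error; it calls a Python function per row and is likely to be much slower than the vectorized forms on large frames.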

Performance:


import numpy as np
import pandas as pd
import perfplot

np.random.seed(125)

def all1(df):
    df['D'] = df.all(axis=1)
    return df

def all1_numpy(df):
    df['D'] = np.all(df.values,1)
    return df

def eval1(df):
    df['D'] = df.eval('A & B & C')
    return df

def chained(df):
    df['D'] = df['A'] & df['B'] & df['C']
    return df

def chained_numpy(df):
    df['D'] = df['A'].values & df['B'].values & df['C'].values
    return df

def make_df(n):
    df = pd.DataFrame({'A':np.random.choice([True, False], size=n),
                       'B':np.random.choice([True, False], size=n),
                       'C':np.random.choice([True, False], size=n)})
    return df

perfplot.show(
    setup=make_df,
    kernels=[all1, all1_numpy, eval1,chained,chained_numpy],
    n_range=[2**k for k in range(2, 25)],
    logx=True,
    logy=True,
    equality_check=False,
    xlabel='len(df)')
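
If perfplot is not installed, a rough comparison can be made with the standard-library `timeit` module. This is only a sketch under assumed settings: the size `n` and the repeat count are arbitrary choices, not taken from the answer, and the timings will vary with the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(125)
n = 100_000  # arbitrary size chosen for illustration
df = pd.DataFrame({c: rng.choice([True, False], size=n) for c in "ABC"})

candidates = {
    "all": lambda: df[["A", "B", "C"]].all(axis=1),
    "chained": lambda: df["A"] & df["B"] & df["C"],
    "chained_numpy": lambda: df["A"].values & df["B"].values & df["C"].values,
}

# Every candidate should agree with the reference before it is timed.
expected = np.asarray(candidates["all"]())
repeats = 20
for name, func in candidates.items():
    assert np.array_equal(np.asarray(func()), expected), name
    seconds = timeit.timeit(func, number=repeats)
    print(f"{name}: {seconds / repeats * 1e3:.3f} ms per call")
```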

Answer 2: (score: 1)

Use pandas eval:

df['D'] = df.eval('A & B & C')

Or:

df = df.eval('D = A & B & C')
# alternative, in place: df.eval('D = A & B & C', inplace=True)

Or:

df['D'] = np.all(df.values,1)

print(df)
       A      B      C      D
0   True   True   True   True
1   True  False  False  False
2  False  False  False  False
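
To make the difference between the two assignment forms of `eval` concrete, the sketch below shows that an assignment expression returns a new frame by default, while `inplace=True` modifies the existing one. The names `result` and `returned` are introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [True, True, False],
    "B": [True, False, False],
    "C": [True, False, False],
})

# Without inplace, the assignment form returns a new DataFrame
# and leaves the original untouched.
result = df.eval("D = A & B & C")
print("D" in df.columns, "D" in result.columns)

# With inplace=True the original frame gains the column and None is returned.
returned = df.eval("D = A & B & C", inplace=True)
print(returned is None, df["D"].tolist())
```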