Applying several conditions to a pandas DataFrame

Time: 2020-04-09 12:30:29

Tags: pandas

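I have a pandas DataFrame with 3 columns, ['a', 'b', 'c']. I want to apply a function to the whole DataFrame based on several conditions and label the rows, so that I end up with 4 new columns. I have the code below, but it does not work. The error I get is: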

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

The code is:

if df['a'] is pd.NaT:
    df['is_open'] = df['c']
elif df['b']=='04' or df['b']=='14':
    df['is_wo'] = df['c']
elif (df['b']!='05') and (df['a'] is not pd.NaT):
    df['is_payment'] = df['c']
else:
    df['is_correction'] =  df['c']

Do you know how I can apply these conditions? Note that the order of the conditions matters.

I came up with this solution, but it is slow on large DataFrames:

def get_open_debt_outcome(row):
    if row['a'] is pd.NaT:
        return row['c']
    else:
        return np.nan

def get_wo_outcome(row):
    if pd.isna(row['is_open'])  and (row['b']=='04' or row['b']=='14'):
        return row['c']
    else:
        return np.nan

def get_payment_outcome(row):
    if pd.isna(row['is_open']) and pd.isna(row['is_wo']) and (row['b']!='05') and (row['a'] is not pd.NaT):
        return row['c']
    else:
        return np.nan

def get_correction_outcome(row):
    if pd.isna(row['is_open']) and pd.isna(row['is_wo']) and pd.isna(row['is_payment']):
        return row['c']
    else:
        return np.nan


df['is_open'] = df.apply(get_open_debt_outcome, axis=1)
df['is_wo'] = df.apply(get_wo_outcome, axis=1)
df['is_payment'] = df.apply(get_payment_outcome, axis=1)
df['is_correction'] = df.apply(get_correction_outcome, axis=1)

Solution, based on @blacksite's reply:

mask = df['a'].isnull()
df['is_open'] = np.where(mask, df['c'], np.nan)

mask = (
    df['is_open'].isnull() &
    ((df['b'] == '04') | (df['b'] == '14'))
)
df['is_wo'] = np.where(mask, df['c'], np.nan)

mask = (
    df['is_open'].isnull() &
    df['is_wo'].isnull() &
    (df['b'] != '05') &
    df['a'].notnull()
)

df['is_payment'] = np.where(mask, df['c'], np.nan)

mask = (
    df['is_open'].isnull() &
    df['is_wo'].isnull() &
    df['is_payment'].isnull()
)

df['is_correction'] = np.where(mask, df['c'], np.nan)
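Since the order of the conditions matters, another way to express the same priority chain is `numpy.select`, which takes a list of conditions and returns the choice for the first condition that is True in each row. The following is only a sketch under assumptions: the sample data is made up, and the intermediate `outcome` array is a name introduced here for illustration. It should agree with the masks above as long as 'c' has no missing values, because the version above tests whether the earlier result columns are null rather than testing the conditions themselves.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "a": pd.to_datetime(["2020-01-01", None, "2020-01-03", "2020-01-04", None]),
    "b": ["04", "14", "05", "07", "05"],
    "c": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Conditions in priority order. np.select uses the first one that is True,
# so the third condition does not need to repeat "a is not null".
conditions = [
    df["a"].isna(),
    df["b"].isin(["04", "14"]),
    df["b"] != "05",
]
labels = ["is_open", "is_wo", "is_payment"]
outcome = np.select(conditions, labels, default="is_correction")

# One column per outcome: the value of 'c' where the label matches, NaN elsewhere.
for label in labels + ["is_correction"]:
    df[label] = df["c"].where(outcome == label)
```

Each row ends up with its 'c' value in exactly one of the four new columns, and the conditions are evaluated only once instead of being chained through the previously created columns.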

1 Answer:

Answer 0: (score: 1)

Here is an example of how to get the 'is_wo' column. The rest are very similar:

import numpy as np

# True-False indexing. Vectorized, so much faster than element-wise.
mask = (
    df['is_open'].isnull() &
    ((df['b'] == '04') | (df['b'] == '14'))
)
# numpy.where is basically an if-else statement: it takes a boolean vector
# as the first argument, and the desired values for True and False as the
# second and third arguments.
df['is_wo'] = np.where(mask, df['c'], np.nan)

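pandas.DataFrame.apply with axis=1 is generally slow, because it calls a Python function once per row instead of operating on whole columns.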
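As for the ValueError in the question: Python's `if`, `and` and `or` need a single True/False value, but a comparison such as `df['b'] == '04'` returns one boolean per row. Note also that `df['a'] is pd.NaT` is an identity check on the Series object itself, so it is always False and never looks at the values. A minimal sketch with a made-up column (the variable names are only for illustration):

```python
import pandas as pd

# Hypothetical sample column, named as in the question.
b = pd.Series(["04", "14", "05"])

# `or` asks the first operand for a single truth value, which a Series
# with several rows cannot provide, so a ValueError is raised.
try:
    if (b == "04") or (b == "14"):
        pass
    raised = False
except ValueError:
    raised = True

# Element-wise alternatives: `|` combines the per-row booleans,
# and isin gives the same mask for this case.
mask_or = (b == "04") | (b == "14")
mask_isin = b.isin(["04", "14"])
```

The parentheses around each comparison matter, because `|` and `&` bind more tightly than `==` and `!=`.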