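I have a pandas DataFrame with 3 columns, ['a','b','c']. I want to apply a function to the whole DataFrame based on several conditions and flag the rows, so that I end up with 4 new columns. I have the code below, but it does not work. The error I get is:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
and the code is: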
if df['a'] is pd.NaT:
    df['is_open'] = df['c']
elif df['b']=='04' or df['b']=='14':
    df['is_wo'] = df['c']
elif (df['b']!='05') and (df['a'] is not pd.NaT):
    df['is_payment'] = df['c']
else:
    df['is_correction'] = df['c']
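Do you know how I can apply these conditions? Note that the order of the conditions matters.
I came up with this solution, but it is slow on large DataFrames: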
import numpy as np
import pandas as pd

def get_open_debt_outcome(row):
    # Open debt: column 'a' is missing (NaT).
    if row['a'] is pd.NaT:
        return row['c']
    else:
        return np.nan

def get_wo_outcome(row):
    # Only rows not already flagged as open, with code '04' or '14'.
    if pd.isna(row['is_open']) and (row['b']=='04' or row['b']=='14'):
        return row['c']
    else:
        return np.nan

def get_payment_outcome(row):
    # Only rows not flagged by either of the two earlier checks.
    if pd.isna(row['is_open']) and pd.isna(row['is_wo']) and (row['b']!='05') and (row['a'] is not pd.NaT):
        return row['c']
    else:
        return np.nan

def get_correction_outcome(row):
    # Everything that no earlier check has claimed.
    if pd.isna(row['is_open']) and pd.isna(row['is_wo']) and pd.isna(row['is_payment']):
        return row['c']
    else:
        return np.nan

# Each call depends on the columns created by the calls before it.
df['is_open'] = df.apply(get_open_debt_outcome, axis=1)
df['is_wo'] = df.apply(get_wo_outcome, axis=1)
df['is_payment'] = df.apply(get_payment_outcome, axis=1)
df['is_correction'] = df.apply(get_correction_outcome, axis=1)
Solution: based on @blacksite's answer
mask = df['a'].isnull()
df['is_open'] = np.where(mask, df['c'], np.nan)
mask = (
    df['is_open'].isnull() &
    ((df['b'] == '04') | (df['b'] == '14'))
)
df['is_wo'] = np.where(mask, df['c'], np.nan)
mask = (
    df['is_open'].isnull() &
    df['is_wo'].isnull() &
    (df['b'] != '05') &
    df['a'].notnull()
)
df['is_payment'] = np.where(mask, df['c'], np.nan)
mask = (
    df['is_open'].isnull() &
    df['is_wo'].isnull() &
    df['is_payment'].isnull()
)
df['is_correction'] = np.where(mask, df['c'], np.nan)
Answer 0 (score: 1)
Here is an example of how to get the 'is_wo' column. The rest are very similar:
import numpy as np
# True-False indexing. Vectorized, so much faster than element-wise.
mask = (
    df['is_open'].isnull() &
    ((df['b'] == '04') | (df['b'] == '14'))
)
# numpy.where is basically an ifelse statement, taking a boolean vector as
# the first argument, and the desired values for true and false as the
# second and third arguments
df['is_wo'] = np.where(mask, df['c'], np.nan)
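For context on the original ValueError: df['b'] == '04' produces a whole boolean Series, while Python's if, or, and and each need a single True/False, so pandas refuses to guess. The element-wise operators |, & and ~ avoid this. Below is a minimal sketch on a made-up three-row frame; the data and the variable names (raised, is_wo_code, picked) are purely illustrative and not from the question.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only.
df = pd.DataFrame({
    'a': pd.to_datetime(['2020-01-01', None, '2020-01-03']),
    'b': ['04', '14', '05'],
    'c': [10.0, 20.0, 30.0],
})

# `or` asks for the truth value of a whole Series, which pandas rejects.
try:
    if (df['b'] == '04') or (df['b'] == '14'):
        pass
    raised = False
except ValueError:
    raised = True

# `|` combines the two comparisons row by row instead.
is_wo_code = (df['b'] == '04') | (df['b'] == '14')

# Series.isin expresses the same membership test more compactly.
same_as_isin = is_wo_code.equals(df['b'].isin(['04', '14']))

# np.where takes df['c'] where the mask is True and NaN everywhere else.
picked = np.where(is_wo_code & df['a'].notnull(), df['c'], np.nan)
```

Note the parentheses around each comparison: | and & bind more tightly than ==, so leaving them out changes the meaning of the expression.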
pandas.DataFrame.apply with axis=1 is usually slow, because it calls a Python function once per row.
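Since the order of the conditions matters, numpy.select may be a convenient alternative: it takes a list of conditions and, for each row, returns the choice belonging to the first condition that is True, much like an if/elif chain. The sketch below is one possible arrangement rather than the method used above; the helper column 'outcome' and the toy data are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only.
df = pd.DataFrame({
    'a': pd.to_datetime([None, '2020-01-02', '2020-01-03', '2020-01-04']),
    'b': ['04', '14', '07', '05'],
    'c': [1.0, 2.0, 3.0, 4.0],
})

# Conditions in priority order; np.select uses the first one that is True.
# The third needs no explicit "a is not null" check, because rows with a
# missing 'a' are already claimed by the first condition.
conditions = [
    df['a'].isnull(),
    df['b'].isin(['04', '14']),
    df['b'] != '05',
]
labels = ['is_open', 'is_wo', 'is_payment']

# Rows matching none of the conditions fall through to the default.
df['outcome'] = np.select(conditions, labels, default='is_correction')

# Spread df['c'] into one column per label, NaN everywhere else.
for label in labels + ['is_correction']:
    df[label] = df['c'].where(df['outcome'] == label)
```

With this layout each row carries a value in exactly one of the four new columns, and the intermediate 'outcome' column can be dropped afterwards if it is not needed.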