I have the following df:
days  days_1  days_2  period  percent_1  percent_2  amount
   3       5       4       1        0.2        0.1     100
   2       1       3       4        0.3        0.1     500
   9       8      10       6        0.4        0.2     600
  10       7       8      11        0.5        0.3     700
  10       5       6       7        0.7        0.4     800
I want to apply the following logic to df:
for each row in df:
    if days < days_1:
        amount_missed = 0
        days_missed = 0
    elif days_1 < days < days_2:
        missed_percent = percent_1 - percent_2
        amount_missed = amount * (missed_percent / 100)
        days_missed = days - days_1
    elif days_2 < days < period or days > period:
        missed_percent = percent_2
        amount_missed = amount * (missed_percent / 100)
        days_missed = days - days_2
    else:
        amount_missed = 0
        days_missed = 0
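For reference, the pseudocode above can be written as a runnable row-wise baseline to compare vectorized attempts against. This is only a sketch: the helper name `missed` is hypothetical, the frame is rebuilt from the table in this question, and the chained comparisons are taken literally, so a tie such as days == days_1 falls through to the final else branch.

```python
import pandas as pd

df = pd.DataFrame({
    'days': [3, 2, 9, 10, 10],
    'days_1': [5, 1, 8, 7, 5],
    'days_2': [4, 3, 10, 8, 6],
    'period': [1, 4, 6, 11, 7],
    'percent_1': [0.2, 0.3, 0.4, 0.5, 0.7],
    'percent_2': [0.1, 0.1, 0.2, 0.3, 0.4],
    'amount': [100, 500, 600, 700, 800],
})


def missed(row):
    """Return (amount_missed, days_missed) for one row (hypothetical helper)."""
    days = row['days']
    if days < row['days_1']:
        return 0.0, 0
    if row['days_1'] < days < row['days_2']:
        percent = row['percent_1'] - row['percent_2']
        return row['amount'] * (percent / 100), int(days - row['days_1'])
    if row['days_2'] < days < row['period'] or days > row['period']:
        return row['amount'] * (row['percent_2'] / 100), int(days - row['days_2'])
    return 0.0, 0


# Evaluate the helper row by row and collect both results as plain lists.
results = [missed(row) for _, row in df.iterrows()]
amount_missed = [r[0] for r in results]
days_missed = [r[1] for r in results]
```

Row iteration like this is slow on large frames, which is why a vectorized replacement is worth having; the lists here only serve as a reference result.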
I am trying to translate the logic above using boolean masks and np.where, as follows:
cond1 = df['days_2'] < df['days']
cond2 = df['days'] < df['period']
cond3 = df['days'] > df['period']
cond4 = df['days'] >= df['days_1']
cond5 = df['days'] < df['days_2']
cond6 = df['days'] > df['days_1']
mask = ((cond1 & cond2) | cond3) & cond4
mask2 = cond5 & cond6
df['amount_missed'] = np.where(mask, df['amount'] * df['percent_2'] / 100, 0.0)
df['amount_missed'] = np.where(mask2, df['amount'] * (df['percent_1'] - df['percent_2']) / 100, 0.0)
df['days_missed'] = np.where(mask, df['days'] - df['days_2'], 0)
df['days_missed'] = np.where(mask2, df['days'] - df['days_1'], 0)
However, the result of the code above differs from the row-iteration result, which should be:
{
'amount_missed': {0: 0.0, 1: 1.0, 2: 1.2, 3: 2.1, 4: 3.2},
'days_missed': {0: 0, 1: 1, 2: 1, 3: 2, 4: 4}
}
The boolean masks produce the following result instead:
{
'amount_missed': {0: 0.0, 1: 0.9999999999999999, 2: 1.2, 3: 0.0, 4: 0.0},
'days_missed': {0: 0, 1: 1, 2: 1, 3: 0, 4: 0}
}
I would like to know how to fix this, and whether there are other ways to replace the row iteration over df here.
Answer 0 (score: 2)
Code used to generate the original dataframe (from the original, unedited question):
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'days': [3, 2, 9, 10, 10],
    'days_1': [5, 1, 8, 7, 5],
    'days_2': [4, 3, 10, 8, 6],
    'period': [1, 4, 6, 11, 7],
    'percent_1': [0.2, 0.3, 0.4, 0.5, 0.7],
    'percent_2': [0.1, 0.1, 0.2, 0.3, 0.4],
    'amount': [100, 500, 600, 700, 800]
}, columns=['days', 'days_1', 'days_2', 'period', 'percent_1', 'percent_2', 'amount'])
The following code gives the result requested in your original question (it has not been updated for the simplified case you created after being asked to do so in the comments):
# Nest the second np.where inside the first so the two branches do not overwrite each other.
df['amount_missed'] = np.where((df['days_1'] < df['days']) & (df['days'] < df['days_2']),
                               df['amount'] * (df['percent_1'] - df['percent_2']) / 100,
                               np.where((df['days_2'] < df['days']) & (df['days'] < df['period']),
                                        df['amount'] * (df['percent_2']) / 100,
                                        0.0))
df['days_missed'] = np.where((df['days_1'] < df['days']) & (df['days'] < df['days_2']),
                             df['days'] - df['days_1'],
                             np.where((df['days_2'] < df['days']) & (df['days'] < df['period']),
                                      df['days'] - df['days_2'],
                                      0))
Output:
   days  days_1  days_2  period  percent_1  percent_2  amount  amount_missed  \
0     3       5       4       1        0.2        0.1     100            0.0
1     2       1       3       4        0.3        0.1     500            1.0
2     9       8      10       6        0.4        0.2     600            1.2
3    10       7       8      11        0.5        0.3     700            2.1
4    10       5       6       7        0.7        0.4     800            0.0

   days_missed
0            0
1            1
2            1
3            2
4            0
Edit:
The same answer can also be written with numpy.select.
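The numpy.select variant mentioned in the edit might look roughly like the sketch below. It is a reconstruction, not the answer's own code: it assumes the same two conditions as the nested np.where version, and the names `between_1_and_2`, `between_2_and_period` and `conditions` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'days': [3, 2, 9, 10, 10],
    'days_1': [5, 1, 8, 7, 5],
    'days_2': [4, 3, 10, 8, 6],
    'period': [1, 4, 6, 11, 7],
    'percent_1': [0.2, 0.3, 0.4, 0.5, 0.7],
    'percent_2': [0.1, 0.1, 0.2, 0.3, 0.4],
    'amount': [100, 500, 600, 700, 800],
})

# Same two conditions as the nested np.where version (hypothetical names).
between_1_and_2 = (df['days_1'] < df['days']) & (df['days'] < df['days_2'])
between_2_and_period = (df['days_2'] < df['days']) & (df['days'] < df['period'])
conditions = [between_1_and_2, between_2_and_period]

# np.select takes the choice of the first condition that is True, else the default.
df['amount_missed'] = np.select(
    conditions,
    [df['amount'] * (df['percent_1'] - df['percent_2']) / 100,
     df['amount'] * df['percent_2'] / 100],
    default=0.0,
)
df['days_missed'] = np.select(
    conditions,
    [df['days'] - df['days_1'], df['days'] - df['days_2']],
    default=0,
)
```

Because these conditions omit the `days > period` case, row 4 stays at zero, as in the output table for the original question.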
Answer 1 (score: 2)
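The root cause of the error is that every new np.where() call overwrites the target column, instead of the where() expressions being cascaded. Better than cascading where() expressions, however, is np.select():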
c0 = df.days < df.days_1
c1 = (df.days_1 < df.days) & (df.days < df.days_2)
c2 = ((df.days_2 < df.days) & (df.days < df.period)) | (df.days > df.period)
df['days_missed'] = np.select([c0, c1, c2], [0, df.days - df.days_1, df.days - df.days_2])
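The snippet above only fills days_missed. Assuming amount_missed follows the same three conditions, a self-contained sketch covering both columns could look like this; the choice expressions for amount_missed are taken from the question's pseudocode, not from this answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'days': [3, 2, 9, 10, 10],
    'days_1': [5, 1, 8, 7, 5],
    'days_2': [4, 3, 10, 8, 6],
    'period': [1, 4, 6, 11, 7],
    'percent_1': [0.2, 0.3, 0.4, 0.5, 0.7],
    'percent_2': [0.1, 0.1, 0.2, 0.3, 0.4],
    'amount': [100, 500, 600, 700, 800],
})

# Conditions are checked in order; the first True one decides the row.
c0 = df.days < df.days_1
c1 = (df.days_1 < df.days) & (df.days < df.days_2)
c2 = ((df.days_2 < df.days) & (df.days < df.period)) | (df.days > df.period)

df['days_missed'] = np.select(
    [c0, c1, c2],
    [0, df.days - df.days_1, df.days - df.days_2],
    default=0,
)
# Assumed extension: same conditions, amounts from the question's pseudocode.
df['amount_missed'] = np.select(
    [c0, c1, c2],
    [0.0,
     df.amount * (df.percent_1 - df.percent_2) / 100,
     df.amount * df.percent_2 / 100],
    default=0.0,
)
```

Listing c0 first matters: it mirrors the leading `if` in the pseudocode, so a row with days < days_1 yields zero even when it also satisfies days > period. Floating-point results are best compared with np.allclose rather than exact equality, since 500 * (0.3 - 0.1) / 100 evaluates to 0.9999999999999999.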