我有一个类似的DataFrame:
+------------+---------------+-------------+---------------------+-------------------+
| SK_ID_CURR | CREDIT_ACTIVE | DAYS_CREDIT | DAYS_CREDIT_ENDDATE | DAYS_ENDDATE_FACT |
+------------+---------------+-------------+---------------------+-------------------+
| 436084 | Sold | -2835 | -2094.0 | -2436.0 |
| 436084 | Active | -987 | -438.0 | NaN |
| 436084 | Sold | -1875 | -1494.0 | -1494.0 |
| 436084 | Active | -1135 | -951.0 | NaN |
| 436084 | Bad debt | -986 | NaN | NaN |
| 436084 | Active | -968 | -845.0 | NaN |
| 436084 | Active | -987 | -803.0 | NaN |
+------------+---------------+-------------+---------------------+-------------------+
我喜欢使用以下规则创建新列CREDIT_LENGTH_IN_DAYS:
def func(x):
if x[x['CREDIT_ACTIVE'] == 'Active']:
return x['DAYS_CREDIT_ENDDATE'] - x['DAYS_CREDIT']
elif x[x['CREDIT_ACTIVE'] == 'Closed'] | x[x['CREDIT_ACTIVE'] == 'Sold'] :
return x['DAYS_ENDDATE_FACT'] - x['DAYS_CREDIT']
elif x[x['CREDIT_ACTIVE'] == 'Bad debt']:
return x['DAYS_CREDIT']
然后我用:
df_bureau['CREDIT_LENGTH_IN_DAYS'] = df_bureau.apply(func, axis=1)
无论如何,x[x['CREDIT_ACTIVE']=='Bad debt'
都是有趣的值,而不是x['DAYS_CREDIT']
中每一行的实际值。
答案 0 :(得分:2)
使用numpy.select
:
m1 = df_bureau['CREDIT_ACTIVE'] == 'Active'
m2 = df_bureau['CREDIT_ACTIVE'].isin(['Closed','Sold'])
m3 = df_bureau['CREDIT_ACTIVE'] == 'Bad debt'
v1 = df_bureau['DAYS_CREDIT_ENDDATE'] - df_bureau['DAYS_CREDIT']
v2 = df_bureau['DAYS_ENDDATE_FACT'] - df_bureau['DAYS_CREDIT']
v3 = df_bureau['DAYS_CREDIT']
df_bureau['CREDIT_LENGTH_IN_DAYS'] = np.select([m1, m2, m3], [v1, v2, v3], np.nan)
print (df_bureau)
SK_ID_CURR CREDIT_ACTIVE DAYS_CREDIT DAYS_CREDIT_ENDDATE \
0 436084 Sold -2835 -2094.0
1 436084 Active -987 -438.0
2 436084 Sold -1875 -1494.0
3 436084 Active -1135 -951.0
4 436084 Bad debt -986 NaN
5 436084 Active -968 -845.0
6 436084 Active -987 -803.0
DAYS_ENDDATE_FACT CREDIT_LENGTH_IN_DAYS
0 -2436.0 399.0
1 NaN 549.0
2 -1494.0 381.0
3 NaN 184.0
4 NaN -986.0
5 NaN 123.0
6 NaN 184.0
您的解决方案分别与每一行一起使用,因此不需要过滤,还需要将|
更改为or
,因为使用标量:
def func(x):
if x['CREDIT_ACTIVE'] == 'Active':
return x['DAYS_CREDIT_ENDDATE'] - x['DAYS_CREDIT']
elif (x['CREDIT_ACTIVE'] == 'Closed') or (x['CREDIT_ACTIVE'] == 'Sold'):
return x['DAYS_ENDDATE_FACT'] - x['DAYS_CREDIT']
elif x['CREDIT_ACTIVE'] == 'Bad debt':
return x['DAYS_CREDIT']
df_bureau['CREDIT_LENGTH_IN_DAYS'] = df_bureau.apply(func, axis=1)