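Using a DataFrame, I am trying to create a new string variable called CLASS based on the RISK_RATING value. If the RISK_RATING value contains "PEP", CLASS should be "PEP". Otherwise, CLASS should be "SF". If there is no RISK_RATING value, CLASS should be "missing".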
Here is a sample of my dataframe:
BUSINESS CUSTOMER_ID RISK_RATING
0 PVB 1000033280 HR
1 PVB 1000166304 PEP (SR)
2 PVB 1004006928 PEP (SR)
3 PVB 1004006936 PEP (SR)
Answer 0 (score: 0)
I have kept this as simple as I could. Try the following:
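import numpy as np
# na=False makes missing ratings evaluate to False instead of NaN in the mask
df['CLASS'] = np.where(df['RISK_RATING'].str.contains('PEP', na=False), 'PEP', 'SF')
# Rows with no RISK_RATING are relabelled 'missing'; all other rows keep their CLASS
df['CLASS'] = np.where(df['RISK_RATING'].isnull(), 'missing', df['CLASS'])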
This gives you:
BUSINESS CUSTOMER_ID RISK_RATING CLASS
0 PVB 1000033280 HR SF
1 PVB 1000166304 PEP (SR) PEP
2 PVB 1004006928 PEP (SR) PEP
3 PVB 1004006936 PEP (SR) PEP
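As a possible variant, the two np.where passes can be collapsed into a single np.select call. This is only a sketch: the three-row frame below is made-up sample data (the row with no rating is hypothetical, added to exercise the "missing" branch), and it assumes RISK_RATING holds strings or NaN/None.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the last row has no RISK_RATING.
df = pd.DataFrame({"RISK_RATING": ["HR", "PEP (SR)", None]})

rating = df["RISK_RATING"]

# Conditions are checked in order; the first one that matches wins.
conditions = [
    rating.isna(),                         # no rating at all
    rating.str.contains("PEP", na=False),  # rating mentions PEP
]
choices = ["missing", "PEP"]

# Rows matching neither condition fall back to "SF".
df["CLASS"] = np.select(conditions, choices, default="SF")
print(df)
```

Because the missing check comes first in the list, a row with no rating can never be labelled "PEP" or "SF" by accident.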
Answer 1 (score: 0)
You need to initialize the CLASS column first:
df['CLASS'] = "SF"
Then use .loc to assign a new value to the rows that match:
# na=False keeps the mask strictly boolean; with a NaN in the mask, .loc raises an error
df.loc[df['RISK_RATING'].str.contains("PEP", na=False), 'CLASS'] = 'PEP'
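The .loc approach as written does not cover the "missing" case from the question. A minimal end-to-end sketch of how it might be extended is below. The sample frame is made up to resemble the question's data, and the third row (with its CUSTOMER_ID) is hypothetical, added only to show a row with no RISK_RATING.

```python
import numpy as np
import pandas as pd

# Sample data modelled on the question; the last row is hypothetical
# and has no RISK_RATING.
df = pd.DataFrame({
    "BUSINESS": ["PVB", "PVB", "PVB"],
    "CUSTOMER_ID": [1000033280, 1000166304, 1000000000],
    "RISK_RATING": ["HR", "PEP (SR)", np.nan],
})

# Start every row at the default class.
df["CLASS"] = "SF"

# Overwrite rows whose rating contains "PEP"; na=False treats NaN as no match.
is_pep = df["RISK_RATING"].str.contains("PEP", na=False)
df.loc[is_pep, "CLASS"] = "PEP"

# Finally, flag rows that have no rating at all.
df.loc[df["RISK_RATING"].isna(), "CLASS"] = "missing"

print(df)
```

The order of the two .loc assignments does not matter here, since a missing rating can never match "PEP" once na=False is set.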