I'm new to Python and have just started learning pandas. I want to create a new variable by checking conditions on multiple columns.
import pandas as pd
import datetime
import numpy as np
Suppose I have the following DataFrame:
d = {'CUSTNO':[123, 124, 125, 126], 'STATUS':['ACTIVE', 'NO', 'CANCEL', 'ACTIVE'], 'CANCEL':[np.nan, '2019-08-09', np.nan, '2019-09-17']}
df = pd.DataFrame(d)
df['CANCEL'] = pd.to_datetime(df['CANCEL'], format='%Y-%m-%d', errors='coerce')
CUSTNO STATUS CANCEL
0 123 ACTIVE NaT
1 124 NO 2019-08-09
2 125 CANCEL NaT
3 126 ACTIVE 2019-09-17
The condition I want to apply is as follows: if df['STATUS'] is 'NO' or 'CANCEL', or df['CANCEL'] contains a date value, then HOLDING should be set to 'N'; otherwise it should be 'Y'.
The expected output is:
CUSTNO STATUS CANCEL HOLDING
0 123 ACTIVE NaT Y
1 124 NO 2019-08-09 N
2 125 CANCEL NaT N
3 126 ACTIVE 2019-09-17 N
Could you please suggest how to do this?
Answer 0 (score: 2)
Use:
c=df.STATUS.isin(['NO','CANCEL'])|df.CANCEL.notna()
df['HOLDING']=np.where(c,'N','Y')
CUSTNO STATUS CANCEL HOLDING
0 123 ACTIVE NaT Y
1 124 NO 2019-08-09 N
2 125 CANCEL NaT N
3 126 ACTIVE 2019-09-17 N
Details:
#df.STATUS.isin(['NO','CANCEL']) #checks if STATUS is NO or CANCEL
#df.CANCEL.notna() #checks if CANCEL is not null and has a date
c=df.STATUS.isin(['NO','CANCEL'])|df.CANCEL.notna()
0 False
1 True
2 True
3 True
dtype: bool
Then we use np.where to assign 'N' where c is True and 'Y' otherwise.
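The steps above can be put together as a self-contained sketch. The helper name `add_holding` and the intermediate variable names are hypothetical, introduced only for illustration; the pandas and NumPy calls are the same ones used in this answer.

```python
import numpy as np
import pandas as pd


def add_holding(df):
    """Return a copy of df with a HOLDING column ('N' or 'Y').

    Hypothetical helper for illustration; assumes STATUS holds strings
    and CANCEL is a datetime column with NaT for missing values.
    """
    # True where STATUS is 'NO' or 'CANCEL'
    status_flag = df["STATUS"].isin(["NO", "CANCEL"])
    # True where CANCEL holds an actual date (not NaT)
    has_cancel_date = df["CANCEL"].notna()
    out = df.copy()
    # 'N' where either condition holds, 'Y' otherwise
    out["HOLDING"] = np.where(status_flag | has_cancel_date, "N", "Y")
    return out


d = {
    "CUSTNO": [123, 124, 125, 126],
    "STATUS": ["ACTIVE", "NO", "CANCEL", "ACTIVE"],
    "CANCEL": [np.nan, "2019-08-09", np.nan, "2019-09-17"],
}
df = pd.DataFrame(d)
df["CANCEL"] = pd.to_datetime(df["CANCEL"], format="%Y-%m-%d", errors="coerce")

result = add_holding(df)
print(result["HOLDING"].tolist())  # ['Y', 'N', 'N', 'N']
```

Naming the two masks separately makes it easy to inspect each condition on its own before combining them with `|`.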
Answer 1 (score: 0)
Try:
>>> df["HOLDING"] = df.apply(lambda x: "N" if x.STATUS in ("NO", "CANCEL") or pd.notna(x.CANCEL) else "Y", axis=1)
>>> df
CUSTNO STATUS CANCEL HOLDING
0 123 ACTIVE NaT Y
1 124 NO 2019-08-09 N
2 125 CANCEL NaT N
3 126 ACTIVE 2019-09-17 N
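A row-wise version can be checked against a vectorized mask with a short sketch like the one below. The function name `row_holding` is hypothetical. The sketch uses `pd.notna` rather than an `isinstance` test against `np.datetime64`, because values read from a datetime column inside `apply(..., axis=1)` arrive as `pd.Timestamp` or `NaT`, not as `np.datetime64` objects.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "CUSTNO": [123, 124, 125, 126],
    "STATUS": ["ACTIVE", "NO", "CANCEL", "ACTIVE"],
    "CANCEL": [np.nan, "2019-08-09", np.nan, "2019-09-17"],
})
df["CANCEL"] = pd.to_datetime(df["CANCEL"], format="%Y-%m-%d", errors="coerce")


def row_holding(row):
    """Hypothetical row-wise rule: 'N' if the status or a cancel date flags the row."""
    if row["STATUS"] in ("NO", "CANCEL") or pd.notna(row["CANCEL"]):
        return "N"
    return "Y"


# Row-by-row evaluation (slower on large frames)
row_wise = df.apply(row_holding, axis=1)

# Vectorized evaluation of the same rule
mask = df["STATUS"].isin(["NO", "CANCEL"]) | df["CANCEL"].notna()
vectorized = pd.Series(np.where(mask, "N", "Y"), index=df.index)

# Both approaches should agree on every row
print((row_wise == vectorized).all())
```

On large frames the vectorized form is generally preferable, since `apply` with `axis=1` calls the Python function once per row.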