Create a new variable by checking conditions on multiple columns

Time: 2019-09-06 10:14:40

Tags: python pandas

I am new to Python and have just started learning pandas. I want to create a new variable by checking conditions on multiple columns.

import pandas as pd
import datetime
import numpy as np

Suppose I have the following DataFrame:

d = {'CUSTNO':[123, 124, 125, 126], 'STATUS':['ACTIVE', 'NO', 'CANCEL', 'ACTIVE'], 'CANCEL':[np.nan, '2019-08-09', np.nan, '2019-09-17']}
df = pd.DataFrame(d)
df['CANCEL'] = df['CANCEL'].apply(lambda x: pd.to_datetime(x, format = '%Y-%m-%d', errors = 'coerce'))


 CUSTNO  STATUS      CANCEL
0   123  ACTIVE         NaT
1   124      NO  2019-08-09
2   125  CANCEL         NaT
3   126  ACTIVE  2019-09-17

The condition I want to apply is:

If df['STATUS'] is 'NO' or 'CANCEL', or df['CANCEL'] contains a date value, HOLDING should be set to 'N'; otherwise it should be 'Y'.

The expected output is:

 CUSTNO  STATUS     CANCEL  HOLDING
0   123  ACTIVE        NaT        Y
1   124      NO 2019-08-09        N
2   125  CANCEL        NaT        N
3   126  ACTIVE 2019-09-17        N

Could you please suggest how to do this?

2 answers:

Answer 0 (score: 2)

Use:

c=df.STATUS.isin(['NO','CANCEL'])|df.CANCEL.notna()
df['HOLDING']=np.where(c,'N','Y')

   CUSTNO  STATUS     CANCEL HOLDING
0     123  ACTIVE        NaT       Y
1     124      NO 2019-08-09       N
2     125  CANCEL        NaT       N
3     126  ACTIVE 2019-09-17       N

Details:

#df.STATUS.isin(['NO','CANCEL']) #checks if STATUS is NO or CANCEL
#df.CANCEL.notna() #checks if CANCEL is not null and has a date
c=df.STATUS.isin(['NO','CANCEL'])|df.CANCEL.notna()

0    False
1     True
2     True
3     True
dtype: bool

We then use np.where to assign 'N' where c is True and 'Y' otherwise.
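For reference, the steps above can be put together as one self-contained sketch. It reuses the sample data from the question; the name `mask` and the direct call to `pd.to_datetime` on the whole column are choices made here for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "CUSTNO": [123, 124, 125, 126],
    "STATUS": ["ACTIVE", "NO", "CANCEL", "ACTIVE"],
    "CANCEL": [np.nan, "2019-08-09", np.nan, "2019-09-17"],
})

# Convert the whole column at once; missing values become NaT.
df["CANCEL"] = pd.to_datetime(df["CANCEL"], format="%Y-%m-%d", errors="coerce")

# True where STATUS is NO or CANCEL, or where a cancellation date is present.
mask = df["STATUS"].isin(["NO", "CANCEL"]) | df["CANCEL"].notna()

# np.where picks 'N' where mask is True and 'Y' everywhere else.
df["HOLDING"] = np.where(mask, "N", "Y")
print(df)
```

Note that `|` is the element-wise OR for boolean Series; Python's `or` keyword would raise an error here because the truth value of a Series is ambiguous.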

Answer 1 (score: 0)

Try:

>>> df["HOLDING"] = df.apply(lambda x: "N" if x.STATUS in ("NO", "CANCEL") or pd.notna(x.CANCEL) else "Y", axis=1)
>>> df
   CUSTNO  STATUS     CANCEL HOLDING
0     123  ACTIVE        NaT       Y
1     124      NO 2019-08-09       N
2     125  CANCEL        NaT       N
3     126  ACTIVE 2019-09-17       N
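When the check is done row by row, the values pandas hands to the function from a datetime column are `pd.Timestamp` or `pd.NaT` scalars rather than `np.datetime64`, so a null test such as `pd.notna` is likely a more dependable way to detect a date than a type check. A minimal sketch under that assumption, using the question's sample data; `holding_flag` is a hypothetical helper name introduced only for illustration:

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "CUSTNO": [123, 124, 125, 126],
    "STATUS": ["ACTIVE", "NO", "CANCEL", "ACTIVE"],
    "CANCEL": [np.nan, "2019-08-09", np.nan, "2019-09-17"],
})
df["CANCEL"] = pd.to_datetime(df["CANCEL"], format="%Y-%m-%d", errors="coerce")


def holding_flag(row):
    """Hypothetical helper: 'N' if the status or a cancel date rules out holding."""
    if row["STATUS"] in ("NO", "CANCEL") or pd.notna(row["CANCEL"]):
        return "N"
    return "Y"


# axis=1 passes each row to the function as a Series.
df["HOLDING"] = df.apply(holding_flag, axis=1)

# A single element of the datetime column is a pd.Timestamp, not np.datetime64.
value = df.loc[1, "CANCEL"]
print(type(value))
print(isinstance(value, np.datetime64))
print(df)
```

On large frames the row-wise `apply` is generally slower than a vectorized boolean mask, so it is mainly useful when the rule is too irregular to express column-wise.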