Create a new column based on two other columns and conditions

Time: 2021-01-14 09:22:48

Tags: python arrays pandas

I have a two-column DataFrame of the form:

    Death       HEALTH
0   other       0.0
1   other       1.0
2   vascular    0.0
3   other       0.0
4   other       0.0
5   vascular    0.0
6   NaN         0.0
7   NaN         0.0
8   NaN         0.0
9   vascular    1.0

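I want to create a new column according to the following rules:

  1. Wherever the value "other" appears in the first column, write "No"
  2. Wherever NaN appears in the first column, leave it as NaN
  3. Wherever "vascular" appears in the first column and 1.0 in the second, write "Yes"
  4. Wherever "vascular" appears in the first column and 0.0 in the second, write "No"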

The output should be:

    Death       HEALTH       New
0   other       0.0          No
1   other       1.0          No
2   vascular    0.0          No
3   other       0.0          No
4   other       0.0          No
5   vascular    0.0          No
6   NaN         0.0          NaN
7   NaN         0.0          NaN
8   NaN         0.0          NaN
9   vascular    1.0          Yes

Is there a pythonic way to achieve this? I'm completely lost among loops and conditions.

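2 answers:

Answer 0: (score: 0)

You can build boolean masks for the No and Yes cases and pass them to numpy.select, using the original column as the default for every other value: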

import numpy as np

# 'No': other, or vascular with HEALTH == 0
m1 = df['Death'].eq('other') | (df['Death'].eq('vascular') & df['HEALTH'].eq(0))
# 'Yes': vascular with HEALTH == 1
m2 = df['Death'].eq('vascular') & df['HEALTH'].eq(1)

df['new'] = np.select([m1, m2], ['No', 'Yes'], default=df['Death'])
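
For a quick check, here is a minimal sketch of the snippet above run against the sample data from the question. The DataFrame construction is an assumption reconstructed from the posted table, not code supplied by the asker.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the table in the question (an assumption).
df = pd.DataFrame({
    "Death": ["other", "other", "vascular", "other", "other",
              "vascular", np.nan, np.nan, np.nan, "vascular"],
    "HEALTH": [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
})

# 'No': other, or vascular with HEALTH == 0
m1 = df["Death"].eq("other") | (df["Death"].eq("vascular") & df["HEALTH"].eq(0))
# 'Yes': vascular with HEALTH == 1
m2 = df["Death"].eq("vascular") & df["HEALTH"].eq(1)

# Rows matching neither mask (here the NaN rows) keep their original Death value.
df["new"] = np.select([m1, m2], ["No", "Yes"], default=df["Death"])

print(df)
```

Comparisons against NaN with eq return False, so the missing rows match neither mask and fall through to the default, which is why they stay NaN.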

Another idea is to test for missing values explicitly, and fall back to the original value only when no condition matches:

m1 = df['Death'].eq('other') | (df['Death'].eq('vascular') & df['HEALTH'].eq(0))
m2 = (df['Death'].eq('vascular') & df['HEALTH'].eq(1))
m3 = df['Death'].isna() 

df['new'] = np.select([m1, m2, m3], ['No','Yes', np.nan], default=df['Death'])

print(df)

         Death  HEALTH          new
0  another val     0.0  another val
1        other     1.0           No
2     vascular     0.0           No
3        other     0.0           No
4        other     0.0           No
5     vascular     0.0           No
6          NaN     0.0          NaN
7          NaN     0.0          NaN
8          NaN     0.0          NaN
9     vascular     1.0          Yes
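
The "another val" row in that output appears to be a modified sample value, there to show that anything matching no condition falls through to default. A small sketch of that behaviour, using a made-up three-row frame (the data here is hypothetical, chosen only for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 'another val' is not one of the handled categories.
df = pd.DataFrame({
    "Death": ["another val", "vascular", np.nan],
    "HEALTH": [0.0, 1.0, 0.0],
})

m1 = df["Death"].eq("other") | (df["Death"].eq("vascular") & df["HEALTH"].eq(0))
m2 = df["Death"].eq("vascular") & df["HEALTH"].eq(1)

# Unmatched rows are filled position by position from the default Series.
df["new"] = np.select([m1, m2], ["No", "Yes"], default=df["Death"])

print(df)
```

If unexpected categories should not be copied into the new column, the masks would need to cover them explicitly.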

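Answer 1: (score: 0)

A simple approach is to implement your conditional logic with if/else inside a function, then use apply to run that function on the DataFrame row by row.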

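import numpy as np

# Note: the frame used in this answer names the second column 'Health';
# the question's frame uses 'HEALTH', so adjust the key to match your data.
def function(row):
    if row['Death'] == 'other':
        return 'No'
    if row['Death'] == 'vascular':
        if row['Health'] == 1:
            return 'Yes'
        elif row['Health'] == 0:
            return 'No'
    # NaN (or any unhandled combination) stays missing
    return np.nan

# axis=1 applies the function row-wise
df['New'] = df.apply(function, axis=1)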

It produces the following output, as required:

      Death  Health  New
0     other       0   No
1     other       1   No
2  vascular       0   No
3     other       0   No
4     other       0   No
5  vascular       0   No
6       NaN       0  NaN
7       NaN       0  NaN
8       NaN       0  NaN
9  vascular       1  Yes
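
For completeness, a self-contained sketch of the same row-wise idea on the question's data, where the column is spelled 'HEALTH'. The frame construction and the name classify are assumptions introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the table in the question (an assumption).
df = pd.DataFrame({
    "Death": ["other", "other", "vascular", "other", "other",
              "vascular", np.nan, np.nan, np.nan, "vascular"],
    "HEALTH": [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
})


def classify(row):
    """Return 'Yes', 'No', or NaN for a single row (hypothetical helper)."""
    if row["Death"] == "other":
        return "No"
    if row["Death"] == "vascular":
        if row["HEALTH"] == 1:
            return "Yes"
        if row["HEALTH"] == 0:
            return "No"
    # Missing Death, or a combination not covered by the rules, stays NaN.
    return np.nan


# axis=1 passes each row to classify as a Series
df["New"] = df.apply(classify, axis=1)

print(df)
```

Because apply with axis=1 calls a Python function once per row, it is usually slower than a mask-based approach on large frames, but the logic reads much like the rules as stated in the question.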