Creating a new column with three conditions using np.where

Time: 2019-10-09 16:59:35

Tags: python pandas numpy dataframe

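How can I handle three conditions with np.where()? Normally it yields only two outcomes, so how can I get three? I need to create a new column, Better_Events, that stores "Summer", "Winter", or "Both", based on a comparison of the total medals won in the Summer and Winter games (that is, a comparison between the Total_Summer and Total_Winter columns), using the np.where() function.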

data['Better_Events'] = np.where(data['Total_Summer']>data['Total_Winter'],'Summer','Winter')

The code above produces only two outputs. How do I change it to three, so that it gives "Both" when data['Total_Summer'] == data['Total_Winter']?

3 answers:

Answer 0 (score: 4)

You need np.select.

Here is an example:

import numpy as np
import pandas as pd
df=pd.DataFrame({'Total_Summer':[1,2,3,3,6,7],'Total_Winter':[2,2,3,4,5,4]})
print(df)

   Total_Summer  Total_Winter
0             1             2
1             2             2
2             3             3
3             3             4
4             6             5
5             7             4

Now set up the conditions and the value for each condition:

cond=[df['Total_Summer']>df['Total_Winter'],df['Total_Summer']<df['Total_Winter'],df['Total_Summer'].eq(df['Total_Winter'])]
values=['Summer','Winter','Both']
df['Better_Events']=np.select(cond,values)
print(df)

   Total_Summer  Total_Winter Better_Events
0             1             2        Winter
1             2             2          Both
2             3             3          Both
3             3             4        Winter
4             6             5        Summer
5             7             4        Summer
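As a variation on the example above, here is a minimal sketch that lists only the two strict comparisons and lets ties fall through to np.select's `default` argument. It assumes the same sample data. Passing an explicit string `default` also avoids mixing the implicit numeric default `0` with string choices.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Total_Summer': [1, 2, 3, 3, 6, 7],
                   'Total_Winter': [2, 2, 3, 4, 5, 4]})

# Only the two strict comparisons are listed; rows matching neither
# condition (here: ties) receive the value passed as `default`.
cond = [df['Total_Summer'] > df['Total_Winter'],
        df['Total_Summer'] < df['Total_Winter']]
values = ['Summer', 'Winter']

df['Better_Events'] = np.select(cond, values, default='Both')
print(df)
```

One caveat with this variant: a row with a missing value in either column matches neither comparison, so it would also be labeled "Both". If the data may contain NaN, keeping the explicit equality condition is the safer choice.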

Answer 1 (score: 0)

You can use apply with axis=1:

   Total_Summer  Total_Winter
0            74            17
1            75            29
2            48            64
3            77            77
4            16            38

df.apply(lambda r: "Both" if r.Total_Summer==r.Total_Winter else "Summer" if r.Total_Summer>r.Total_Winter else "Winter" ,axis=1) 

Out: 
0    Summer
1    Summer
2    Winter
3      Both
4    Winter
dtype: object

Or you can use np.where twice:

np.where( df.Total_Summer.eq(df.Total_Winter),"Both", np.where(df.Total_Summer.gt(df.Total_Winter),"Summer","Winter")) 

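The second approach is faster, because np.where operates on whole columns at once, whereas apply with axis=1 calls the lambda once per row.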
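To check this on your own data, here is a rough sketch that runs both approaches on the same frame, confirms that they produce identical labels, and times them with timeit. The random test data, the row count, and the helper names `with_apply` and `with_where` are assumptions made for illustration. Timings depend on the machine, so none are quoted.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 5,000 rows of random medal counts.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Total_Summer": rng.integers(0, 100, size=5_000),
    "Total_Winter": rng.integers(0, 100, size=5_000),
})

def with_apply(frame):
    # Row by row: the lambda is called once per row in Python.
    return frame.apply(
        lambda r: "Both" if r.Total_Summer == r.Total_Winter
        else "Summer" if r.Total_Summer > r.Total_Winter
        else "Winter",
        axis=1,
    )

def with_where(frame):
    # Vectorized: each comparison runs over whole columns at once.
    return np.where(
        frame.Total_Summer.eq(frame.Total_Winter), "Both",
        np.where(frame.Total_Summer.gt(frame.Total_Winter), "Summer", "Winter"),
    )

# Both approaches should produce the same labels.
same = list(with_apply(df)) == list(with_where(df))
print("same result:", same)

# Print the measured times (seconds for 3 runs each).
print("apply:   ", timeit.timeit(lambda: with_apply(df), number=3))
print("np.where:", timeit.timeit(lambda: with_where(df), number=3))
```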

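Answer 2 (score: 0)

numpy.select works well, but I do want to propose an alternative solution that should work better when the conditions are more numerous or more complex: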

# numpy is only used to create the test data
import numpy as np
import pandas as pd

total_summer, total_winter = np.split(np.random.randint(low=0, high=15, size=20), 2)

df = pd.DataFrame(data=zip(total_summer, total_winter), columns=["total_summer", "total_winter"])

def find_better_event(row):
    res : str
    if row["total_summer"] > row["total_winter"]:
        res = "Summer"
    elif row["total_summer"] < row["total_winter"]:
        res = "Winter"
    else:
        res = "Both"
    return res

df["better_events"] = df.apply(find_better_event, axis=1)