I have a DataFrame like this:
Name Salary Age Cars Avg Salary Avg Age Avg Cars
John 50000 35 1 60000 38 1
Tom 65000 45 3 60000 38 1
For some columns a higher value is better, while for others a lower value is better. So I created two lists:
higher_better = ['Salary', 'Cars']
lower_better = ['Age']
I want to compare each value with its average and return a score, so I defined new functions like this:
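def compare_higher(a, b):
    # 1 if a is above b, 0 if equal, -1 if below
    return 1 if a > b else 0 if a == b else -1
compare_higher(higher_better, lower_better)
def compare_lower(b, a):
    # same test with the parameter names swapped, so a lower first argument scores 1
    return 1 if a > b else 0 if a == b else -1
compare_lower(higher_better, lower_better)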
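I want to create new columns for the comparison results so that I can then add their scores together. The ideal output looks like this: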
Name Salary Age Cars Avg Salary Avg Age Avg Cars Comp Salary Comp Age Comp Cars Score
John 50000 35 1 60000 38 1 -1 1 0 0
Tom 65000 45 3 60000 38 1 1 -1 1 1
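I have the pieces but don't know how to combine them. How can I compare the values in the columns and return the results in new columns? Thanks for your help.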
Answer 0 (score: 1)
I think you can use several apply calls.
First, apply compare_higher to the columns in higher_better:
for col in higher_better:
    df['Comp ' + col] = df.apply(lambda row: compare_higher(row[col], row['Avg ' + col]), axis=1)
Then do the same with compare_lower for the columns in lower_better:
for col in lower_better:
    df['Comp ' + col] = df.apply(lambda row: compare_lower(row[col], row['Avg ' + col]), axis=1)
Finally, sum the comparison columns from both lists into score:
comp_col = ['Comp '+ col for col in higher_better+lower_better]
df['score'] = df[comp_col].sum(axis=1)
Result:
Name Salary Age Cars Avg Salary Avg Age Avg Cars Comp Salary \
0 John 50000 35 1 60000 38 1 -1
1 Tom 65000 45 3 60000 38 1 1
Comp Cars Comp Age score
0 0 1 0
1 1 -1 1
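To try the steps above end to end, here is a minimal self-contained sketch. It rebuilds the sample DataFrame from the question, and it assumes compare_lower can simply delegate to compare_higher with the arguments swapped (a small simplification, not the asker's exact definition). The column is named Score here to match the desired output in the question.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Name": ["John", "Tom"],
    "Salary": [50000, 65000],
    "Age": [35, 45],
    "Cars": [1, 3],
    "Avg Salary": [60000, 60000],
    "Avg Age": [38, 38],
    "Avg Cars": [1, 1],
})

higher_better = ["Salary", "Cars"]
lower_better = ["Age"]

def compare_higher(a, b):
    # 1 if a is above b, 0 if equal, -1 if below
    return 1 if a > b else 0 if a == b else -1

def compare_lower(a, b):
    # Assumption: "lower is better" is the same test with the arguments swapped
    return compare_higher(b, a)

# One comparison column per scored column, evaluated row by row
for col in higher_better:
    df["Comp " + col] = df.apply(
        lambda row: compare_higher(row[col], row["Avg " + col]), axis=1)
for col in lower_better:
    df["Comp " + col] = df.apply(
        lambda row: compare_lower(row[col], row["Avg " + col]), axis=1)

# Total score is the row-wise sum of the comparison columns
comp_cols = ["Comp " + col for col in higher_better + lower_better]
df["Score"] = df[comp_cols].sum(axis=1)
print(df[["Name"] + comp_cols + ["Score"]])
```

Row-wise apply is easy to read but calls a Python function once per row, so it may be slow on large frames.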
Answer 1 (score: 1)
You can also avoid apply by using np.select, which is handy for assigning values based on multiple conditions.
import numpy as np
import pandas as pd
df['Comp Salary'] = np.select([df.Salary < df['Avg Salary'], df.Salary == df['Avg Salary'],
                               df.Salary > df['Avg Salary']], [-1, 0, 1])
df['Comp Cars'] = np.select([df.Cars < df['Avg Cars'], df.Cars == df['Avg Cars'],
                             df.Cars > df['Avg Cars']], [-1, 0, 1])
df['Comp Age'] = np.select([df.Age < df['Avg Age'], df.Age == df['Avg Age'],
                            df.Age > df['Avg Age']], [1, 0, -1])
df['Score'] = df[['Comp Salary', 'Comp Cars', 'Comp Age']].sum(axis=1)
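The three np.select calls differ only in the column name and in the order of the choices, so they could be folded into a loop over the two lists from the question. The sketch below is one way to do that; add_comp_column is a hypothetical helper introduced for illustration, and the DataFrame is rebuilt from the question's sample data.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Name": ["John", "Tom"],
    "Salary": [50000, 65000],
    "Age": [35, 45],
    "Cars": [1, 3],
    "Avg Salary": [60000, 60000],
    "Avg Age": [38, 38],
    "Avg Cars": [1, 1],
})

higher_better = ["Salary", "Cars"]
lower_better = ["Age"]

def add_comp_column(frame, col, higher_is_better):
    # Hypothetical helper: adds 'Comp <col>' by comparing <col> with 'Avg <col>'
    value, avg = frame[col], frame["Avg " + col]
    # Conditions are always below / equal / above; only the scores flip
    choices = [-1, 0, 1] if higher_is_better else [1, 0, -1]
    frame["Comp " + col] = np.select(
        [value < avg, value == avg, value > avg], choices)

for col in higher_better:
    add_comp_column(df, col, higher_is_better=True)
for col in lower_better:
    add_comp_column(df, col, higher_is_better=False)

comp_cols = ["Comp " + col for col in higher_better + lower_better]
df["Score"] = df[comp_cols].sum(axis=1)
print(df[["Name"] + comp_cols + ["Score"]])
```

Because the comparisons run on whole columns, this avoids the per-row Python calls of the apply approach, and adding a new scored column only means appending its name to one of the two lists.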