Question

I have a df like this:

Name   Salary   Age   Cars   Avg Salary   Avg Age   Avg Cars
John    50000    35      1        60000        38          1
 Tom    65000    45      3        60000        38          1

For some columns a higher value is better, while for others a lower value is better. So I created two lists:

higher_better = ['Salary', 'Cars']
lower_better = ['Age']

I want to compare each value with its average and return a score, so I defined new functions like this:

def compare_higher(a, b):
    return 1 if a > b else 0 if a == b else -1
compare_higher(higher_better, lower_better)

def compare_lower(b, a):
    return 1 if a > b else 0 if a == b else -1
compare_lower(higher_better, lower_better)

I want to create new columns for the comparison results so that I can then add the scores together. The ideal output looks like this:

Name   Salary   Age   Cars   Avg Salary   Avg Age   Avg Cars   Comp Salary   Comp Age   Comp Cars   Score
John    50000    35      1        60000        38          1        -1             1          0       0
 Tom    65000    45      3        60000        38          1         1            -1          1       1

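I have the pieces but don't know how to combine them. How can I compare the values in the columns and return the results in new columns? Thanks for your help.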

Answer 1

I think you can use apply several times. First, apply compare_higher to the columns in higher_better:

for col in higher_better:
    df['Comp ' + col] = df.apply(lambda row: compare_higher(row[col], row['Avg ' + col]), axis=1)

Then do the same with compare_lower for the columns in lower_better:

for col in lower_better:
    df['Comp ' + col] = df.apply(lambda row: compare_lower(row[col], row['Avg ' + col]), axis=1)

Finally, sum the comparison columns from both lists into score:

comp_col = ['Comp '+ col for col in higher_better+lower_better]
df['score'] = df[comp_col].sum(axis=1)

Result:

   Name  Salary  Age  Cars  Avg Salary  Avg Age  Avg Cars  Comp Salary  \
0  John   50000   35     1       60000       38         1           -1   
1   Tom   65000   45     3       60000       38         1            1   

   Comp Cars  Comp Age  score  
0          0         1      0  
1          1        -1      1
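
As a side note, the row-wise apply calls above can probably be replaced by a vectorized comparison. The following is only a sketch under the assumption that all compared columns are numeric: it rebuilds the sample data from the question and uses np.sign on the difference between each value and its average, which yields the same 1 / 0 / -1 scores as compare_higher and compare_lower. The name comp_cols is just an illustrative variable, not something from the question.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Name': ['John', 'Tom'],
    'Salary': [50000, 65000],
    'Age': [35, 45],
    'Cars': [1, 3],
    'Avg Salary': [60000, 60000],
    'Avg Age': [38, 38],
    'Avg Cars': [1, 1],
})

higher_better = ['Salary', 'Cars']
lower_better = ['Age']

# np.sign(value - average) gives 1, 0 or -1, like compare_higher.
for col in higher_better:
    df['Comp ' + col] = np.sign(df[col] - df['Avg ' + col])

# Reverse the subtraction where a lower value is better, like compare_lower.
for col in lower_better:
    df['Comp ' + col] = np.sign(df['Avg ' + col] - df[col])

# Sum the comparison columns row by row.
comp_cols = ['Comp ' + col for col in higher_better + lower_better]
df['Score'] = df[comp_cols].sum(axis=1)

print(df[['Name'] + comp_cols + ['Score']])
```

On large frames this should avoid the per-row Python function calls that apply with axis=1 makes. Note that a missing value in either column would produce NaN rather than a score with this approach.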

Answer 2

You can also avoid apply and use np.select instead, which is very handy for assigning values based on multiple conditions.

import numpy as np
import pandas as pd

df['Comp Salary'] = np.select([df.Salary < df['Avg Salary'], df.Salary == df['Avg Salary'], 
    df.Salary > df['Avg Salary']], [-1,0,1])
df['Comp Cars'] = np.select([df.Cars < df['Avg Cars'], df.Cars == df['Avg Cars'], 
    df.Cars > df['Avg Cars']], [-1,0,1])
df['Comp Age'] = np.select([df.Age < df['Avg Age'], df.Age == df['Avg Age'], 
    df.Age > df['Avg Age']], [1,0,-1])

df['Score'] = df[['Comp Salary', 'Comp Cars', 'Comp Age']].sum(axis=1)
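
If there are many columns, the three np.select statements above could be folded into a loop over the two lists from the question. This is a minimal sketch, not part of the original answer: add_comparison is a hypothetical helper name, and the DataFrame is rebuilt from the question's sample data so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Name': ['John', 'Tom'],
    'Salary': [50000, 65000],
    'Age': [35, 45],
    'Cars': [1, 3],
    'Avg Salary': [60000, 60000],
    'Avg Age': [38, 38],
    'Avg Cars': [1, 1],
})

higher_better = ['Salary', 'Cars']
lower_better = ['Age']


def add_comparison(frame, col, higher_is_better):
    """Hypothetical helper: add a 'Comp <col>' column to frame in place."""
    avg = frame['Avg ' + col]
    conditions = [frame[col] < avg, frame[col] == avg, frame[col] > avg]
    # Flip the scores when a lower value is the better one.
    choices = [-1, 0, 1] if higher_is_better else [1, 0, -1]
    frame['Comp ' + col] = np.select(conditions, choices)


for col in higher_better:
    add_comparison(df, col, higher_is_better=True)
for col in lower_better:
    add_comparison(df, col, higher_is_better=False)

comp_cols = ['Comp ' + col for col in higher_better + lower_better]
df['Score'] = df[comp_cols].sum(axis=1)

print(df[['Name'] + comp_cols + ['Score']])
```

np.select falls back to its default of 0 when none of the conditions match, so a row with a missing value would be scored 0 here. Pass a different default if that is not what you want.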

Compare values in different columns
