Goal: programmatically match combinations across two columns to find the lowest value of another column
Say I have this:
import pandas as pd
d = {'Part_1': [91, 201, 201],
'Part_2': [201,111,91],
'Result': [3,3, 3],
'Sub-Score': [0.60, 0.8,0.9],
'Final-Score': [0,0,0]}
df = pd.DataFrame(data=d)
df
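I want to find the minimum value in the Sub-Score column and assign it to the Final-Score column. The selection must be based on matching Part_1 and Part_2 values, where a given part can appear in either column: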
d_new = {'Part_1': [91, 201, 201],
'Part_2': [201,111,91],
'Result': [3,3, 3],
'Sub-Score': [0.60, 0.8,0.9],
'Final-Score': [0.6,.8,.6]}
df_new = pd.DataFrame(data=d_new)
df_new
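Here we can see that rows 0 and 2 contain the same values in Part_1 and Part_2, just in swapped positions, and that should still count as a match. We can also see that row 0 has a Sub-Score of 0.60 and row 2 has a Sub-Score of 0.9.
I want to assign the Sub-Score value from row 0 (because it is the lowest of rows 0 and 2) to the Final-Score column of both rows 0 and 2. Row 1 cannot be compared with rows 0 and 2 because it does not share the same pair of parts, so its own Sub-Score value is simply carried over into Final-Score.
Any help would be greatly appreciated.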
(Added in an edit):
Input:
Final-Score Part_1 Part_2 Result Sub-Score
0 0 91 201 3 0.6
1 0 201 111 3 0.8
2 0 201 91 3 0.9
Desired output:
Final-Score Part_1 Part_2 Result Sub-Score
0 0.6 91 201 3 0.6
1 0.8 201 111 3 0.8
2 0.6 201 91 3 0.9
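To pin down the matching rule described above, here is a small plain-Python reference, assuming that "matching" means the unordered pair {Part_1, Part_2} is identical. The function name final_scores is hypothetical and no pandas is involved; it only states the expected behaviour so any pandas answer can be checked against it.

```python
def final_scores(part_1, part_2, sub_score):
    """Hypothetical helper: for each row, return the lowest sub-score among
    all rows whose unordered (Part_1, Part_2) pair is the same."""
    best = {}
    for a, b, s in zip(part_1, part_2, sub_score):
        key = frozenset((a, b))  # (91, 201) and (201, 91) give the same key
        best[key] = min(s, best.get(key, s))
    # Look each row's key up again to broadcast the group minimum
    return [best[frozenset((a, b))] for a, b in zip(part_1, part_2)]

print(final_scores([91, 201, 201], [201, 111, 91], [0.60, 0.8, 0.9]))
```

A frozenset is used as the key because it is hashable and ignores order.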
Answer 0 (score: 2)
Sort the values within each row, then group by the resulting ngroup labels and apply transform with min, i.e.:
{{1}}
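The code for this answer was lost. What follows is only a minimal sketch of the idea it describes (sort each row's pair, label the groups with ngroup, then transform('min')), not the answerer's original code. Using np.sort with axis=1 for the per-row sort and the helper column names lo/hi are my own assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Part_1': [91, 201, 201],
                   'Part_2': [201, 111, 91],
                   'Result': [3, 3, 3],
                   'Sub-Score': [0.60, 0.8, 0.9],
                   'Final-Score': [0, 0, 0]})

# Sort each row's (Part_1, Part_2) pair so the order of the parts no longer matters
pairs = pd.DataFrame(np.sort(df[['Part_1', 'Part_2']].values, axis=1),
                     index=df.index, columns=['lo', 'hi'])

# Give every distinct sorted pair an integer group label
group_id = pairs.groupby(['lo', 'hi']).ngroup()

# Broadcast each group's minimum Sub-Score back onto all rows of that group
df['Final-Score'] = df.groupby(group_id)['Sub-Score'].transform('min')
print(df)
```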
Answer 1 (score: 0)
I found a (somewhat hacky) way that seems to work:
import pandas as pd
d = {'Part_1': [91, 201, 201],
'Part_2': [201, 111, 91],
'Result': [3, 3, 3],
'Sub-Score': [0.60, 0.8, 0.9],
'Final-Score': [0, 0, 0]}
df = pd.DataFrame(data=d)
# Find lowest part-number of part-pair and add as new column
df["min_part"] = df[["Part_1", "Part_2"]].min(axis=1)
# Find highest part-number of part-pair and add as new column
df["max_part"] = df[["Part_1", "Part_2"]].max(axis=1)
print(df)
The data now looks like this:
Final-Score Part_1 Part_2 Result Sub-Score min_part max_part
0 0 91 201 3 0.6 91 201
1 0 201 111 3 0.8 111 201
2 0 201 91 3 0.9 91 201
Then do:
# Group together rows with the same min_part, max_part pair, and assign
# their lowest "Sub-Score" value to the "Final-Score" column
df["Final-Score"] = df.groupby(["min_part", "max_part"])["Sub-Score"].transform("min")
print(df)
Final result:
Final-Score Part_1 Part_2 Result Sub-Score min_part max_part
0 0.6 91 201 3 0.6 91 201
1 0.8 201 111 3 0.8 111 201
2 0.6 201 91 3 0.9 91 201
(Optional) Keep only the original columns:
df = df[["Final-Score", "Part_1", "Part_2", "Result", "Sub-Score"]]
print(df)
Result:
Final-Score Part_1 Part_2 Result Sub-Score
0 0.6 91 201 3 0.6
1 0.8 201 111 3 0.8
2 0.6 201 91 3 0.9
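The same min/max idea can probably be done in a single step by passing the two derived Series straight to groupby, so no min_part/max_part columns need to be added and dropped afterwards. This is a sketch of that variation, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'Part_1': [91, 201, 201],
                   'Part_2': [201, 111, 91],
                   'Result': [3, 3, 3],
                   'Sub-Score': [0.60, 0.8, 0.9],
                   'Final-Score': [0, 0, 0]})

parts = df[['Part_1', 'Part_2']]
# Group by (smaller part, larger part) computed on the fly; the Series align on df's index
df['Final-Score'] = (
    df.groupby([parts.min(axis=1), parts.max(axis=1)])['Sub-Score']
      .transform('min')
)
print(df)
```

Only the original five columns remain, so the optional column-selection step is not needed.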
Answer 2 (score: 0)
I would also go through a temporary table. First generate a key, then group by that key and apply min():
# Generate a key that does not depend on the order of the values
# in Part_1 and Part_2 (a sorted tuple; str(set(...)) can depend on insertion order)
df['key'] = [tuple(sorted(i)) for i in df[['Part_1', 'Part_2']].values]
# Generate a temporary table that maps each key to its minimal Sub-Score
tmp = df.groupby('key')['Sub-Score'].min()
scores = {}
for key, val in zip(tmp.index, tmp.values):
    scores[key] = val
# Place the minimal values in the original table
df['Final-Score'] = [scores[key] for key in df.key]
# Finally, delete what you don't need
del df['key'], tmp
df
Final-Score Part_1 Part_2 Result Sub-Score
0 0.6 91 201 3 0.6
1 0.8 201 111 3 0.8
2 0.6 201 91 3 0.9
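The temporary-table idea can likely be kept while dropping the manual dict loop: build an order-independent string key in a vectorized way, compute the lookup table once, and map it back with Series.map. The variable names key and lookup and the "min-max" string format of the key are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Part_1': [91, 201, 201],
                   'Part_2': [201, 111, 91],
                   'Result': [3, 3, 3],
                   'Sub-Score': [0.60, 0.8, 0.9],
                   'Final-Score': [0, 0, 0]})

# Order-independent key: '91-201' for both (91, 201) and (201, 91)
low = np.minimum(df['Part_1'], df['Part_2']).astype(str)
high = np.maximum(df['Part_1'], df['Part_2']).astype(str)
key = low + '-' + high

# Temporary lookup table: one minimal Sub-Score per key
lookup = df.groupby(key)['Sub-Score'].min()

# Map every row's key back to its group's minimum; no helper column is stored in df
df['Final-Score'] = key.map(lookup)
print(df)
```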