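I am looking for the average of the 'Score' column, weighted by 'Weight', over all subranges of rows: 0-1, 0-2, ..., 1-2, 1-3, ..., 2-3, 2-4, ..., and so on.
The expected result is the subrange with the highest average.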
import pandas as pd

df2 = pd.DataFrame(
    {'Weight': (2, 3, 4, 5, 2, 3, 4, 5),
     'Score': (6, 7, 8, 9, 6, 7, 8, 9)})
print(df2)
Score Weight
0 6 2
1 7 3
2 8 4
3 9 5
4 6 2
5 7 3
6 8 4
7 9 5
Answer 0 (score: 2)
You can use a list comprehension or a generator expression here (I prefer the latter).
See below:
# create column with weighted scores
df2["Weighted"] = df2["Score"] * df2["Weight"]
# create helper function for averaging
average = lambda indices: df2.loc[indices, "Weighted"].mean()
# generate all possible ranges
length = df2.shape[0] + 1
ranges = (range(start, end)
          for start in range(length)
          for end in range(start + 1, length))
# generate all averages
averages = ((indices, average(indices)) for indices in ranges)
# get highest average with value
high_range, high_value = max(averages, key=lambda x: x[1])
# show result
print("Range:", list(high_range), "Avg:", high_value)
Range: [3] Avg: 45.0
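The code above takes the plain mean of Score * Weight over each subrange. If "weighted average" is meant in the usual sense, sum(Score * Weight) / sum(Weight), a minimal sketch could look like the following. The helper name weighted_mean is made up for illustration, and the subranges are selected by position with iloc rather than with range labels.

```python
import pandas as pd

df2 = pd.DataFrame(
    {'Weight': (2, 3, 4, 5, 2, 3, 4, 5),
     'Score': (6, 7, 8, 9, 6, 7, 8, 9)})

def weighted_mean(frame):
    """Hypothetical helper: sum(score * weight) / sum(weight) for the given rows."""
    return (frame["Score"] * frame["Weight"]).sum() / frame["Weight"].sum()

n = len(df2)
# Positional slices [start, end) cover every contiguous subrange of rows.
results = [((start, end), weighted_mean(df2.iloc[start:end]))
           for start in range(n)
           for end in range(start + 1, n + 1)]

# max() returns the first subrange that reaches the highest value.
best_range, best_value = max(results, key=lambda x: x[1])
print("Rows:", list(range(*best_range)), "Weighted mean:", best_value)
```

Under this definition a weighted mean can never exceed the largest score in the subrange, so on this data the single rows with score 9 come out on top.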
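Note that your dataframe needs a sorted integer index starting at 0. Otherwise this solution will not work correctly, because it uses range to exploit the structure of the index.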
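If the index is not 0..n-1 (for example after filtering or sorting), two possible workarounds are sketched below: select by position with iloc, or rebuild the index with reset_index(drop=True) first. The frame df3 and its index labels are invented for this illustration.

```python
import pandas as pd

# Hypothetical frame whose index is not 0..n-1.
df3 = pd.DataFrame(
    {"Weight": (2, 3, 4, 5), "Score": (6, 7, 8, 9)},
    index=[10, 20, 30, 40])
df3["Weighted"] = df3["Score"] * df3["Weight"]

# Option 1: select by position, so the index labels do not matter.
first_two_by_position = df3["Weighted"].iloc[0:2].mean()

# Option 2: rebuild a 0..n-1 index; label-based .loc with range then works again.
df3_reset = df3.reset_index(drop=True)
first_two_by_label = df3_reset.loc[range(0, 2), "Weighted"].mean()

print(first_two_by_position, first_two_by_label)
```

Both expressions average the first two rows, so they should agree.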
To explain in more detail, take a closer look at the generated ranges:
ranges = (range(start, end)
          for start in range(length)
          for end in range(start + 1, length))
print([list(x) for x in ranges])
[[0],
[0, 1],
[0, 1, 2],
[0, 1, 2, 3],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5, 6],
[0, 1, 2, 3, 4, 5, 6, 7],
[1],
[1, 2],
[1, 2, 3],
[1, 2, 3, 4],
[1, 2, 3, 4, 5],
[1, 2, 3, 4, 5, 6],
[1, 2, 3, 4, 5, 6, 7],
[2],
[2, 3],
[2, 3, 4],
[2, 3, 4, 5],
[2, 3, 4, 5, 6],
[2, 3, 4, 5, 6, 7],
[3],
[3, 4],
[3, 4, 5],
[3, 4, 5, 6],
[3, 4, 5, 6, 7],
[4],
[4, 5],
[4, 5, 6],
[4, 5, 6, 7],
[5],
[5, 6],
[5, 6, 7],
[6],
[6, 7],
[7]]
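As a cross-check on the listing above: every contiguous subrange corresponds to one pair (start, end) with start < end, so for n rows there are n * (n + 1) / 2 of them. A small sketch using itertools.combinations, as an alternative to the nested loops and not part of the solution above, produces the same ranges in the same order:

```python
from itertools import combinations

n = 8  # number of rows in df2

# Each pair (start, end) with start < end, drawn from 0..n, is one contiguous subrange.
ranges = [range(start, end) for start, end in combinations(range(n + 1), 2)]

print(len(ranges))                         # number of subranges
print(list(ranges[0]), list(ranges[-1]))   # first and last subrange
```

For the 8 rows here that gives 8 * 9 / 2 = 36 subranges, which matches the number of entries listed above.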
And at the averages:
ranges = (range(start, end)
          for start in range(length)
          for end in range(start + 1, length))
averages = ((indices, average(indices)) for indices in ranges)
print([list(x) for x in averages])
[[range(0, 1), 12.0],
[range(0, 2), 16.5],
[range(0, 3), 21.666666666666668],
[range(0, 4), 27.5],
[range(0, 5), 24.399999999999999],
[range(0, 6), 23.833333333333332],
[range(0, 7), 25.0],
[range(0, 8), 27.5],
[range(1, 2), 21.0],
[range(1, 3), 26.5],
[range(1, 4), 32.666666666666664],
[range(1, 5), 27.5],
[range(1, 6), 26.199999999999999],
[range(1, 7), 27.166666666666668],
[range(1, 8), 29.714285714285715],
[range(2, 3), 32.0],
[range(2, 4), 38.5],
[range(2, 5), 29.666666666666668],
[range(2, 6), 27.5],
[range(2, 7), 28.399999999999999],
[range(2, 8), 31.166666666666668],
[range(3, 4), 45.0],
[range(3, 5), 28.5],
[range(3, 6), 26.0],
[range(3, 7), 27.5],
[range(3, 8), 31.0],
[range(4, 5), 12.0],
[range(4, 6), 16.5],
[range(4, 7), 21.666666666666668],
[range(4, 8), 27.5],
[range(5, 6), 21.0],
[range(5, 7), 26.5],
[range(5, 8), 32.666666666666664],
[range(6, 7), 32.0],
[range(6, 8), 38.5],
[range(7, 8), 45.0]]
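To get all ranges with the maximum average (not just one), you need to modify the code slightly. Because we have to iterate over averages twice (first to find the maximum average, then to compare every average against it), I turn it into a list comprehension.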
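# rebuild the ranges: a generator that was already iterated over is exhausted
ranges = [range(start, end)
          for start in range(length)
          for end in range(start + 1, length)]
# generate all averages
averages = [(indices, df2.loc[indices, "Weighted"].mean())
            for indices in ranges]
# find the maximum average, then keep every range that reaches it
max_average = max(averages, key=lambda x: x[1])[1]
highest = [tuples for tuples in averages if tuples[1] == max_average]
print(highest)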
[(range(3, 4), 45.0), (range(7, 8), 45.0)]
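The == comparison works here because tied ranges are computed from identical values. If ties could come from different arithmetic (for example other weights that only agree up to rounding), a tolerant comparison may be safer. A minimal sketch with math.isclose, using a small hand-written list of (range, average) pairs in place of the full averages list:

```python
import math

# Hypothetical (range, average) pairs in the same shape as the averages list.
averages = [(range(3, 4), 45.0), (range(2, 4), 38.5), (range(7, 8), 45.0)]

max_average = max(value for _, value in averages)

# Keep every range whose average equals the maximum up to floating-point tolerance.
highest = [(indices, value) for indices, value in averages
           if math.isclose(value, max_average)]
print(highest)
```

math.isclose uses a relative tolerance of 1e-09 by default; pass rel_tol or abs_tol if the data needs something else.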