Question

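In a five-star rating system, I have a known number of ratings (votes), N. I also have the final (weighted) average of all N ratings; call it R (a float rounded to two decimal places).
I would like to know the best approach (algorithm) for generating all possible combinations of vote counts and printing only those whose weighted average is R. Printing ALL possible combinations is not what I want, because for large N and small R the program would run through billions of them. I am only one step removed from being a Python newbie, but Python will be the language of choice, and this exercise is my introduction to the language. What is the best way to approach this problem? The approach is my question, but any hints on the code are very welcome.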

Example:

N = 20 customers rated a product
R = 3.85 is the average rating

Output: [14, 0, 0, 1, 5] is one possible combination out of 146: "14 five-star, 0 four-star, 0 three-star, 1 two-star, and 5 one-star ratings."

Combination: [487, 0, 1, 0, 12] is one possible combination out of 1154 for N = 500 and R = 4.90, and so on.

Answer 1

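(Inefficient) brute-force solution.

Note: this could be made more efficient by replacing product(range(0, N+1), repeat=5) with something that generates only the lists of 5 numbers that sum to N.

Find all lists of length 5 (one entry per rating) whose entries are at most N, then compute the weighted average and compare it with R. Print each list that matches.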

from itertools import product

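def weighted_average(counts, total):
    # Reject tuples that do not use exactly `total` votes.
    if sum(counts) != total:
        return 0
    # counts are ordered from five stars down to one star.
    return sum(count * stars for count, stars in zip(counts, range(5, 0, -1))) / float(total)

N = 20
R = 3.85

for p in product(range(0, N+1), repeat=5):
    w_avg = weighted_average(p, N)
    # R is rounded to two decimals, so round before comparing.
    if round(w_avg, 2) == R:
        print(p)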

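You should see (10, 4, 1, 3, 2) in the output, which corresponds to 10 five-star, 4 four-star, 1 three-star, 3 two-star, and 2 one-star ratings for the example in your question.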
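
The note in this answer about generating only the 5-number lists that sum to N can be sketched as follows. This is one possible approach (stars and bars on top of itertools.combinations), not necessarily what the answer's author had in mind; the helper name compositions and the integer target_stars are assumptions made for illustration.

```python
from itertools import combinations

def compositions(total, parts):
    """Yield every tuple of `parts` non-negative integers that sums to `total`."""
    # Stars and bars: place parts-1 bars among total+parts-1 slots;
    # the gaps between consecutive bars are the individual counts.
    slots = total + parts - 1
    for bars in combinations(range(slots), parts - 1):
        edges = (-1,) + bars + (slots,)
        yield tuple(edges[i + 1] - edges[i] - 1 for i in range(parts))

N = 20
target_stars = 77  # N * R for R = 3.85, kept as an integer to avoid float comparison

# Counts are ordered from five stars down to one star, as in the answer above.
matches = [c for c in compositions(N, 5)
           if sum(n * stars for n, stars in zip(c, range(5, 0, -1))) == target_stars]
print(len(matches))
```

This visits 10626 candidate tuples for N = 20 instead of the 21**5 tuples produced by product.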

Answer 2

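You can use a recursive algorithm to enumerate all distributions of the votes and then check which ones have the correct weighted average. Note, however, that the number of combinations grows quickly.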

def distributions(ratings, remaining):
    if len(ratings) > 1:
        # more than one rating: take some for first and distribute rest
        for n in range(remaining+1):
            for dist in distributions(ratings[1:], remaining - n):
                yield [(ratings[0], n)] + dist
    elif len(ratings) == 1:
        # only one rating left -> all remaining go there
        yield [(ratings[0], remaining)]
    else:
        # should never happen
        raise ValueError("No more ratings left")

def weighted_avg(votes):
    return sum(r * n for r, n in votes) / sum(n for r, n in votes)

for dist in distributions([1, 2, 3, 4, 5], 20):
    if weighted_avg(dist) == 3.85:
        print(dist)

There are 10626 distributions in total, 146 of which give the correct average. Output (partial):

[(1, 0), (2, 0), (3, 3), (4, 17), (5, 0)]
[(1, 0), (2, 0), (3, 4), (4, 15), (5, 1)]
...
[(1, 2), (2, 3), (3, 1), (4, 4), (5, 10)]
...
[(1, 5), (2, 0), (3, 1), (4, 1), (5, 13)]
[(1, 5), (2, 1), (3, 0), (4, 0), (5, 14)]
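
As a side check on how quickly the number of distributions grows, the total can be computed directly with the stars-and-bars formula C(N + 4, 4). The function name distribution_count is hypothetical, and math.comb assumes Python 3.8 or later.

```python
from math import comb

def distribution_count(n_votes, n_levels=5):
    """Number of ways to split n_votes votes among n_levels rating levels."""
    return comb(n_votes + n_levels - 1, n_levels - 1)

print(distribution_count(20))   # 10626, the total quoted in this answer
print(distribution_count(500))  # in the billions
```

For N = 500 the count is already in the billions, which is why plain enumeration stops being practical.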

Answer 3

The total number of stars is N * R (20 * 3.85 = 77 in your example). You now have something similar to the change-making problem, except that the total number of coins is fixed.
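
One caveat with N * R: because R is rounded to two decimals, more than one integer star total can be consistent with it. A minimal sketch, assuming ordinary round-to-two-decimals behaviour; the helper name candidate_totals is hypothetical.

```python
def candidate_totals(n, r, decimals=2):
    """Integer star totals between n and 5*n whose average rounds to r."""
    return [t for t in range(n, 5 * n + 1)
            if abs(round(t / n, decimals) - r) < 1e-9]

print(candidate_totals(20, 3.85))   # [77]
print(candidate_totals(500, 4.90))  # [2448, 2449, 2450, 2451, 2452]
```

For N = 20 the total is unique, but for N = 500 five different totals all round to 4.90, so a search would need to cover each of them.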

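An efficient solution is probably to start with as many of the largest coin (5-star ratings) as possible without exceeding the total, then decrease that count until the remaining ratings can no longer reach the total. You still end up checking some solutions that do not work, but it is considerably faster than checking all of them, especially for large problem sizes.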

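Here is my solution. (Edit: debugged the solution. I do not think it is the optimal solution, but it is better than brute force: 1931 recursive calls for the sample case N = 20, R = 3.85.)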

def distribution(total, maxRating, N, solution):
    if total == 0 and N == 0:
        return [solution + [0] * maxRating] #we found a solution

    if total == 0 or N == 0:
        return [] # no solution possible

    largestUpperLimit = min(total // maxRating, N) # an upper limit for the number of reviews with the largest rating
    largestLowerLimit = max((total - N * (maxRating -1)) // maxRating, 0) # a lower limit for the number of reviews with the largest rating

    if N < largestLowerLimit:
        return [] # there aren't enough ratings to make any solutions
    else:
        solutions = []
        for i in range(largestLowerLimit, largestUpperLimit + 1): # plus 1 to include the upper limit
            solutions.extend(distribution(total - i * maxRating, maxRating - 1, N - i, solution + [i]))
        return solutions


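# Using the function example:
N = 20
R = 3.85
# N * R is a float (and may carry rounding error), so round it to an integer star total
solutions = distribution(int(round(N * R)), 5, N, [])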

Answer 4

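Here is an algorithm that does not use brute force. The indented lines show a worked example.

You have the number of ratings N and their average R. Let si be the number of i-star ratings (i in [1..5]). We have s1 + s2 + s3 + s4 + s5 = N. We also have s1 + 2*s2 + 3*s3 + 4*s4 + 5*s5 = R*N.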

  s1 + s2 + s3 + s4 + s5 = 20
  s1 + 2*s2 + 3*s3 + 4*s4 + 5*s5 = 77

Therefore s2 + 2*s3 + 3*s4 + 4*s5 = R*N - N. Now choose a value for s1 and compute s2 + s3 + s4 + s5 = N - s1.

  s2 + 2*s3 + 3*s4 + 4*s5 = 57
  s1 = 4
  s2 + s3 + s4 + s5 = 16

We can continue with s3 + 2*s4 + 3*s5 = (R*N - N) - (N - s1). Choose a value for s2 and compute s3 + s4 + s5 = N - s1 - s2.

  s3 + 2*s4 + 3*s5 = 41
  s2 = 2
  s3 + s4 + s5 = 14

Repeat for s3 and obtain the values of s5 and s4.

  s4 + 2*s5 = 27
  s3 = 9
  s4 + s5 = 5

  s5 = 22
  s4 = -17

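Now, obviously, this can produce invalid solutions (in the example, s5 > 20 and s4 < 0). To avoid that, we can constrain the choice of value at each step. We need to choose s3 such that s4 + s5 >= (s4 + 2*s5)/2, so that we end up with s5 <= s4 + s5.
This is only possible if s3 + s4 + s5 >= (s3 + 2*s4 + 3*s5)/3, which gives another constraint, this time on s2. Finally, the constraint on s1 is s2 + s3 + s4 + s5 >= (s2 + 2*s3 + 3*s4 + 4*s5)/4.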
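
The elimination steps and constraints described in this answer could be put into code roughly as follows. This is a sketch under assumptions rather than the author's implementation: the function name rating_counts is hypothetical, the star total is passed as an integer, and in addition to the lower bound stated above it also applies the matching upper bound (the remaining count cannot exceed the reduced sum, since every remaining rating contributes at least 1).

```python
def rating_counts(n, total, levels=5):
    """Yield tuples (s1, ..., s_levels) with sum n and sum(i * s_i) == total."""
    if levels == 1:
        if total == n:           # a single unknown must absorb everything
            yield (n,)
        return
    # Subtracting the count equation from the weighted equation removes the
    # first unknown and leaves weights 1..levels-1 on the remaining ones.
    reduced = total - n
    if reduced < 0:
        return
    # The remaining unknowns sum to `rest`, which must satisfy
    # rest <= reduced <= (levels - 1) * rest.
    min_rest = -(-reduced // (levels - 1))   # ceiling division
    max_rest = min(reduced, n)
    for rest in range(min_rest, max_rest + 1):
        first = n - rest                     # the value chosen for this level
        for tail in rating_counts(rest, reduced, levels - 1):
            yield (first,) + tail

solutions = list(rating_counts(20, 77))
print(len(solutions))   # 146
print(solutions[0])
```

Each tuple is ordered from one star to five stars. With the bounds in place, the choice s1 = 4, s2 = 2, s3 = 9 from the worked example is never explored, because s4 + s5 = 5 is smaller than 27/2.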

Best way to make a reverse five-star rating calculator

4 answers: