Take rows of a DataFrame that add up to a given value

Time: 2018-11-03 16:46:08

Tags: python dataframe optimization

I have the following DataFrame (its data is listed further below):

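I need to take users from the id column until their valor values add up to a given total, for example 14. How can I select the rows so that this condition is met efficiently?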

Here is the data I am using for the example:

{'id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10},
 'valor': {0: 5.690813772729765,
  1: 5.502473982705203,
  2: 7.341171631905721,
  3: 6.792634352953639,
  4: 3.3972025109972535,
  5: 3.417867922325758,
  6: 7.336228970419381,
  7: 0.048008919685266216,
  8: 2.365638019103776,
  9: 0.9593678139592221}}
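This dictionary uses the column-oriented layout that pandas' DataFrame.to_dict() returns by default ({column: {row index: value}}), so pd.DataFrame(data) rebuilds the frame from it. As a minimal, pandas-free sketch, here is how the data can be flattened into one (valor, id) pair per row, which is the shape both answers work with. It also checks that the whole column exceeds the target, because otherwise no subset can. The names pairs and total_all are illustrative only.

```python
# Sample data from the question, in DataFrame.to_dict() layout: {column: {row_index: value}}
data = {'id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10},
        'valor': {0: 5.690813772729765, 1: 5.502473982705203, 2: 7.341171631905721,
                  3: 6.792634352953639, 4: 3.3972025109972535, 5: 3.417867922325758,
                  6: 7.336228970419381, 7: 0.048008919685266216, 8: 2.365638019103776,
                  9: 0.9593678139592221}}

# Flatten into (valor, id) pairs; both columns share the same row index keys
pairs = [(data['valor'][row], data['id'][row]) for row in data['id']]

# Sanity check: if even the sum of all rows is not above the target, no subset can be
total_all = sum(value for value, _ in pairs)
print(len(pairs), total_all > 14)
```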

2 Answers:

Answer 0 (score: 2)

You can search recursively for the combination of rows whose total is the smallest value above the target:

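def options(valor, i, total, maximum, lowest, lst_ids):
    # Assuming all values are positive: once the running total exceeds the target,
    # adding more rows can only increase it, so stop and report this combination
    if total > maximum:
        return total, lst_ids
    ids = ''
    for j in range(i, len(valor)):
        # Try adding row j, then recurse on the rows after it. Passing an extended
        # copy of lst_ids avoids undoing the append with str.replace, which could
        # strip the wrong id (e.g. ', 1' also matches the start of ', 12')
        new_score, new_ids = options(valor, j + 1, total + valor[j][0], maximum, lowest,
                                     lst_ids + ', ' + str(valor[j][1]))
        # Keep the smallest total found so far that exceeds the target
        if new_score < lowest:
            lowest = new_score
            ids = new_ids
    return lowest, ids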


data = {'id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10},
        'valor': {0: 5.690813772729765, 1: 5.502473982705203, 2: 7.341171631905721, 3: 6.792634352953639,
                  4: 3.3972025109972535, 5: 3.417867922325758, 6: 7.336228970419381, 7: 0.048008919685266216,
                  8: 2.365638019103776, 9: 0.9593678139592221}}

valor = [(data['valor'][i], data['id'][i]) for i in data['valor']]
closest_score, ids = options(valor, 0, 0, 14, 1e10, '')
ids = ids[2:]
print(closest_score, ids)

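This prints 14.034419476793634 1, 7, 8, 10. The first part is the smallest total above 14, and the second part lists the ids of the rows that produce it. If you want the ids as a list of integers, you can use: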

ids = [int(i) for i in ids.split(', ')]
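The same pruned search can also collect the ids in a list instead of a comma-separated string, so the string slicing and splitting steps are no longer needed. This is a sketch and not the answer author's code. The names smallest_total_above and search are hypothetical. Like the original, it assumes every valor is positive, which lets a branch stop as soon as its total passes the target.

```python
# Sample data from the question
data = {'id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10},
        'valor': {0: 5.690813772729765, 1: 5.502473982705203, 2: 7.341171631905721,
                  3: 6.792634352953639, 4: 3.3972025109972535, 5: 3.417867922325758,
                  6: 7.336228970419381, 7: 0.048008919685266216, 8: 2.365638019103776,
                  9: 0.9593678139592221}}
pairs = [(data['valor'][row], data['id'][row]) for row in data['id']]


def smallest_total_above(pairs, target):
    """Return (total, ids) for the subset whose total is the smallest value > target.

    Assumes every value is positive, which makes the early stop below safe.
    """
    best_total, best_ids = float('inf'), []

    def search(start, total, chosen):
        nonlocal best_total, best_ids
        if total > target:
            # Adding more positive values only raises the total, so end this branch here
            if total < best_total:
                best_total, best_ids = total, list(chosen)
            return
        for j in range(start, len(pairs)):
            value, row_id = pairs[j]
            chosen.append(row_id)               # include row j
            search(j + 1, total + value, chosen)
            chosen.pop()                        # undo, then try the next row instead

    search(0, 0, [])
    return best_total, best_ids


total, ids = smallest_total_above(pairs, 14)
print(total, ids)  # → 14.034419476793634 [1, 7, 8, 10]
```

Because rows are tried in index order and only a strictly smaller total replaces the current best, this visits candidates in the same order as the recursive answer and reports the same combination.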

Answer 1 (score: 0)

Here is a brute-force solution that uses a generator built on itertools.combinations:

from itertools import chain, combinations

d = dict(zip(d['id'].values(), d['valor'].values()))  # restructure to {id: valor}; run the Setup below first

def gen_ids_sum(d):
    # Every non-empty combination of ids, sizes 1 through len(d) inclusive
    for id_tup in chain.from_iterable(combinations(d, i) for i in range(1, len(d) + 1)):
        yield id_tup, sum(map(d.__getitem__, id_tup))

# Totals above 14 rank first (False < True), then by distance from 14
ids, val_sum = min(gen_ids_sum(d), key=lambda x: (x[1] <= 14, abs(x[1] - 14)))

print(ids, val_sum)

(1, 7, 8, 10) 14.034419476793634
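The key passed to min is a tuple, and Python compares tuples element by element, with False < True. The sketch below uses made-up candidate totals that are not taken from the data, and rank is a hypothetical helper. It shows that any total above 14 beats every total at or below 14, and that among the totals above 14 the one nearest 14 wins. Changing <= to < would also accept a total of exactly 14.

```python
TARGET = 14

# Hypothetical candidate totals, chosen only to illustrate the ranking
candidates = [13.9, 14.0, 14.03, 15.0]

def rank(total):
    # First element: False (total above TARGET) sorts before True (at or below TARGET);
    # second element: smaller distance to TARGET wins among equals
    return (total <= TARGET, abs(total - TARGET))

def rank_inclusive(total):
    # Variant that also accepts a total exactly equal to TARGET
    return (total < TARGET, abs(total - TARGET))

best = min(candidates, key=rank)
best_inclusive = min(candidates, key=rank_inclusive)
print(best, best_inclusive)  # → 14.03 14.0
```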

Setup

d = {'id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10},
     'valor': {0: 5.690813772729765, 1: 5.502473982705203, 2: 7.341171631905721,
               3: 6.792634352953639, 4: 3.3972025109972535, 5: 3.417867922325758,
               6: 7.336228970419381, 7: 0.048008919685266216, 8: 2.365638019103776,
               9: 0.9593678139592221}}
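Brute force means the generator yields one tuple for every non-empty subset of ids, which is C(n, 1) + ... + C(n, n) = 2**n - 1 subsets. The work therefore roughly doubles with each extra row. The helper n_subsets is a hypothetical name that only illustrates this count. It relies on math.comb, which requires Python 3.8 or later.

```python
from math import comb

def n_subsets(n):
    # combinations(ids, k) for k = 1..n covers every non-empty subset exactly once,
    # and C(n, 1) + ... + C(n, n) = 2**n - 1
    return sum(comb(n, k) for k in range(1, n + 1))

print(n_subsets(10))  # → 1023
print(n_subsets(30))  # → 1073741823
```

For the 10 sample rows, 1,023 combinations is trivial. At 30 rows the count exceeds a billion. The recursive answer's early stop skips every extension of a combination whose total already exceeds the target, but its worst case is still exponential.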