Cheapest combination of items in a Python DataFrame

Date: 2014-03-21 08:35:43

Tags: python pandas

I have this CSV file:

SHOP_ID, COST, ITEM
1, 2.00, A
1, 1.25, B
1, 2.00, C
1, 1.00, D
1, 1.00, "A, B"
1, 1.50, "A, C"
1, 2.50, "A, D"
2, 3.00, A
2, 1.00, B
2, 1.20, C
2, 1.25, D

I have read this file into Python as a DataFrame.

Now suppose the user enters A,B,C,D and I want to find the cheapest combination of ITEMS in my DataFrame that covers this input. I should then get:

SHOP_ID=1
A,B(1.00)+A,C(1.50)+D(1.00) = 3.50

The user will get A,A,B,C,D, i.e. an extra A. As long as the total cost is the lowest, we don't care if the user receives extra items as freebies.

I don't know how to approach this problem. Any help would be greatly appreciated.

1 Answer

Answer 0 (score: 0)

Here is one approach:

def build_shops(shop_text):
    """Parse 'SHOP_ID, COST, ITEM1+ITEM2' lines into
    {shop_id: {item: [(cost, combo), ...]}}."""
    shops = {}
    for item_info in shop_text:
        shop_id, cost, items = item_info.replace(' ', '').split(',')
        cost = float(cost)
        items = items.split('+')  # 'A+B' -> ['A', 'B']

        if shop_id not in shops:
            shops[shop_id] = {}
        shop_dict = shops[shop_id]

        # Index the offer under every item it contains.
        for item in items:
            if item not in shop_dict:
                shop_dict[item] = []
            shop_dict[item].append((cost, items))
    return shops


def solve_one_shop(shop, items):
    """Return [cheapest_price, [[price, combo], ...]] covering all `items`."""
    if len(items) == 0:
        return [0.0, []]
    # If the shop does not sell some required item, it cannot be used.
    for item in items:
        if item not in shop:
            return [float('inf'), []]
    all_possible = []
    first_item = items[0]
    # Try every offer that contains the first item, then solve what is left.
    for (price, combo) in shop[first_item]:
        sub_set = [x for x in items if x not in combo]
        price_sub_set, solution = solve_one_shop(shop, sub_set)
        solution.append([price, combo])
        all_possible.append([price + price_sub_set, solution])

    cheapest = min(all_possible, key=(lambda x: x[0]))
    return cheapest


def solver(input_data, required_items):
    shops = build_shops(input_data)
    result_all_shops = []
    # .items() works on both Python 2 and 3 (iteritems() is Python 2 only).
    for shop_id, shop_info in shops.items():
        (price, solution) = solve_one_shop(shop_info, required_items)
        if price != float('inf'):
            result_all_shops.append([shop_id, price, solution])
    if len(result_all_shops) == 0:
        print('No shop has all required items')
        return
    # Pick the shop with the lowest total price.
    shop_id, total_price, solution = min(result_all_shops, key=(lambda x: x[1]))
    print('SHOP_ID=%s' % shop_id)
    sln_str = [','.join(items) + '(%0.2f)' % price for (price, items) in solution]
    sln_str = '+'.join(sln_str)
    print(sln_str + ' = %0.2f' % total_price)

Test:

input_data = [
    '1, 2.00, A',
    '1, 1.25, B',
    '1, 2.00, C',
    '1, 1.00, D',
    '1, 1.00, A+B',
    '1, 1.50, A+C',
    '1, 2.50, A+D',
    '2, 3.00, A',
    '2, 1.00, B',
    '2, 1.20, C',
    '2, 1.25, D',
]
required_items = ['A','B','C','D']
solver(input_data, required_items)

Output:

SHOP_ID=1
D(1.00)+A,C(1.50)+A,B(1.00) = 3.50
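As an independent sanity check on this output (not part of the original answer), a plain brute force over shop 1's offers with `itertools.combinations` should reach the same total. The offer list is copied from `input_data` and stored as Python sets. `cheapest_cover` is a hypothetical helper name.

```python
from itertools import combinations

# Shop 1's offers from input_data: (price, set of items in the offer).
offers = [(2.00, {'A'}), (1.25, {'B'}), (2.00, {'C'}), (1.00, {'D'}),
          (1.00, {'A', 'B'}), (1.50, {'A', 'C'}), (2.50, {'A', 'D'})]
required = {'A', 'B', 'C', 'D'}


def cheapest_cover(offers, required):
    """Try every non-empty subset of offers and keep the cheapest one
    whose items together include everything in `required`."""
    best_cost, best_pick = float('inf'), None
    for r in range(1, len(offers) + 1):
        for pick in combinations(offers, r):
            covered = set().union(*(items for _, items in pick))
            if required <= covered:  # extra items (freebies) are allowed
                cost = sum(price for price, _ in pick)
                if cost < best_cost:
                    best_cost, best_pick = cost, pick
    return best_cost, best_pick


cost, pick = cheapest_cover(offers, required)
print(cost, [sorted(items) for _, items in pick])
# 3.5 [['D'], ['A', 'B'], ['A', 'C']]
```

With 7 offers this checks 127 subsets. The count doubles with every extra offer, so this is only useful for small data or for testing.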

Note that I use:

1, 1.00, A+B

instead of

1, 1.00, "A, B"

as the input format, just to make the input easier to parse. You can modify the `build_shops` function to fit your format.
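The function below is one way to adapt the parsing to the original quoted format (`1, 1.00, "A, B"`). It builds the same `{shop_id: {item: [(cost, combo), ...]}}` structure. `build_shops_from_rows` is a hypothetical name. It reads the text with the standard `csv` module, where `skipinitialspace=True` is needed so that a quote following `", "` is treated as quoting. It takes any iterable of `(SHOP_ID, COST, ITEM)` rows, so for a DataFrame loaded with `pd.read_csv(path, skipinitialspace=True)` it should also accept `df.itertuples(index=False)`. That use is untested here.

```python
import csv
import io


def build_shops_from_rows(rows):
    """Build {shop_id: {item: [(cost, combo), ...]}} from (SHOP_ID, COST, ITEM)
    rows, where ITEM is a comma-separated combo such as 'A, B'."""
    shops = {}
    for shop_id, cost, item_str in rows:
        # 'A, B' -> ['A', 'B']; a single item 'A' -> ['A']
        combo = [s.strip() for s in str(item_str).split(',')]
        shop = shops.setdefault(str(shop_id).strip(), {})
        # Index the offer under every item it contains, as build_shops does.
        for item in combo:
            shop.setdefault(item, []).append((float(cost), combo))
    return shops


CSV_TEXT = '''SHOP_ID, COST, ITEM
1, 2.00, A
1, 1.25, B
1, 2.00, C
1, 1.00, D
1, 1.00, "A, B"
1, 1.50, "A, C"
1, 2.50, "A, D"
2, 3.00, A
2, 1.00, B
2, 1.20, C
2, 1.25, D
'''

# skipinitialspace=True skips the space after each comma, so "A, B" is read
# as one quoted field instead of being split in two.
reader = csv.reader(io.StringIO(CSV_TEXT), skipinitialspace=True)
next(reader)  # skip the header row
shops = build_shops_from_rows(reader)
print(shops['1']['B'])  # [(1.25, ['B']), (1.0, ['A', 'B'])]
```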

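This solution works as follows. It takes the first required item, 'A', and tries every offer in the shop that contains it. For each offer, it removes the items that offer covers and recursively solves the remaining set, then keeps the cheapest branch. For example, buying 'A' alone leaves ('B','C','D'). That set is solved in turn by picking 'B' and recursing on what remains, such as ('C','D'). Buying 'A,B' leaves ('C','D') directly. This is a divide-and-conquer approach (http://en.wikipedia.org/wiki/Divide_and_conquer_algorithms). The key code is: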

    sub_set = [x for x in items if x not in combo]
    price_sub_set,solution = solve_one_shop(shop, sub_set)
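The branching step in those two lines can be shown on its own. The snippet uses shop 1's offers for 'A', copied from the `build_shops` output in this answer, and lists the subproblem that each offer leaves behind. `subproblems` is a hypothetical helper used only for illustration.

```python
# Shop 1's offers that contain 'A': (price, combo).
offers_for_A = [(2.0, ['A']), (1.0, ['A', 'B']), (1.5, ['A', 'C']), (2.5, ['A', 'D'])]
items = ['A', 'B', 'C', 'D']


def subproblems(items, offers):
    """For each offer, the items still needed after buying it
    (the same list comprehension as sub_set in solve_one_shop)."""
    return [[x for x in items if x not in combo] for _, combo in offers]


for (price, combo), rest in zip(offers_for_A, subproblems(items, offers_for_A)):
    print(price, combo, '->', rest)
# 2.0 ['A'] -> ['B', 'C', 'D']
# 1.0 ['A', 'B'] -> ['C', 'D']
# 1.5 ['A', 'C'] -> ['B', 'D']
# 2.5 ['A', 'D'] -> ['B', 'C']
```

Each of the four remaining lists is solved recursively in the same way. The offer's price is then added to the cheapest result for its list.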

To help in understanding the code, here is the output of `build_shops` for the test input above:

{'1': {'A': [(2.0, ['A']),
             (1.0, ['A', 'B']),
             (1.5, ['A', 'C']),
             (2.5, ['A', 'D'])],
       'B': [(1.25, ['B']), (1.0, ['A', 'B'])],
       'C': [(2.0, ['C']), (1.5, ['A', 'C'])],
       'D': [(1.0, ['D']), (2.5, ['A', 'D'])]},
 '2': {'A': [(3.0, ['A'])],
       'B': [(1.0, ['B'])],
       'C': [(1.2, ['C'])],
       'D': [(1.25, ['D'])]}}

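This solution iterates over all possible combinations, so it is brute force. The number of recursive branches grows exponentially with the number of required items and offers, so it will be inefficient on a very large dataset.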
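One way to cut down repeated work is to memoize on the set of items still to be covered, using `functools.lru_cache`. This is a sketch that assumes the same `shop` layout `build_shops` produces. It is a swapped-in technique (memoized recursion, i.e. dynamic programming over subsets), not the answer's code, and `solve_one_shop_memo` is a hypothetical name. For n required items there are at most 2^n distinct states, so the worst case is still exponential, because the problem is a form of weighted set cover. However, each subset is solved only once. The original code, by contrast, re-solves sets such as ('D',) in several branches.

```python
from functools import lru_cache


def solve_one_shop_memo(shop, items):
    """Cheapest way to cover `items` with the offers in `shop`.

    `shop` uses the build_shops layout: {item: [(price, combo), ...]}.
    Returns (total_price, [(price, combo), ...]); total is inf if impossible.
    """
    # frozensets make the offers (and the recursion state) hashable.
    offers = {item: [(price, frozenset(combo)) for price, combo in opts]
              for item, opts in shop.items()}

    @lru_cache(maxsize=None)
    def best(remaining):
        # `remaining` is the frozenset of items not yet covered.
        if not remaining:
            return 0.0, ()
        first = min(remaining)  # a fixed pick, like items[0] in the original
        if first not in offers:
            return float('inf'), ()
        answer = (float('inf'), ())
        for price, combo in offers[first]:
            sub_price, sub_sol = best(remaining - combo)
            if price + sub_price < answer[0]:
                answer = (price + sub_price, sub_sol + ((price, sorted(combo)),))
        return answer

    total, solution = best(frozenset(items))
    return total, list(solution)


# Shop 1 in the layout that build_shops produces.
shop1 = {'A': [(2.0, ['A']), (1.0, ['A', 'B']), (1.5, ['A', 'C']), (2.5, ['A', 'D'])],
         'B': [(1.25, ['B']), (1.0, ['A', 'B'])],
         'C': [(2.0, ['C']), (1.5, ['A', 'C'])],
         'D': [(1.0, ['D']), (2.5, ['A', 'D'])]}
print(solve_one_shop_memo(shop1, ['A', 'B', 'C', 'D']))
# (3.5, [(1.0, ['D']), (1.5, ['A', 'C']), (1.0, ['A', 'B'])])
```

Because it returns `(price, solution)` in the same shape, it should be usable in place of `solve_one_shop` inside `solver`. The item names must be mutually comparable, because `min()` is used to pick the next item.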

Test case 2:

input_data = [
    '1, 2.00, burger',
    '1, 1.25, tofu',
    '1, 2.00, tuna',
    '1, 1.00, salad',
    '1, 1.00, burger+tofu',
    '1, 1.50, burger+tuna',
    '1, 2.50, burger+salad',
    '2, 3.00, burger',
    '2, 1.00, tofu',
    '2, 1.20, tuna',
    '2, 1.25, salad',
]
required_items = ['burger','tofu','tuna','salad']
solver(input_data, required_items)

Output 2:

SHOP_ID=1
salad(1.00)+burger,tuna(1.50)+burger,tofu(1.00) = 3.50