Question

inventory = {'A': ['Toy', 3, 1000], 'B': ['Toy', 8, 1100],
             'C': ['Cloth', 15, 1200], 'D': ['Cloth', 9, 1300],
             'E': ['Toy', 11, 1400], 'F': ['Cloth', 18, 1500], 'G': ['Appliance', 300, 50]}

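The letter is the item's name. The first field in the [] brackets is the item's category, the second field is its price, and the third field is the number sold.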
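To make the field positions concrete, here is a small sketch that unpacks one entry of the inventory dictionary above. The variable names name, category, price and sold are illustrative only:

```python
inventory = {'A': ['Toy', 3, 1000], 'B': ['Toy', 8, 1100],
             'C': ['Cloth', 15, 1200], 'D': ['Cloth', 9, 1300],
             'E': ['Toy', 11, 1400], 'F': ['Cloth', 18, 1500],
             'G': ['Appliance', 300, 50]}

# The dictionary key is the item name; the value list is [category, price, sold].
name = 'C'
category, price, sold = inventory[name]
print(name, category, price, sold)  # C Cloth 15 1200

# The same fields by position, which is how key functions usually reach them.
assert inventory[name][0] == category  # index 0: category
assert inventory[name][1] == price     # index 1: price
assert inventory[name][2] == sold      # index 2: number sold
```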

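I want to get the two most expensive (highest-priced) items in each category. If a category does not have at least two items, I drop it. So I should get the following result: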

inventorySummary = {'B': ['Toy', 8, 1100], 'E': ['Toy', 11, 1400],
                    'C': ['Cloth', 15, 1200], 'F': ['Cloth', 18, 1500]}

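Could you help me with code I can use? I need something that works not only for the top two priced items but also for the top three or four. I will eventually use it on a much larger dataset, so the more general the code, the better. Also, I have a hard time understanding lambda expressions; if you choose to provide code with lambda expressions, please explain how each part works so that I can adapt it when the requirements change.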

My system only provides these modules:

bisect, cmath, collections, datetime, functools, heapq, itertools, math, numpy, pandas, pytz, Queue, random, re, scipy, statsmodels, sklearn, talib, time, zipline

Answer 1

You can use itertools.groupby to create the groups.
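A minimal sketch of the groupby approach, ranking by price as in the question's expected output. It uses named helper functions instead of lambda expressions, since the question mentions finding lambdas hard; category_of, category_and_price and top_by_price are hypothetical names, and this is not necessarily the answerer's exact code:

```python
from itertools import groupby

inventory = {'A': ['Toy', 3, 1000], 'B': ['Toy', 8, 1100],
             'C': ['Cloth', 15, 1200], 'D': ['Cloth', 9, 1300],
             'E': ['Toy', 11, 1400], 'F': ['Cloth', 18, 1500],
             'G': ['Appliance', 300, 50]}


def category_of(item):
    # item is a (name, [category, price, sold]) pair from inventory.items()
    return item[1][0]


def category_and_price(item):
    # sort key: category first, then ascending price within the category
    return (item[1][0], item[1][1])


def top_by_price(inventory, top_n=2):
    summary = {}
    # groupby only merges *adjacent* items, so sort by category first
    ordered = sorted(inventory.items(), key=category_and_price)
    for category, group in groupby(ordered, key=category_of):
        items = list(group)
        if len(items) < top_n:
            continue  # drop categories with fewer than top_n items
        # the priciest items are at the end; slicing from len - top_n keeps
        # top_n = 0 returning nothing (items[-0:] would return everything)
        for name, info in items[len(items) - top_n:]:
            summary[name] = info
    return summary


print(top_by_price(inventory))     # top two per category
print(top_by_price(inventory, 3))  # top three per category
```

Changing top_n covers the "three or four" case, and swapping the index 1 in category_and_price for 2 would rank by number sold instead.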

Answer 2

To get the top N of any series most efficiently, use the heapq module. You have to create one heap per category:

from heapq import heapify, heappushpop

def summarize_inventory(inventory, top_n=2):
    categories = {}
    for id, info in inventory.items():
        cat, _, sold = info
        heap = categories.setdefault(cat, [])
        if len(heap) < top_n:
            heap.append((sold, id, info))
            if len(heap) == top_n:
                heapify(heap)
        else:
            heappushpop(heap, (sold, id, info))

    # produce the final summary, only include categories with enough items
    return {id: info 
            for cat, heap in categories.items() if len(heap) == top_n
            for sold, id, info in heap}

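The loop fills each category's heap with the first top_n items (two by default) and heapifies it once it is full; after that, heapq.heappushpop() adds the next item to the heap and removes the smallest one in a single step. Note that the heap tuples start with the third field (number sold), so this version ranks by sales rather than by price; to rank by price as the question asks, unpack cat, price, _ = info and push (price, id, info) instead.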
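To show what heappushpop() does on its own, here is a small sketch using the Toy category's sold numbers from the question (1000, 1100, 1400) as a two-element min-heap:

```python
import heapq

# A min-heap holding the two largest "sold" values seen so far for one category.
heap = [1100, 1000]
heapq.heapify(heap)          # the smallest value is now at heap[0]
print(heap[0])               # 1000

# Push 1400 and pop the smallest in one step: 1000 falls out.
dropped = heapq.heappushpop(heap, 1400)
print(dropped, sorted(heap))  # 1000 [1100, 1400]

# A value smaller than everything in the heap comes straight back out.
dropped = heapq.heappushpop(heap, 900)
print(dropped, sorted(heap))  # 900 [1100, 1400]
```

The heap never grows past two elements, which is why each step costs only log K.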

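This is an O(N log K) solution: for an input of size N (the number of keys in the input dictionary) and a request for the top K elements, the heap-queue approach takes N × log K steps to produce the result.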

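Compared with an O(N log N) sorting solution (sort by category and price, then group by category), this solution finishes sooner as N grows. To get the top two results for 1,000 items it takes 1,000 × 1 = 1,000 steps (log₂ 2 = 1), whereas sorting takes about 1,000 × 10 = 10,000 steps (log₂ 1,000 ≈ 10). For 1,000,000 inputs this becomes 1 million versus roughly 20 million steps, and so on.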
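The step counts above can be reproduced with a few lines of arithmetic. This sketch treats one step as one unit of log₂ and ignores constant factors, so the numbers are rough estimates rather than measured timings; heap_steps and sort_steps are hypothetical helper names:

```python
import math


def heap_steps(n, k):
    # rough cost of the heap approach: N * log2(K)
    return n * math.log2(k)


def sort_steps(n):
    # rough cost of sorting everything first: N * log2(N)
    return n * math.log2(n)


for n in (1_000, 1_000_000):
    print(f"N={n:,}: heap ~{heap_steps(n, 2):,.0f} steps, "
          f"sort ~{sort_steps(n):,.0f} steps")
```

For N = 1,000,000 the sort estimate comes out just under 20 million, since log₂ 1,000,000 ≈ 19.9.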

For the inventory you gave:

>>> summarize_inventory(inventory)
{'B': ['Toy', 8, 1100], 'E': ['Toy', 11, 1400], 'D': ['Cloth', 9, 1300], 'F': ['Cloth', 18, 1500]}
>>> from pprint import pprint
>>> pprint(_)
{'B': ['Toy', 8, 1100],
 'D': ['Cloth', 9, 1300],
 'E': ['Toy', 11, 1400],
 'F': ['Cloth', 18, 1500]}

The function works for whatever top N you are interested in:

>>> summarize_inventory(inventory, 3)
{'A': ['Toy', 3, 1000], 'C': ['Cloth', 15, 1200], 'B': ['Toy', 8, 1100], 'E': ['Toy', 11, 1400], 'D': ['Cloth', 9, 1300], 'F': ['Cloth', 18, 1500]}
>>> summarize_inventory(inventory, 1)
{'E': ['Toy', 11, 1400], 'G': ['Appliance', 300, 50], 'F': ['Cloth', 18, 1500]}
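A sketch of the same heap approach with a hypothetical field parameter that selects which position of [category, price, sold] to rank by: 1 ranks by price, matching the question's expected output, and 2 ranks by number sold, matching the function above. The name summarize_by_field is an illustrative assumption:

```python
from heapq import heapify, heappushpop

inventory = {'A': ['Toy', 3, 1000], 'B': ['Toy', 8, 1100],
             'C': ['Cloth', 15, 1200], 'D': ['Cloth', 9, 1300],
             'E': ['Toy', 11, 1400], 'F': ['Cloth', 18, 1500],
             'G': ['Appliance', 300, 50]}


def summarize_by_field(inventory, top_n=2, field=1):
    """Keep the top_n items per category, ranked by info[field].

    field=1 ranks by price, field=2 by number sold (hypothetical parameter).
    Categories with fewer than top_n items are dropped.
    """
    heaps = {}
    for name, info in inventory.items():
        heap = heaps.setdefault(info[0], [])  # one min-heap per category
        entry = (info[field], name, info)     # name breaks ties, so lists are never compared
        if len(heap) < top_n:
            heap.append(entry)
            if len(heap) == top_n:
                heapify(heap)                 # smallest kept entry now at heap[0]
        else:
            heappushpop(heap, entry)          # add entry, drop the current smallest
    return {name: info
            for heap in heaps.values() if len(heap) == top_n
            for _, name, info in heap}


print(summarize_by_field(inventory))           # ranked by price
print(summarize_by_field(inventory, field=2))  # ranked by number sold
```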

Answer 3

enrico.bacis beat me to an itertools.groupby solution, but in case it helps, here is my version (I tried to write it in a functional-programming style):

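import itertools


def summarize_inventory(inventory, top_n=2):
    # Each item from inventory.items() is a (name, [category, price, sold]) pair,
    # so item[1][0] is the category and item[1][1] is the price.
    sort_key = lambda item: (item[1][0], item[1][1])  # order by category, then price
    group_key = lambda item: item[1][0]               # group by category only

    # items in inventory grouped by their category; groupby needs its input
    # sorted by the grouping key, which sort_key guarantees
    items_by_category = (
        (category, list(items))
        for category, items in itertools.groupby(
            sorted(inventory.items(), key=sort_key),
            group_key
        )
    )

    # the top_n items from each category if there are >= top_n items;
    # each group is in ascending price order, so the last top_n are the priciest
    inventory_summary = dict(itertools.chain.from_iterable(
        items[-top_n:]
        for category, items in items_by_category
        if len(items) >= top_n
    ))

    return inventory_summary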
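Since the question asks how the lambda expressions work, here is a small sketch of what the sort and group keys return for one inventory entry, next to an equivalent def. It uses index access because Python 3 removed tuple unpacking in lambda parameters (PEP 3113); sort_key_def and sort_by_sold are hypothetical names:

```python
# One (name, info) pair as produced by inventory.items().
pair = ('C', ['Cloth', 15, 1200])

# "lambda PARAMETERS: EXPRESSION" builds a small unnamed function that
# returns EXPRESSION; this one behaves exactly like sort_key_def below.
sort_key = lambda item: (item[1][0], item[1][1])


def sort_key_def(item):
    name, info = item             # item[0] is the name, item[1] the info list
    category, price, sold = info  # info[0], info[1], info[2]
    return (category, price)


print(sort_key(pair))      # ('Cloth', 15)
print(sort_key_def(pair))  # ('Cloth', 15)

group_key = lambda item: item[1][0]
print(group_key(pair))     # Cloth

# Adapting to a new requirement means changing the expression, e.g. ranking
# by number sold instead of price while still grouping by category:
sort_by_sold = lambda item: (item[1][0], item[1][2])
print(sort_by_sold(pair))  # ('Cloth', 1200)
```

sorted() calls the key function once per item and orders items by the returned tuples, so tuples compare by category first and then by the second element.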
