Get the first n dictionaries for each key value from a sorted list

Time: 2016-12-09 18:11:22

Tags: python list dictionary

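I have written a script that calculates the distance between an order's delivery address and each store location of a particular chain. So far I have built a sorted list of dictionaries (sorted by order_id, then by distance). It looks like this: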

   [
       {
            "order_id": 1,
            "distance": 10,
            "storeID": 1112
        },
        {
            "order_id": 1,
            "distance": 20,
            "storeID": 1116
        },
        {
            "order_id": 1,
            "distance": 30,
            "storeID": 1134
        },
        {
            "order_id": 1,
            "distance": 40,
            "storeID": 1133
        },
        {
            "order_id": 2,
            "distance": 6,
            "storeID": 1112
        },
        {
            "order_id": 2,
            "distance": 12,
            "storeID": 1116
        },
        {
            "order_id": 2,
            "distance": 18,
            "storeID": 1134
        },
        {
            "order_id": 2,
            "distance": 24,
            "storeID": 1133
        }
    ]

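From here, I would like to find the two closest stores for each order_id, along with their distances.

What I ultimately want is a list that looks like this: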

   [
       {
            "order_id": 1,
            "closet_store_distance": 10,
            "closest_store_id": 1112,
            "second_closet_store_distance": 20,
            "second_closest_store_id": 1116
       },
       {
            "order_id": 2,
            "closet_store_distance": 6,
            "closest_store_id": 1112,
            "second_closet_store_distance": 12,
            "second_closest_store_id": 1116
      }
]

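I am not sure how to iterate over each order_id in this list and pick the two closest stores. Any help is appreciated.

2 answers:

Answer 0: (score: 0)

Try something like this. I am assuming the initial data is in a file named sample.txt.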

import json
from operator import itemgetter

def make_order(stores, order_id):
   # stores: list of (storeID, distance) tuples, sorted by ascending distance
   return {
      "order_id": order_id,
      "closet_store_distance": stores[0][1],
      "closest_store_id": stores[0][0],
      "second_closet_store_distance": stores[1][1],
      "second_closest_store_id": stores[1][0]
   }

def main():
   with open('sample.txt', 'r') as data_file:
      data = json.loads(data_file.read())

   id1 = {}
   id2 = {}
   for i in data:
      if i["order_id"] == 1:
         id1[i["storeID"]] = i["distance"]
      else:
         id2[i["storeID"]] = i["distance"]

   top1 = sorted(id1.items(), key=itemgetter(1))
   top2 = sorted(id2.items(), key=itemgetter(1))

   with open('results.json', 'w') as result_file:
      order1 = make_order(top1, 1)
      order2 = make_order(top2, 2)
      json.dump([order1, order2], result_file, indent=3, separators=(',', ': '))

if __name__ == '__main__':
   main()

The resulting file looks like this:

[
   {
      "second_closest_store_id": 1116,
      "closet_store_distance": 10,
      "closest_store_id": 1112,
      "order_id": 1,
      "second_closet_store_distance": 20
   },
   {
      "second_closest_store_id": 1116,
      "closet_store_distance": 6,
      "closest_store_id": 1112,
      "order_id": 2,
      "second_closet_store_distance": 12
   }
]
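The code above handles exactly two order IDs (1 and everything else). As a rough sketch of how the same idea could be generalized to any number of orders, the version below relies on the question's statement that the list is already sorted by order_id and then by distance, and uses `itertools.groupby`. The function name `closest_two` and the choice to skip orders with fewer than two stores are my own assumptions, not part of the question. The misspelled keys (`closet_...`) are kept exactly as the question wrote them.

```python
from itertools import groupby, islice
from operator import itemgetter


def closest_two(rows):
    """Return one summary dict per order_id.

    Assumes rows are already sorted by order_id, then by distance,
    as described in the question.
    """
    results = []
    for order_id, group in groupby(rows, key=itemgetter("order_id")):
        # The first two rows of each group are the two nearest stores.
        nearest = list(islice(group, 2))
        if len(nearest) < 2:
            continue  # assumption: skip orders with fewer than two stores
        first, second = nearest
        results.append({
            "order_id": order_id,
            "closet_store_distance": first["distance"],
            "closest_store_id": first["storeID"],
            "second_closet_store_distance": second["distance"],
            "second_closest_store_id": second["storeID"],
        })
    return results


rows = [
    {"order_id": 1, "distance": 10, "storeID": 1112},
    {"order_id": 1, "distance": 20, "storeID": 1116},
    {"order_id": 1, "distance": 30, "storeID": 1134},
    {"order_id": 1, "distance": 40, "storeID": 1133},
    {"order_id": 2, "distance": 6, "storeID": 1112},
    {"order_id": 2, "distance": 12, "storeID": 1116},
    {"order_id": 2, "distance": 18, "storeID": 1134},
    {"order_id": 2, "distance": 24, "storeID": 1133},
]

result = closest_two(rows)
```

Note that `groupby` only merges adjacent rows with the same key, so this only works if the input really is sorted by order_id first.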

Answer 1: (score: 0)

A nicely readable answer (although it uses one of my free libraries):

from PLOD import PLOD

order_store_list = [
    {
        "order_id": 1,
        "distance": 10,
        "storeID": 1112
    },
    {
        "order_id": 1,
        "distance": 20,
        "storeID": 1116
    },
    {
        "order_id": 1,
        "distance": 30,
        "storeID": 1134
    },
    {
        "order_id": 1,
        "distance": 40,
        "storeID": 1133
    },
    {
        "order_id": 2,
        "distance": 6,
        "storeID": 1112
    },
    {
        "order_id": 2,
        "distance": 12,
        "storeID": 1116
    },
    {
        "order_id": 2,
        "distance": 18,
        "storeID": 1134
    },
    {
        "order_id": 2,
        "distance": 24,
        "storeID": 1133
    }
]

#
# first, get the order_ids (place in a dictionary to ensure uniqueness)
#
order_id_keys = {}
for entry in order_store_list:
    order_id_keys[entry["order_id"]] = True

#
# next, get the two closest stores per order_id
#
closest_stores = []
for order_id in order_id_keys:
    top_two = PLOD(order_store_list).eq("order_id", order_id).sort("distance").returnList(limit=2)
    closest_stores.append({
        "order_id": order_id,
        "closet_store_distance": top_two[0]["distance"],
        "closest_store_id": top_two[0]["storeID"],
        "second_closet_store_distance": top_two[1]["distance"],
        "second_closest_store_id": top_two[1]["storeID"]
    })

#
# sort by order_id again (if that is important)
#
closest_stores = PLOD(closest_stores).sort("order_id").returnList()

This example assumes the production data fits in memory. If you are working with a larger dataset, I strongly recommend using a database and a Python library for that database.
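To make the database suggestion a bit more concrete, here is a minimal sketch using the standard-library `sqlite3` module with an in-memory database. The table name `order_store`, its column names, and the correlated-subquery approach are assumptions chosen for illustration; a real schema and database engine may well differ.

```python
import sqlite3

# Sample rows as (order_id, distance, store_id), taken from the question.
rows = [
    (1, 10, 1112), (1, 20, 1116), (1, 30, 1134), (1, 40, 1133),
    (2, 6, 1112), (2, 12, 1116), (2, 18, 1134), (2, 24, 1133),
]

conn = sqlite3.connect(":memory:")
# Hypothetical schema, for illustration only.
conn.execute(
    "CREATE TABLE order_store (order_id INTEGER, distance REAL, store_id INTEGER)"
)
conn.executemany("INSERT INTO order_store VALUES (?, ?, ?)", rows)

# Keep a row if fewer than two rows of the same order are strictly closer.
query = """
SELECT o.order_id, o.store_id, o.distance
FROM order_store AS o
WHERE (
    SELECT COUNT(*)
    FROM order_store AS closer
    WHERE closer.order_id = o.order_id
      AND closer.distance < o.distance
) < 2
ORDER BY o.order_id, o.distance
"""
nearest = conn.execute(query).fetchall()
conn.close()
```

With this query, stores tied on distance can produce more than two rows for an order, so a tie-breaking rule would need to be added if exactly two rows are required.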

My PLOD library is free and open source (MIT), but it requires Python 2.7. I am about two weeks away from a Python 3.5 release. See https://pypi.python.org/pypi/PLOD/0.1.7
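For readers who cannot use Python 2.7, the same filter-sort-limit idea can probably be sketched with the standard library alone. The version below is not PLOD and not the author's code. It is a sketch that groups rows with `collections.defaultdict` and picks the n smallest distances with `heapq.nsmallest`, so the input does not need to be sorted beforehand. The function name `nearest_stores` and the returned shape (a dict mapping order_id to a list of rows) are assumptions for illustration.

```python
from collections import defaultdict
from heapq import nsmallest
from operator import itemgetter


def nearest_stores(rows, n=2):
    """Map each order_id to its n rows with the smallest distance."""
    by_order = defaultdict(list)
    for row in rows:
        by_order[row["order_id"]].append(row)
    # nsmallest returns the rows in ascending order of distance.
    return {
        order_id: nsmallest(n, group, key=itemgetter("distance"))
        for order_id, group in sorted(by_order.items())
    }


# Deliberately unsorted input to show that no pre-sorting is required.
rows = [
    {"order_id": 2, "distance": 24, "storeID": 1133},
    {"order_id": 1, "distance": 30, "storeID": 1134},
    {"order_id": 1, "distance": 10, "storeID": 1112},
    {"order_id": 2, "distance": 6, "storeID": 1112},
    {"order_id": 1, "distance": 40, "storeID": 1133},
    {"order_id": 2, "distance": 12, "storeID": 1116},
    {"order_id": 1, "distance": 20, "storeID": 1116},
    {"order_id": 2, "distance": 18, "storeID": 1134},
]

top_two = nearest_stores(rows, n=2)
```

Passing a different `n` gives the first n stores per order, which matches the more general "first n" wording of the title.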