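I wrote a script that computes the distance between an order's delivery address and every store location of a particular chain. So far I have built a sorted list of dictionaries (sorted by order_id, then by distance). It looks like this: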
[
{
"order_id": 1,
"distance": 10,
"storeID": 1112
},
{
"order_id": 1,
"distance": 20,
"storeID": 1116
},
{
"order_id": 1,
"distance": 30,
"storeID": 1134
},
{
"order_id": 1,
"distance": 40,
"storeID": 1133
},
{
"order_id": 2,
"distance": 6,
"storeID": 1112
},
{
"order_id": 2,
"distance": 12,
"storeID": 1116
},
{
"order_id": 2,
"distance": 18,
"storeID": 1134
},
{
"order_id": 2,
"distance": 24,
"storeID": 1133
}
]
From here, I want to find the two closest stores for each order_id, along with their distances.
What I ultimately want to end up with is a list like this:
[
{
"order_id": 1,
"closet_store_distance": 10,
"closest_store_id": 1112,
"second_closet_store_distance": 20,
"second_closest_store_id": 1116
},
{
"order_id": 2,
"closet_store_distance": 6,
"closest_store_id": 1112,
"second_closet_store_distance": 12,
"second_closest_store_id": 1116
}
]
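I'm not sure how to iterate over each order_id in this list and pick the two closest stores. Any help is appreciated.
Answer 0 (score: 0)
Try something like this. I'm assuming the initial data is in a file named sample.txt.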
import json
from operator import itemgetter


def make_order(stores, order_id):
    # stores is a list of (storeID, distance) tuples sorted by distance
    return {
        "order_id": order_id,
        "closet_store_distance": stores[0][1],
        "closest_store_id": stores[0][0],
        "second_closet_store_distance": stores[1][1],
        "second_closest_store_id": stores[1][0]
    }


def main():
    with open('sample.txt', 'r') as data_file:
        data = json.load(data_file)

    # Map storeID -> distance, one dict per order.
    # Note: this handles only the two order_ids in the sample (1 and 2).
    id1 = {}
    id2 = {}
    for i in data:
        if i["order_id"] == 1:
            id1[i["storeID"]] = i["distance"]
        else:
            id2[i["storeID"]] = i["distance"]

    # Sort each order's stores by distance, nearest first
    top1 = sorted(id1.items(), key=itemgetter(1))
    top2 = sorted(id2.items(), key=itemgetter(1))

    with open('results.json', 'w') as result_file:
        order1 = make_order(top1, 1)
        order2 = make_order(top2, 2)
        json.dump([order1, order2], result_file, indent=3, separators=(',', ': '))


if __name__ == '__main__':
    main()
The resulting file looks like this:
[
{
"second_closest_store_id": 1116,
"closet_store_distance": 10,
"closest_store_id": 1112,
"order_id": 1,
"second_closet_store_distance": 20
},
{
"second_closest_store_id": 1116,
"closet_store_distance": 6,
"closest_store_id": 1112,
"order_id": 2,
"second_closet_store_distance": 12
}
]
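The script above handles only order IDs 1 and 2. As a minimal sketch of how the same idea could extend to any number of orders, assuming the input list is already sorted by order_id and then by distance (as the question states), `itertools.groupby` can walk the list one order at a time. The helper name `closest_two` is hypothetical, and skipping orders with fewer than two stores is an assumption rather than something the question specifies.

```python
from itertools import groupby
from operator import itemgetter


def closest_two(rows):
    """Summarize the two nearest stores for each order_id.

    Assumes rows is already sorted by order_id, then by distance.
    """
    results = []
    for order_id, group in groupby(rows, key=itemgetter("order_id")):
        stores = list(group)[:2]  # first two entries are the nearest two
        if len(stores) < 2:
            continue  # assumption: skip orders with fewer than two stores
        results.append({
            "order_id": order_id,
            "closet_store_distance": stores[0]["distance"],
            "closest_store_id": stores[0]["storeID"],
            "second_closet_store_distance": stores[1]["distance"],
            "second_closest_store_id": stores[1]["storeID"],
        })
    return results


rows = [
    {"order_id": 1, "distance": 10, "storeID": 1112},
    {"order_id": 1, "distance": 20, "storeID": 1116},
    {"order_id": 1, "distance": 30, "storeID": 1134},
    {"order_id": 2, "distance": 6, "storeID": 1112},
    {"order_id": 2, "distance": 12, "storeID": 1116},
    {"order_id": 2, "distance": 18, "storeID": 1134},
]

for summary in closest_two(rows):
    print(summary)
```

Because `groupby` only groups consecutive items, this relies on the existing sort order. If the list were not sorted, it would need to be sorted on (order_id, distance) first.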
Answer 1 (score: 0)
A nicely readable answer (though it uses one of my free libraries):
from PLOD import PLOD

order_store_list = [
    {
        "order_id": 1,
        "distance": 10,
        "storeID": 1112
    },
    {
        "order_id": 1,
        "distance": 20,
        "storeID": 1116
    },
    {
        "order_id": 1,
        "distance": 30,
        "storeID": 1134
    },
    {
        "order_id": 1,
        "distance": 40,
        "storeID": 1133
    },
    {
        "order_id": 2,
        "distance": 6,
        "storeID": 1112
    },
    {
        "order_id": 2,
        "distance": 12,
        "storeID": 1116
    },
    {
        "order_id": 2,
        "distance": 18,
        "storeID": 1134
    },
    {
        "order_id": 2,
        "distance": 24,
        "storeID": 1133
    }
]

#
# first, get the order_ids (place in a dictionary to ensure uniqueness)
#
order_id_keys = {}
for entry in order_store_list:
    order_id_keys[entry["order_id"]] = True

#
# next, get the two closest stores per order_id
#
closest_stores = []
for order_id in order_id_keys:
    top_two = PLOD(order_store_list).eq("order_id", order_id).sort("distance").returnList(limit=2)
    closest_stores.append({
        "order_id": order_id,
        "closet_store_distance": top_two[0]["distance"],
        "closest_store_id": top_two[0]["storeID"],
        "second_closet_store_distance": top_two[1]["distance"],
        "second_closest_store_id": top_two[1]["storeID"]
    })

#
# sort by order_id again (if that is important)
#
closest_stores = PLOD(closest_stores).sort("order_id").returnList()

This example assumes the production data fits in memory. If you are working with a larger dataset, I strongly recommend using a database and a Python library for that database.
My PLOD library is free and open source (MIT), but it requires Python 2.7. I'm about two weeks away from a Python 3.5 release. See https://pypi.python.org/pypi/PLOD/0.1.7
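For readers who cannot use Python 2.7, the same filter, sort, and limit steps can be sketched with the standard library alone. This is a stand-in under stated assumptions, not PLOD's API: the helper name `two_nearest_per_order` is hypothetical, orders with fewer than two stores are skipped by assumption, and the input does not need to be pre-sorted.

```python
import heapq
from collections import defaultdict


def two_nearest_per_order(rows):
    """Return the two nearest stores per order_id, ordered by order_id."""
    # Group rows by order_id (plays the role of the eq() filter above)
    by_order = defaultdict(list)
    for row in rows:
        by_order[row["order_id"]].append(row)

    summary = []
    for order_id in sorted(by_order):
        # nsmallest returns the two lowest-distance rows, nearest first
        top_two = heapq.nsmallest(
            2, by_order[order_id], key=lambda r: r["distance"]
        )
        if len(top_two) < 2:
            continue  # assumption: skip orders with fewer than two stores
        summary.append({
            "order_id": order_id,
            "closet_store_distance": top_two[0]["distance"],
            "closest_store_id": top_two[0]["storeID"],
            "second_closet_store_distance": top_two[1]["distance"],
            "second_closest_store_id": top_two[1]["storeID"],
        })
    return summary


# Deliberately unsorted input to show that no pre-sorting is required
unsorted_rows = [
    {"order_id": 2, "distance": 18, "storeID": 1134},
    {"order_id": 1, "distance": 40, "storeID": 1133},
    {"order_id": 2, "distance": 6, "storeID": 1112},
    {"order_id": 1, "distance": 10, "storeID": 1112},
    {"order_id": 1, "distance": 20, "storeID": 1116},
    {"order_id": 2, "distance": 12, "storeID": 1116},
]

for entry in two_nearest_per_order(unsorted_rows):
    print(entry)
```

Like the PLOD version, this keeps everything in memory, so the earlier advice about moving larger datasets into a database still applies.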