Join two lists of dictionaries on different keys that hold the same value

Time: 2019-08-20 19:05:10

Tags: python list dictionary join merge

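I need help with a problem I have to solve without using pandas or numpy. I have two lists of dictionaries, list1 and list2. I need to sort list2 by 'post_code' and group it by 'code' before joining list1 and list2 on two different keys that hold the same value. The key 'practice' in list1 is equivalent to the key 'code' in the sorted list2, and I need to join list1 and list2 on these equivalent keys.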

list1=
[{'bnf_code': '0101010G0AAABAB',
  'items': 2,
  'practice': 'N81013',
  'bnf_name': 'Co-Magaldrox_Susp 195mg/220mg/5ml S/F',
  'nic': 5.98,
  'act_cost': 5.56,
  'quantity': 1000},
 {'bnf_code': '0101021B0AAAHAH',
  'items': 1,
  'practice': 'A81001',
  'bnf_name': 'Alginate_Raft-Forming Oral Susp S/F',
  'nic': 1.95,
  'act_cost': 1.82,
  'quantity': 500},
 {'bnf_code': '0101021B0AAALAL',
  'items': 12,
  'practice': 'A81002',
  'bnf_name': 'Sod Algin/Pot Bicarb_Susp S/F',
  'nic': 64.51,
  'act_cost': 59.95,
  'quantity': 6300},
 {'bnf_code': '0101021B0AAAPAP',
  'items': 3,
  'practice': 'A81004',
  'bnf_name': 'Sod Alginate/Pot Bicarb_Tab Chble 500mg',
  'nic': 9.21,
  'act_cost': 8.55,
  'quantity': 180},
 {'bnf_code': '0101021B0BEADAJ',
  'items': 6,
  'practice': 'A81003',
  'bnf_name': 'Gaviscon Infant_Sach 2g (Dual Pack) S/F',
  'nic': 28.92,
  'act_cost': 26.84,
  'quantity': 90}]

list2=
[{'code': 'A81001',
  'name': 'THE DENSHAM SURGERY',
  'addr_1': 'THE HEALTH CENTRE',
  'addr_2': 'LAWSON STREET',
  'borough': 'STOCKTON ON TEES',
  'village': 'CLEVELAND',
  'post_code': 'TS18 1HU'},
 {'code': 'A81002',
  'name': 'QUEENS PARK MEDICAL CENTRE',
  'addr_1': 'QUEENS PARK MEDICAL CTR',
  'addr_2': 'FARRER STREET',
  'borough': 'STOCKTON ON TEES',
  'village': 'CLEVELAND',
  'post_code': 'TS18 2AW'},
 {'code': 'A81003',
  'name': 'VICTORIA MEDICAL PRACTICE',
  'addr_1': 'THE HEALTH CENTRE',
  'addr_2': 'VICTORIA ROAD',
  'borough': 'HARTLEPOOL',
  'village': 'CLEVELAND',
  'post_code': 'TS26 8DB'},
 {'code': 'A81004',
  'name': 'WOODLANDS ROAD SURGERY',
  'addr_1': '6 WOODLANDS ROAD',
  'addr_2': None,
  'borough': 'MIDDLESBROUGH',
  'village': 'CLEVELAND',
  'post_code': 'TS1 3BE'},
 {'code': 'N81013',
  'name': 'SPRINGWOOD SURGERY',
  'addr_1': 'SPRINGWOOD SURGERY',
  'addr_2': 'RECTORY LANE',
  'borough': 'GUISBOROUGH',
  'village': None,
  'post_code': 'TS14 7DJ'}]

I have been able to sort list2 by post_code and group it by code, but I am lost on how to join list1 and list2. This is the code I have so far for sorting and grouping.

import itertools
from operator import itemgetter
sorted_post_code = sorted(list2, key=itemgetter('post_code'))
for key, group in itertools.groupby(sorted_post_code, key=lambda x:x['code']):
    #print (key),
    print (list(group))

The expected output is:

joined_list=
[{'bnf_code': '0101010G0AAABAB',
  'items': 2,
  'practice': 'N81013',
  'bnf_name': 'Co-Magaldrox_Susp 195mg/220mg/5ml S/F',
  'nic': 5.98,
  'act_cost': 5.56,
  'quantity': 1000,
  'code': 'N81013',
  'name': 'SPRINGWOOD SURGERY',
  'addr_1': 'SPRINGWOOD SURGERY',
  'addr_2': 'RECTORY LANE',
  'borough': 'GUISBOROUGH',
  'village': None,
  'post_code': 'TS14 7DJ'},
 {'bnf_code': '0101021B0AAAHAH',
  'items': 1,
  'practice': 'A81001',
  'bnf_name': 'Alginate_Raft-Forming Oral Susp S/F',
  'nic': 1.95,
  'act_cost': 1.82,
  'quantity': 500,
  'code': 'A81001',
  'name': 'THE DENSHAM SURGERY',
  'addr_1': 'THE HEALTH CENTRE',
  'addr_2': 'LAWSON STREET',
  'borough': 'STOCKTON ON TEES',
  'village': 'CLEVELAND',
  'post_code': 'TS18 1HU'},
 {'bnf_code': '0101021B0AAALAL',
  'items': 12,
  'practice': 'A81002',
  'bnf_name': 'Sod Algin/Pot Bicarb_Susp S/F',
  'nic': 64.51,
  'act_cost': 59.95,
  'quantity': 6300,
  'code': 'A81002',
  'name': 'QUEENS PARK MEDICAL CENTRE',
  'addr_1': 'QUEENS PARK MEDICAL CTR',
  'addr_2': 'FARRER STREET',
  'borough': 'STOCKTON ON TEES',
  'village': 'CLEVELAND',
  'post_code': 'TS18 2AW'},
 {'bnf_code': '0101021B0AAAPAP',
  'items': 3,
  'practice': 'A81004',
  'bnf_name': 'Sod Alginate/Pot Bicarb_Tab Chble 500mg',
  'nic': 9.21,
  'act_cost': 8.55,
  'quantity': 180,
  'code': 'A81004',
  'name': 'WOODLANDS ROAD SURGERY',
  'addr_1': '6 WOODLANDS ROAD',
  'addr_2': None,
  'borough': 'MIDDLESBROUGH',
  'village': 'CLEVELAND',
  'post_code': 'TS1 3BE'},
 {'bnf_code': '0101021B0BEADAJ',
  'items': 6,
  'practice': 'A81003',
  'bnf_name': 'Gaviscon Infant_Sach 2g (Dual Pack) S/F',
  'nic': 28.92,
  'act_cost': 26.84,
  'quantity': 90,
  'code': 'A81003',
  'name': 'VICTORIA MEDICAL PRACTICE',
  'addr_1': 'THE HEALTH CENTRE',
  'addr_2': 'VICTORIA ROAD',
  'borough': 'HARTLEPOOL',
  'village': 'CLEVELAND',
  'post_code': 'TS26 8DB'}]

2 Answers:

Answer 0 (score: 1)

As I understand it, you want each dictionary in list1 to contain all the entries of the dictionary in list2 whose 'code' value matches its 'practice' value.

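If so, you can easily update a dictionary with all the entries of another dictionary. Missing key: value pairs are added, and existing keys have their values updated.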
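To make the update behaviour concrete, here is a minimal sketch using two made-up dictionaries (the values are illustrative and not taken from the question's lists). It shows that update() changes the first dictionary in place, adds the keys it lacks, and overwrites the keys both share.

```python
# Made-up dictionaries, only to illustrate dict.update().
entry1 = {'practice': 'A81001', 'items': 1, 'name': 'old name'}
entry2 = {'code': 'A81001', 'name': 'THE DENSHAM SURGERY'}

# update() modifies entry1 in place and returns None.
result = entry1.update(entry2)

print(result)   # None
print(entry1)   # 'code' was added, 'name' was overwritten
```

Because the change happens in place, the original contents of entry1 are gone afterwards; copy the dictionary first if you still need them.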

So I ended up with a double for loop, which I ran before doing any sorting. You may want to adapt it to your needs.

for entry2 in list2:
    for entry1 in list1:
        if entry2['code'] == entry1['practice']:
            entry1.update(entry2)
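The double loop compares every pair of entries and modifies list1 in place. If that matters, one possible variation is to index list2 by 'code' once and build new dictionaries instead. The sketch below assumes each 'code' appears only once in list2; the helper name join_on and the small sample lists are hypothetical and only mirror the shape of list1 and list2.

```python
def join_on(left, right, left_key, right_key):
    """Return new dicts: each row of `left` merged with its match in `right`."""
    # Index the right-hand list once so each lookup is a dictionary access.
    index = {row[right_key]: row for row in right}
    joined = []
    for row in left:
        # A row without a match is kept with its own keys only.
        match = index.get(row[left_key], {})
        joined.append({**row, **match})
    return joined


# Tiny made-up sample in the shape of list1 / list2.
sample1 = [{'practice': 'N81013', 'items': 2},
           {'practice': 'A81001', 'items': 1}]
sample2 = [{'code': 'A81001', 'post_code': 'TS18 1HU'},
           {'code': 'N81013', 'post_code': 'TS14 7DJ'}]

joined_list = join_on(sample1, sample2, 'practice', 'code')
print(joined_list)
```

The result keeps the order of the left-hand list, and the input dictionaries are left untouched.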

A very long explanation of the different ways to merge dictionaries can be found here: https://stackoverflow.com/a/26853961/6218902

Answer 1 (score: 1)

defaultdict tends to work quite well for grouping operations. You can use a dictionary to update the grouped elements:

from collections import defaultdict

groups = defaultdict(dict)

# to show this explicitly you can start with two loops
# not the most efficient, but it shows the process
for item in list1:
    k = item['practice']
    groups[k].update(item)

for item in list2:
    k = item['code']
    groups[k].update(item)

# where groups.values() will have your "joined" 
# dictionaries
groups
{
  "N81013": {
    "bnf_code": "0101010G0AAABAB",
    "items": 2,
    "practice": "N81013",
    "bnf_name": "Co-Magaldrox_Susp 195mg/220mg/5ml S/F",
    "nic": 5.98,
    "act_cost": 5.56,
    "quantity": 1000,
    "code": "N81013",
    "name": "SPRINGWOOD SURGERY",
    "addr_1": "SPRINGWOOD SURGERY",
    "addr_2": "RECTORY LANE",
    "borough": "GUISBOROUGH",
    "village": null,
    "post_code": "TS14 7DJ"
  },
  "A81001": {
    "bnf_code": "0101021B0AAAHAH",
    "items": 1,
    "practice": "A81001",
    "bnf_name": "Alginate_Raft-Forming Oral Susp S/F",
    "nic": 1.95,
    "act_cost": 1.82,
    "quantity": 500,
    "code": "A81001",
    "name": "THE DENSHAM SURGERY",
    "addr_1": "THE HEALTH CENTRE",
    "addr_2": "LAWSON STREET",
    "borough": "STOCKTON ON TEES",
    "village": "CLEVELAND",
    "post_code": "TS18 1HU"
  },
  "A81002": {
    "bnf_code": "0101021B0AAALAL",
    "items": 12,
    "practice": "A81002",
    "bnf_name": "Sod Algin/Pot Bicarb_Susp S/F",
    "nic": 64.51,
    "act_cost": 59.95,
    "quantity": 6300,
    "code": "A81002",
    "name": "QUEENS PARK MEDICAL CENTRE",
    "addr_1": "QUEENS PARK MEDICAL CTR",
    "addr_2": "FARRER STREET",
    "borough": "STOCKTON ON TEES",
    "village": "CLEVELAND",
    "post_code": "TS18 2AW"
  },
  "A81004": {
    "bnf_code": "0101021B0AAAPAP",
    "items": 3,
    "practice": "A81004",
    "bnf_name": "Sod Alginate/Pot Bicarb_Tab Chble 500mg",
    "nic": 9.21,
    "act_cost": 8.55,
    "quantity": 180,
    "code": "A81004",
    "name": "WOODLANDS ROAD SURGERY",
    "addr_1": "6 WOODLANDS ROAD",
    "addr_2": null,
    "borough": "MIDDLESBROUGH",
    "village": "CLEVELAND",
    "post_code": "TS1 3BE"
  },
  "A81003": {
    "bnf_code": "0101021B0BEADAJ",
    "items": 6,
    "practice": "A81003",
    "bnf_name": "Gaviscon Infant_Sach 2g (Dual Pack) S/F",
    "nic": 28.92,
    "act_cost": 26.84,
    "quantity": 90,
    "code": "A81003",
    "name": "VICTORIA MEDICAL PRACTICE",
    "addr_1": "THE HEALTH CENTRE",
    "addr_2": "VICTORIA ROAD",
    "borough": "HARTLEPOOL",
    "village": "CLEVELAND",
    "post_code": "TS26 8DB"
  }
}
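One caveat worth keeping in mind: each group here is a single dictionary, so a second list1 row with the same 'practice' would overwrite the first. The sample data has one row per practice, so the effect does not appear above. The sketch below uses made-up rows (the drug names are placeholders) and assumes repeated practices are possible; it contrasts the collapsing behaviour with grouping into lists, which keeps every row.

```python
from collections import defaultdict

# Hypothetical rows: two prescriptions for the same practice.
prescriptions = [
    {'practice': 'A81001', 'bnf_name': 'Drug A', 'items': 1},
    {'practice': 'A81001', 'bnf_name': 'Drug B', 'items': 4},
    {'practice': 'A81002', 'bnf_name': 'Drug C', 'items': 2},
]

# defaultdict(dict) + update keeps only the last row per practice.
collapsed = defaultdict(dict)
for row in prescriptions:
    collapsed[row['practice']].update(row)

# defaultdict(list) + append keeps all of them.
grouped = defaultdict(list)
for row in prescriptions:
    grouped[row['practice']].append(row)

print(collapsed['A81001']['bnf_name'])  # the later row won
print(len(grouped['A81001']))           # both rows are still there
```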

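In general, dictionaries are well suited to grouping operations because their keys are unique. A more optimized approach might be to zip the two lists together, since you are going to update anyway: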

from itertools import zip_longest
from collections import defaultdict

groups = defaultdict(dict)


def group_item(a, b):
    a_key, b_key = a['practice'] if a else None, b['code'] if b else None
    return a_key, b_key

for a, b in zip_longest(list1, list2):
    ak, bk = group_item(a, b)
    if ak:
        groups[ak].update(a)
    if bk:
        groups[bk].update(b)

# sort list of groups.values() now
list(groups.values())
[{'bnf_code': '0101010G0AAABAB', 'items': 2, 'practice': 'N81013', 'bnf_name': 'Co-Magaldrox_Susp 195mg/220mg/5ml S/F', 'nic': 5.98, 'act_cost': 5.56, 'quantity': 1000, 'code': 'N81013', 'name': 'SPRINGWOOD SURGERY', 'addr_1': 'SPRINGWOOD SURGERY', 'addr_2': 'RECTORY LANE', 'borough': 'GUISBOROUGH', 'village': None, 'post_code': 'TS14 7DJ'}, {'code': 'A81001', 'name': 'THE DENSHAM SURGERY', 'addr_1': 'THE HEALTH CENTRE', 'addr_2': 'LAWSON STREET', 'borough': 'STOCKTON ON TEES', 'village': 'CLEVELAND', 'post_code': 'TS18 1HU', 'bnf_code': '0101021B0AAAHAH', 'items': 1, 'practice': 'A81001', 'bnf_name': 'Alginate_Raft-Forming Oral Susp S/F', 'nic': 1.95, 'act_cost': 1.82, 'quantity': 500}, {'code': 'A81002', 'name': 'QUEENS PARK MEDICAL CENTRE', 'addr_1': 'QUEENS PARK MEDICAL CTR', 'addr_2': 'FARRER STREET', 'borough': 'STOCKTON ON TEES', 'village': 'CLEVELAND', 'post_code': 'TS18 2AW', 'bnf_code': '0101021B0AAALAL', 'items': 12, 'practice': 'A81002', 'bnf_name': 'Sod Algin/Pot Bicarb_Susp S/F', 'nic': 64.51, 'act_cost': 59.95, 'quantity': 6300}, {'code': 'A81003', 'name': 'VICTORIA MEDICAL PRACTICE', 'addr_1': 'THE HEALTH CENTRE', 'addr_2': 'VICTORIA ROAD', 'borough': 'HARTLEPOOL', 'village': 'CLEVELAND', 'post_code': 'TS26 8DB', 'bnf_code': '0101021B0BEADAJ', 'items': 6, 'practice': 'A81003', 'bnf_name': 'Gaviscon Infant_Sach 2g (Dual Pack) S/F', 'nic': 28.92, 'act_cost': 26.84, 'quantity': 90}, {'bnf_code': '0101021B0AAAPAP', 'items': 3, 'practice': 'A81004', 'bnf_name': 'Sod Alginate/Pot Bicarb_Tab Chble 500mg', 'nic': 9.21, 'act_cost': 8.55, 'quantity': 180, 'code': 'A81004', 'name': 'WOODLANDS ROAD SURGERY', 'addr_1': '6 WOODLANDS ROAD', 'addr_2': None, 'borough': 'MIDDLESBROUGH', 'village': 'CLEVELAND', 'post_code': 'TS1 3BE'}]

I'm using zip_longest here so that, if list1 and list2 are of unequal length, the loop is not cut short by the size difference. To sort by post code, do the same as before:

import operator
x = sorted(groups.values(), key=operator.itemgetter('post_code'))

However, this assumes the key is present. For a more general approach, it is better to use a lambda that calls get with a default return value:

x = sorted(groups.values(), key=lambda x: x.get('post_code', ' '))
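As a small illustration of the get-with-default idea, the sketch below sorts three made-up rows, one of which has no 'post_code' key at all. The rows and the 'X00000' code are hypothetical.

```python
# Made-up joined rows; the last one has no 'post_code'.
rows = [
    {'code': 'A81001', 'post_code': 'TS18 1HU'},
    {'code': 'A81004', 'post_code': 'TS1 3BE'},
    {'code': 'X00000'},
]

# Rows without a post code get an empty-string key, which sorts first.
by_post_code = sorted(rows, key=lambda row: row.get('post_code', ''))

print([row['code'] for row in by_post_code])
```

Note that the default only covers a missing key. If the key is present but its value is None, comparing it with a string raises a TypeError; in that case a key such as `row.get('post_code') or ''` is one way to handle both situations.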