How to efficiently process Python dictionaries with dynamic keys?

Time: 2017-01-15 12:42:03

Tags: python python-3.x dictionary opendata

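I work with Dutch open data. There is one dictionary per region per year, and the dictionary keys differ from year to year. How can I write efficient code that copes with this?

I have two working constructions, shown in the examples below, but both need manual effort for every key. The open data has 108 keys, so I really hope Python offers a better solution that I don't know about yet!

FYI about the open data: for each year there is a list of 16194 dictionaries, one dictionary per neighbourhood in NL. Each dictionary has 108 items (key-value pairs):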

>>> import cbsodata
>>> table = '83487NED'
>>> data = cbsodata.get_data(table, dir=None, typed=False)
Retrieving data from table '83487NED'
Done!
>>> len(data)
16194
>>> data[0]
{'Gehuwd_14': 1565, 'MateVanStedelijkheid_105': 5, 'Bevolkingsdichtheid_33':   1350, 'Gemeentenaam_1': 'Aa en Hunze                             ', ... etc     
>>> len(data[0])
108

A key might be 'Code_3' in one year and 'Code_4' in the next...

Sample data for the example solutions:

data2016 = [{'Code_3': 'BU01931000', 'ZipCode_106': '2251MT', 'City_12': 'Amsterdam', 'Number_of_people_5': '24000'},
                {'Code_3': 'BU02221000', 'ZipCode_106': '2851MT', 'City_12': 'London', 'Number_of_people_5': '88000'},
                {'Code_3': 'BU04444000', 'ZipCode_106': '2351MT', 'City_12': 'Paris', 'Number_of_people_5': '133000'}]
data2015 = [{'Code_4': 'BU01931000', 'ZipCode_106': '2251MT', 'City_12': 'Amsterdam', 'Number_of_people_6': '22000'},
                {'Code_4': 'BU02221000', 'ZipCode_106': '2851MT', 'City_12': 'London', 'Number_of_people_6': '86000'},
                {'Code_4': 'BU04444000', 'ZipCode_106': '2351MT', 'City_12': 'Paris', 'Number_of_people_6': '131000'}]
data2014 = [{'Code_8': 'BU01931000', 'ZipCode_109': '2251MT', 'City_12': 'Amsterdam', 'Number_of_people_14': '18000'},
                {'Code_8': 'BU02221000', 'ZipCode_109': '2851MT', 'City_12': 'London', 'Number_of_people_14': '76000'},
                {'Code_8': 'BU04444000', 'ZipCode_109': '2351MT', 'City_12': 'Paris', 'Number_of_people_14': '129000'}]
data2013 = [{'Code_8': 'BU01931000', 'ZipCode_109': '2251MT', 'City_12': 'Amsterdam', 'Number_of_people_14': '14000'},
                {'Code_8': 'BU02221000', 'ZipCode_109': '2851MT', 'City_12': 'London', 'Number_of_people_14': '74000'}] # data for Paris 'BU04444000' missing in 2013
tables = {2013: data2013, 2014: data2014, 2015: data2015, 2016: data2016}
years = [2013, 2014, 2015, 2016]
current_year = 2016

Example solution 1, key mapping:

def CBSkey(key, year):
    if key == 'key_code':
        if year == 2013:
            return('Code_8')
        elif year == 2014:
            return('Code_8')
        elif year == 2015:
            return('Code_4')
        elif year == 2016:
            return('Code_3')
    elif key == 'key_people':
        if year == 2013:
            return('Number_of_people_14')
        elif year == 2014:
            return('Number_of_people_14')
        elif year == 2015:
            return('Number_of_people_6')
        elif year == 2016:
            return('Number_of_people_5')

for record_now in tables[current_year]:
    code = record_now['Code_3']
    city = record_now['City_12']
    people = {}
    for year in years:
        code_year = CBSkey('key_code', year)
        people_year = CBSkey('key_people', year)
        for record in tables[year]:
            if record[code_year] == code:
                people[year] = (record[people_year])

    print(people)

Output of all 3 example solutions:

{2016: '24000', 2013: '14000', 2014: '18000', 2015: '22000'}
{2016: '88000', 2013: '74000', 2014: '76000', 2015: '86000'}
{2016: '133000', 2014: '129000', 2015: '131000'}

Example 2: select the right dictionary by matching a value, then loop over all its keys to find the other data:

for record_now in tables[current_year]:
    city = record_now['City_12']
    code = record_now['Code_3']
    print('Code: ', code)
    people = {}
    for year in years:
        for record in tables[year]:
            for v in record.values():
                if v == code:
                    for k in record.keys():
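                        # NOTE: this calls CBSkey with one argument; it assumes a variant (not shown)
                        # that maps a raw key such as 'Number_of_people_5' to 'People_type'.
                        key_type = CBSkey(k)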
                        if key_type == 'People_type':
                            people[year] = (record[k])
    print(people)

Hoping for some bright 'Pythonic' ideas. Many thanks in advance!

1 answer:

Answer 0: (score: 1)

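If I understand this dataset correctly, each year's data is a list of many dicts; all dicts for a given year use the same keys; the keys vary from year to year, but the general data available is the same. So you need a way to efficiently retrieve the same data from multiple years.

First, I would put all the years into one big dictionary instead of using the indirect mapping scheme you have: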
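
The assumption that every dict within a year shares one key set is cheap to check before relying on it. A minimal sketch, using a small stand-in for the real tables; the `data` layout (year mapped to a list of records) and the helper name `keys_are_uniform` are assumptions made for illustration:

```python
# Hypothetical stand-in for the real tables: year -> list of record dicts.
data = {
    2016: [{'Code_3': 'BU01931000', 'City_12': 'Amsterdam'},
           {'Code_3': 'BU02221000', 'City_12': 'London'}],
    2015: [{'Code_4': 'BU01931000', 'City_12': 'Amsterdam'},
           {'Code_4': 'BU02221000', 'City_12': 'London'}],
}


def keys_are_uniform(records):
    """Return True if every record has exactly the same set of keys."""
    if not records:
        return True
    expected = set(records[0])
    return all(set(record) == expected for record in records)


# One boolean per year; a False value means that year needs a closer look.
uniform_by_year = {year: keys_are_uniform(records)
                   for year, records in data.items()}
print(uniform_by_year)
```

If any year comes back `False`, taking the keys from the first record of that year would not be safe.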

data = {}
data[2016] = [{'Code_3': 'BU01931000'}] # etc.
data[2015] = [{'Code_4': 'BU01931000'}] # etc.

So tables and all the individual dataYYYY variables go away, tables[year] becomes data[year], and years becomes data.keys().

Then I would work out a mapping from years to keys.

"""ytok structure

ytok maps years to dicts of keys. ytok[2016] would be:
{'code': 'Code_3', 'zip': 'ZipCode_106', 'city': 'City_12',
 'people': 'Number_of_people_5'}
"""

Here is one way to build ytok, showing intermediate results to make the process clear:

ytok = {}

for year in data.keys():
    sample = data[year][0]
    outputs = list(sorted(sample.keys()))
    # Will be in this order: city, code, people, zip
    inputs = 'city code people zip'.split()
    pairs = list(zip(inputs, outputs))
    print(pairs)
    yeardict = dict(pairs)
    print(yeardict)
    ytok[year] = yeardict

print(ytok)

Here is a more streamlined way:

inputs = 'city code people zip'.split()
for year in data.keys():
    outputs = sorted(data[year][0].keys())
    ytok[year] = dict(zip(inputs, outputs))

print(ytok)
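
With 108 keys, typing the short names by hand and relying on sort order may become fragile. In the sample data each key looks like a stable name followed by an underscore and a number that changes between years (`Code_3`, `Code_4`, `Code_8`). Assuming that pattern holds for the whole table, which is worth checking against the real data, the mapping could be derived by stripping the numeric suffix. The helper names `base_name` and `build_ytok` are made up for this sketch:

```python
def base_name(key):
    """Strip a trailing '_<digits>' suffix, e.g. 'Code_3' -> 'Code'."""
    name, sep, suffix = key.rpartition('_')
    return name if sep and suffix.isdigit() else key


def build_ytok(data):
    """Map each year to {base name: the key actually used in that year}."""
    ytok = {}
    for year, records in data.items():
        mapping = {}
        for key in records[0]:  # assumes all records in a year share keys
            base = base_name(key)
            if base in mapping:
                # Two keys collapsing to one base name would be ambiguous.
                raise ValueError('ambiguous base name %r in %r' % (base, year))
            mapping[base] = key
        ytok[year] = mapping
    return ytok


# Small stand-in for the real tables.
data = {
    2016: [{'Code_3': 'BU01931000', 'Number_of_people_5': '24000'}],
    2014: [{'Code_8': 'BU01931000', 'Number_of_people_14': '18000'}],
}
ytok = build_ytok(data)
print(ytok[2016]['Number_of_people'])
print(ytok[2014]['Code'])
```

The lookup names are then the base names themselves (`'Code'`, `'Number_of_people'`) rather than hand-picked short names, so nothing has to be listed per key.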

Then use ytok like this:

wanted_code = 'BU02221000'
people = {}
for year in data.keys():
    codekey = ytok[year]['code']
    peoplekey = ytok[year]['people']
    for record in data[year]:
        if record[codekey] == wanted_code:
            people[year] = record[peoplekey]
            break

print(people)

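Note the use of break once the right record is found. There is no point in continuing to search a year after we have found what we want, so we break out of the inner for record loop.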
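
Even with break, each lookup still scans a year's list. If many codes need to be looked up (the real table has 16194 records per year), it may be cheaper to build one dict per year keyed by the neighbourhood code, so that each lookup becomes a single dict access. A sketch under the same assumptions as above; `index_by_code` and `people_per_year` are hypothetical helper names, and `ytok` is written out by hand here only to keep the block self-contained:

```python
# Small stand-in for the real tables and the year-to-key mapping.
data = {
    2016: [{'Code_3': 'BU01931000', 'Number_of_people_5': '24000'},
           {'Code_3': 'BU04444000', 'Number_of_people_5': '133000'}],
    2013: [{'Code_8': 'BU01931000', 'Number_of_people_14': '14000'}],
}
ytok = {
    2016: {'code': 'Code_3', 'people': 'Number_of_people_5'},
    2013: {'code': 'Code_8', 'people': 'Number_of_people_14'},
}


def index_by_code(data, ytok):
    """Build {year: {code: record}} so records can be found without scanning."""
    return {
        year: {record[ytok[year]['code']]: record for record in records}
        for year, records in data.items()
    }


def people_per_year(index, ytok, wanted_code):
    """Collect the population value for one code in every year that has it."""
    people = {}
    for year, by_code in index.items():
        record = by_code.get(wanted_code)  # None if the code is missing that year
        if record is not None:
            people[year] = record[ytok[year]['people']]
    return people


index = index_by_code(data, ytok)
print(people_per_year(index, ytok, 'BU01931000'))
print(people_per_year(index, ytok, 'BU04444000'))
```

The index is built once, in one pass over each year, and a code that is missing in some year (like Paris in 2013 in the question's sample data) is simply skipped.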