Iterating over nested dictionaries in Python 3

Time: 2018-10-13 15:19:39

Tags: python dictionary syntax iteration

Example dictionary:

    data_noisy = {'P1': {'age': 'eighty two', 'salary': '60196.0',
                         'suburb': 'Toorak', 'language': 'English'},
                  'P2': {'age': '49', 'salary': '-16945514.0',
                         'suburb': 'St. Kilda', 'language': 'Chinese'},
                  'P3': {'age': '54', 'salary': '49775.0',
                         'suburb': 'Neverland', 'language': 'Italian'}}

Desired output:

    data_clean = {'P1': {'age': 'None', 'salary': '60196.0',
                         'suburb': 'Toorak', 'language': 'English'},
                  'P2': {'age': '49', 'salary': 'None',
                         'suburb': 'St. Kilda', 'language': 'Chinese'},
                  'P3': {'age': '54', 'salary': '49775.0',
                         'suburb': 'None', 'language': 'Italian'}}


    MAX_SALARY = 200000

    VALID_SUBURBS = ["Richmond", "Southbank", "Fitzroy",
              "Docklands", "St. Kilda", "Footscray",
              "Hawthorn", "Parkville", "Toorak", "Brunswick",
              "Kensington", "Flemington", "Frankston", "Dandenong",
              "Caulfield", "Collingwood"]

def clean_data(data):

    data_dict = {}
    data_dict = data
    for key, value in data.items():

        for val in value.items():

            age = value['age']
            if not age.isdigit():
                data_dict['age'] = 'None'
            else:
                data_dict['age'] = value['age']

            salary = float(value['salary'])
            if salary < 0 or salary > MAX_SALARY:
                data_dict['salary'] = 'None'
            else:
                data_dict['salary'] = value['salary']

            suburb = value['suburb']
            if suburb not in VALID_SUBURBS:
                data_dict['suburb'] = 'None'
            else:
                data_dict['suburb'] = value['suburb']

    print(data_dict)

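I don't want to change the original dictionary, so I tried to copy it and then iterate over the copy to "clean" the data. Instead, I just get RuntimeError: dictionary changed size during iteration.

Any help with the syntax for working with these nested dictionaries would be greatly appreciated.

Thanks.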

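1 answer:

Answer 0: (score: 1)

Since you don't want to modify the original dictionary but intend to work on a copy and modify that copy instead, you need deepcopy: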
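
To see why the question's code fails, here is a minimal sketch on a small made-up dictionary (`people` is hypothetical, not the original data). It assumes the cause is the line `data_dict = data`, which binds a second name to the same object rather than copying it, so `data_dict['age'] = ...` adds a new top-level key to the dictionary being iterated. It also shows that a shallow `dict.copy()` still shares the inner dictionaries.

```python
from copy import deepcopy

# Hypothetical toy data, used only for illustration.
people = {'P1': {'age': 'eighty two'}, 'P2': {'age': '49'}}

# 1. Plain assignment creates an alias, not a copy.
alias = people
print(alias is people)  # True

# 2. Adding a new top-level key while iterating raises RuntimeError.
error_message = None
try:
    for key, value in people.items():
        alias['age'] = 'None'  # inserts a NEW key into the dict being iterated
except RuntimeError as exc:
    error_message = str(exc)
print(error_message)  # dictionary changed size during iteration

# Remove the stray key again so the toy data is back to its initial state.
people.pop('age', None)

# 3. A shallow copy has its own outer dict but shares the inner dicts.
shallow = people.copy()
shallow['P1']['age'] = 'None'
print(people['P1']['age'])  # None (the original was modified too)

# 4. A deep copy duplicates the inner dicts as well.
people['P1']['age'] = 'eighty two'  # restore the original value
deep = deepcopy(people)
deep['P1']['age'] = 'None'
print(people['P1']['age'])  # eighty two
```

Overwriting the value of a key that already exists never changes a dictionary's size, which is why editing the inner dictionaries of a deep copy is safe during iteration.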

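from copy import deepcopy

# deepcopy duplicates the inner dictionaries too, so data_noisy stays untouched.
# MAX_SALARY and VALID_SUBURBS are the constants defined in the question.
data_clean = deepcopy(data_noisy)

# Each i is one inner dictionary; overwriting existing keys in it does not
# resize the outer dictionary that is being iterated.
for i in data_clean.values():
    if not i['age'].isdigit():
        i['age'] = 'None'
    if float(i['salary']) < 0 or float(i['salary']) > MAX_SALARY:
        i['salary'] = 'None'
    if i['suburb'] not in VALID_SUBURBS:
        i['suburb'] = 'None'

print(data_noisy)
print(data_clean)

Output:
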
{'P1': {'age': 'eighty two', 'salary': '60196.0', 'suburb': 'Toorak', 'language': 'English'}, 'P2': {'age': '49', 'salary': '-16945514.0', 'suburb': 'St. Kilda', 'language': 'Chinese'}, 'P3': {'age': '54', 'salary': '49775.0', 'suburb': 'Neverland', 'language': 'Italian'}}
{'P1': {'age': 'None', 'salary': '60196.0', 'suburb': 'Toorak', 'language': 'English'}, 'P2': {'age': '49', 'salary': 'None', 'suburb': 'St. Kilda', 'language': 'Chinese'}, 'P3': {'age': '54', 'salary': '49775.0', 'suburb': 'None', 'language': 'Italian'}}
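
An alternative that avoids copying first is to build the cleaned dictionary from scratch. The sketch below is one possible way to do that and is not part of the original answer: `clean_record` and `clean_data` are hypothetical helper names, and treating a non-numeric salary as invalid (instead of letting `float()` raise `ValueError`) is an added assumption. `MAX_SALARY`, `VALID_SUBURBS`, and `data_noisy` are repeated from the question so that the block runs on its own.

```python
MAX_SALARY = 200000

VALID_SUBURBS = ["Richmond", "Southbank", "Fitzroy",
                 "Docklands", "St. Kilda", "Footscray",
                 "Hawthorn", "Parkville", "Toorak", "Brunswick",
                 "Kensington", "Flemington", "Frankston", "Dandenong",
                 "Caulfield", "Collingwood"]


def clean_record(record):
    """Return a cleaned copy of one inner dictionary; the input is not modified."""
    cleaned = dict(record)  # a shallow copy suffices: the values are plain strings

    if not cleaned['age'].isdigit():
        cleaned['age'] = 'None'

    try:
        salary = float(cleaned['salary'])
    except ValueError:
        salary = None  # assumption: a non-numeric salary counts as invalid
    if salary is None or salary < 0 or salary > MAX_SALARY:
        cleaned['salary'] = 'None'

    if cleaned['suburb'] not in VALID_SUBURBS:
        cleaned['suburb'] = 'None'

    return cleaned


def clean_data(data):
    """Build a new outer dictionary, so nothing is mutated during iteration."""
    return {key: clean_record(record) for key, record in data.items()}


data_noisy = {'P1': {'age': 'eighty two', 'salary': '60196.0',
                     'suburb': 'Toorak', 'language': 'English'},
              'P2': {'age': '49', 'salary': '-16945514.0',
                     'suburb': 'St. Kilda', 'language': 'Chinese'},
              'P3': {'age': '54', 'salary': '49775.0',
                     'suburb': 'Neverland', 'language': 'Italian'}}

data_clean = clean_data(data_noisy)
print(data_clean)
```

Because every inner dictionary is rebuilt, `data_noisy` is left unchanged and no `deepcopy` is needed for this particular data shape; deeper nesting would still call for `deepcopy` or a recursive copy.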