Example dictionary:
data_noisy = {'P1': {'age': 'eighty two', 'salary': '60196.0', 'suburb': 'Toorak', 'language': 'English'},
              'P2': {'age': '49', 'salary': '-16945514.0', 'suburb': 'St. Kilda', 'language': 'Chinese'},
              'P3': {'age': '54', 'salary': '49775.0', 'suburb': 'Neverland', 'language': 'Italian'}}
Desired output:
data_clean = {'P1': {'age': 'None', 'salary': '60196.0', 'suburb': 'Toorak', 'language': 'English'},
              'P2': {'age': '49', 'salary': 'None', 'suburb': 'St. Kilda', 'language': 'Chinese'},
              'P3': {'age': '54', 'salary': '49775.0', 'suburb': 'None', 'language': 'Italian'}}
MAX_SALARY = 200000
VALID_SUBURBS = ["Richmond", "Southbank", "Fitzroy",
"Docklands", "St. Kilda", "Footscray",
"Hawthorn", "Parkville", "Toorak", "Brunswick",
"Kensington", "Flemington", "Frankston", "Dandenong",
"Caulfield", "Collingwood"]
def clean_data(data):
    data_dict = {}
    data_dict = data
    for key, value in data.items():
        for val in value.items():
            age = value['age']
            if not age.isdigit():
                data_dict['age'] = 'None'
            else:
                data_dict['age'] = value['age']
            salary = float(value['salary'])
            if salary < 0 or salary > MAX_SALARY:
                data_dict['salary'] = 'None'
            else:
                data_dict['salary'] = value['salary']
            suburb = value['suburb']
            if suburb not in VALID_SUBURBS:
                data_dict['suburb'] = 'None'
            else:
                data_dict['suburb'] = value['suburb']
    print(data_dict)
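I don't want to modify the original dictionary, so I tried to copy it and then iterate over it to "clean" the data. Instead I just get RuntimeError: dictionary changed size during iteration.
Any help with the syntax for working with these nested dictionaries would be greatly appreciated.
Thanks.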
Answer 0 (score: 1)
Since you don't want to modify the original dictionary, and instead want to work on a copy and modify that, you need deepcopy.
from copy import deepcopy
data_clean = deepcopy(data_noisy)
for i in data_clean.values():
    if not i['age'].isdigit():
        i['age'] = 'None'
    if float(i['salary']) < 0 or float(i['salary']) > MAX_SALARY:
        i['salary'] = 'None'
    if i['suburb'] not in VALID_SUBURBS:
        i['suburb'] = 'None'
print(data_noisy)
print(data_clean)
{'P1': {'age': 'eighty two', 'salary': '60196.0', 'suburb': 'Toorak', 'language': 'English'}, 'P2': {'age': '49', 'salary': '-16945514.0', 'suburb': 'St. Kilda', 'language': 'Chinese'}, 'P3': {'age': '54', 'salary': '49775.0', 'suburb': 'Neverland', 'language': 'Italian'}}
{'P1': {'age': 'None', 'salary': '60196.0', 'suburb': 'Toorak', 'language': 'English'}, 'P2': {'age': '49', 'salary': 'None', 'suburb': 'St. Kilda', 'language': 'Chinese'}, 'P3': {'age': '54', 'salary': '49775.0', 'suburb': 'None', 'language': 'Italian'}}
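As for why the original attempt failed: data_dict = data does not copy anything, it only binds a second name to the same dictionary. So data_dict['age'] = ... adds a new top-level key to the very dictionary being iterated, which is what raises the RuntimeError. A shallow copy (dict.copy()) would not be enough either, because the inner dictionaries would still be shared. The sketch below illustrates the difference on a cut-down dictionary; the names original, alias, shallow and deep are only for illustration and are not from the question.

```python
from copy import deepcopy

# Cut-down stand-in for data_noisy (illustrative only).
original = {'P1': {'age': 'eighty two'}}

alias = original            # same object under a second name, no copy made
shallow = original.copy()   # new outer dict, but inner dicts are shared
deep = deepcopy(original)   # outer and inner dicts are all independent

# Editing the deep copy leaves the original untouched.
deep['P1']['age'] = 'None'
age_after_deep_edit = original['P1']['age']

# Editing through the shallow copy changes the shared inner dict.
shallow['P1']['age'] = 'None'
age_after_shallow_edit = original['P1']['age']

print(alias is original)                  # identity check on the alias
print(shallow['P1'] is original['P1'])    # inner dict shared by shallow copy
print(deep['P1'] is original['P1'])       # inner dict not shared by deep copy
print(age_after_deep_edit, '|', age_after_shallow_edit)
```

For a dictionary of dictionaries like this one, deepcopy is the simplest way to get a copy whose inner records can be changed safely.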