Hi everyone, I'm currently working on a school project and need to convert a dict into a DataFrame so that I can use it for machine learning.
myDic = {
    'Acura': {
        'CL': {
            '2003': {
                'transmission': '4',
                'engine': '1',
                'drivetrain': 'NHTSA: 13',
                'wheels_hubs': 'NHTSA: 8',
                'seat_belts_air_bags': 'NHTSA: 6',
                'brakes': 'NHTSA: 6',
                'lights': 'NHTSA: 5',
                'body_paint': 'NHTSA: 2',
                'fuel_system': 'NHTSA: 2',
                'electrical': 'NHTSA: 2',
                'suspension': 'NHTSA: 2',
                'miscellaneous': 'NHTSA: 1',
                'steering': 'NHTSA: 1'
            },
            '2002': {
                'transmission': '2',
                'engine': 'NHTSA: 8',
                'brakes': 'NHTSA: 7',
                'electrical': 'NHTSA: 4',
                'accessories-interior': 'NHTSA: 3',
                'seat_belts_air_bags': 'NHTSA: 3',
                'suspension': 'NHTSA: 2',
                'drivetrain': 'NHTSA: 2',
                'body_paint': 'NHTSA: 1',
                'accessories-exterior': 'NHTSA: 1',
                'windows_windshield': 'NHTSA: 1',
                'fuel_system': 'NHTSA: 1',
                'steering': 'NHTSA: 1',
                'miscellaneous': 'NHTSA: 1'
            }
        }
    }
}
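and so on for the other brands. I can look up an entry with myDic['Acura']['CL']['2003'], i.e. "brand" - "model" - "year", and it returns the problems reported for that car. How can I convert this into a DataFrame? The columns should be brand, model, year, and the problem categories.
Answer 0 (score: 0)
I think what you are looking for is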
import pandas as pd
# Flatten the three nesting levels into (brand, model, year) tuple keys.
restructure_dict = {
    (level1_key, level2_key, level3_key): values
    for level1_key, level2_dict in myDic.items()
    for level2_key, level3_dict in level2_dict.items()
    for level3_key, values in level3_dict.items()
}
# Transpose so each (brand, model, year) is a row, then move the key levels into columns.
df = pd.DataFrame(restructure_dict).T.reset_index()
df = df.rename(columns={'level_0': 'brand', 'level_1': 'model', 'level_2': 'year'})
print(df)
The output will be:
brand model year transmission engine drivetrain wheels_hubs seat_belts_air_bags brakes lights body_paint fuel_system electrical suspension miscellaneous steering accessories-interior accessories-exterior windows_windshield
0 Acura CL 2003 4 1 NHTSA: 13 NHTSA: 8 NHTSA: 6 NHTSA: 6 NHTSA: 5 NHTSA: 2 NHTSA: 2 NHTSA: 2 NHTSA: 2 NHTSA: 1 NHTSA: 1 NaN NaN NaN
1 Acura CL 2002 2 NHTSA: 8 NHTSA: 2 NaN NHTSA: 3 NHTSA: 7 NaN NHTSA: 1 NHTSA: 1 NHTSA: 4 NHTSA: 2 NHTSA: 1 NHTSA: 1 NHTSA: 3 NHTSA: 1 NHTSA: 1
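Since the goal is machine learning, the problem columns will probably need to be numeric rather than strings such as 'NHTSA: 13'. Here is a minimal sketch of one way to do that. It uses a shortened stand-in for myDic, and it rests on two assumptions that are not stated in the question: the count is the trailing run of digits in each value, and a missing category (NaN) means zero reported problems.

```python
import pandas as pd

# Shortened stand-in for the question's myDic (illustration only).
myDic = {
    'Acura': {
        'CL': {
            '2003': {'transmission': '4', 'engine': '1', 'drivetrain': 'NHTSA: 13'},
            '2002': {'transmission': '2', 'engine': 'NHTSA: 8', 'brakes': 'NHTSA: 7'},
        }
    }
}

restructure_dict = {
    (brand, model, year): problems
    for brand, models in myDic.items()
    for model, years in models.items()
    for year, problems in years.items()
}
df = pd.DataFrame(restructure_dict).T.reset_index()
df = df.rename(columns={'level_0': 'brand', 'level_1': 'model', 'level_2': 'year'})

# Assumption: the count is the trailing run of digits ('NHTSA: 13' -> 13, '4' -> 4),
# and a category missing for a given year means 0 reported problems.
problem_cols = [c for c in df.columns if c not in ('brand', 'model', 'year')]
for col in problem_cols:
    counts = df[col].astype(str).str.extract(r'(\d+)\s*$')[0]
    df[col] = pd.to_numeric(counts).fillna(0).astype(int)

df['year'] = df['year'].astype(int)
print(df)
```

Note that this discards the distinction between plain counts and 'NHTSA:'-prefixed counts; if that distinction matters for your model, keep it in a separate column instead.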
Another possible solution is:
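import pandas as pd
# Same flattening as above: (brand, model, year) tuple keys.
restructure_dict = {
    (level1_key, level2_key, level3_key): values
    for level1_key, level2_dict in myDic.items()
    for level2_key, level3_dict in level2_dict.items()
    for level3_key, values in level3_dict.items()
}
# Without the transpose, the tuple keys become MultiIndex columns and the problems become rows.
df = pd.DataFrame(restructure_dict)
print(df)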
The output will be:
                          Acura
                             CL
                           2003      2002
transmission                  4         2
engine                        1  NHTSA: 8
drivetrain            NHTSA: 13  NHTSA: 2
wheels_hubs            NHTSA: 8       NaN
seat_belts_air_bags    NHTSA: 6  NHTSA: 3
brakes                 NHTSA: 6  NHTSA: 7
lights                 NHTSA: 5       NaN
body_paint             NHTSA: 2  NHTSA: 1
fuel_system            NHTSA: 2  NHTSA: 1
electrical             NHTSA: 2  NHTSA: 4
suspension             NHTSA: 2  NHTSA: 2
miscellaneous          NHTSA: 1  NHTSA: 1
steering               NHTSA: 1  NHTSA: 1
accessories-interior        NaN  NHTSA: 3
accessories-exterior        NaN  NHTSA: 1
windows_windshield          NaN  NHTSA: 1
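With this layout the columns form a three-level MultiIndex, so a single car can still be selected with a (brand, model, year) tuple, much like the original myDic['Acura']['CL']['2003'] lookup. The sketch below uses a shortened stand-in for myDic; the variable names one_car and both_years are only illustrative.

```python
import pandas as pd

# Shortened stand-in for the question's myDic (illustration only).
myDic = {
    'Acura': {
        'CL': {
            '2003': {'transmission': '4', 'engine': '1', 'lights': 'NHTSA: 5'},
            '2002': {'transmission': '2', 'engine': 'NHTSA: 8'},
        }
    }
}

restructure_dict = {
    (brand, model, year): problems
    for brand, models in myDic.items()
    for model, years in models.items()
    for year, problems in years.items()
}
df = pd.DataFrame(restructure_dict)

# A full (brand, model, year) tuple selects one column as a Series
# indexed by problem category; dropna() hides categories not reported that year.
one_car = df[('Acura', 'CL', '2003')].dropna()
print(one_car)

# Selecting level by level leaves the remaining level (the years) as columns.
both_years = df['Acura']['CL']
print(both_years)
```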
Another option is the transposed version of the result above:
import pandas as pd
# Same flattening as above: (brand, model, year) tuple keys.
restructure_dict = {
    (level1_key, level2_key, level3_key): values
    for level1_key, level2_dict in myDic.items()
    for level2_key, level3_dict in level2_dict.items()
    for level3_key, values in level3_dict.items()
}
# Transpose only: brand, model and year stay in a MultiIndex on the rows.
df = pd.DataFrame(restructure_dict).T
print(df)
The output is:
transmission engine drivetrain wheels_hubs seat_belts_air_bags brakes lights body_paint fuel_system electrical suspension miscellaneous steering accessories-interior accessories-exterior windows_windshield
Acura CL 2003 4 1 NHTSA: 13 NHTSA: 8 NHTSA: 6 NHTSA: 6 NHTSA: 5 NHTSA: 2 NHTSA: 2 NHTSA: 2 NHTSA: 2 NHTSA: 1 NHTSA: 1 NaN NaN NaN
2002 2 NHTSA: 8 NHTSA: 2 NaN NHTSA: 3 NHTSA: 7 NaN NHTSA: 1 NHTSA: 1 NHTSA: 4 NHTSA: 2 NHTSA: 1 NHTSA: 1 NHTSA: 3 NHTSA: 1 NHTSA: 1
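If a wide table with many NaN cells is inconvenient, a long ("tidy") layout with one row per brand, model, year and problem category might suit better. This is only a sketch on a shortened stand-in for myDic; the column names 'problem' and 'reported' are my own choice, not something from the question.

```python
import pandas as pd

# Shortened stand-in for the question's myDic (illustration only).
myDic = {
    'Acura': {
        'CL': {
            '2003': {'transmission': '4', 'engine': '1', 'lights': 'NHTSA: 5'},
            '2002': {'transmission': '2', 'engine': 'NHTSA: 8', 'brakes': 'NHTSA: 7'},
        }
    }
}

restructure_dict = {
    (brand, model, year): problems
    for brand, models in myDic.items()
    for model, years in models.items()
    for year, problems in years.items()
}

# Name the index levels so reset_index() yields brand/model/year columns directly.
wide = pd.DataFrame(restructure_dict).T.rename_axis(['brand', 'model', 'year'])

# melt() turns every problem column into (problem, reported) rows;
# rows for categories that were not reported in a given year are dropped.
long_df = (
    wide.reset_index()
        .melt(id_vars=['brand', 'model', 'year'],
              var_name='problem', value_name='reported')
        .dropna(subset=['reported'])
        .reset_index(drop=True)
)
print(long_df)
```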